Re: [tlaplus] State-constraints and liveness

>> checking liveness properties in the presence of state constraints can be really tricky because the specification may not assert what you think it does. In particular, see section 14.3.5 of the TLA+ book.

Thanks for pointing this out. I believe that I've dodged the issue described in that section of the book, by ensuring that my fairness assumption cannot not be made false by the state-constraint. My fairness assumption is weak-fairness on just the 'internal' actions of the system -- i.e. the actions that drive any 'currently in-progress' operation forward to completion. But my state-constraint only constrains the number of 'calls' by a user of the system that may start new operations. The state-constraint does also constrain the number of actions by the 'physical environment', such as process failure.

But as this is a subtle issue, I could easily have made an error. I plan to test liveness checking as Leslie suggests, by ensuring that it can find violations of intentionally incorrect liveness properties. (See below for how doing so found an apparent problem).

>>I'd suggest that you rewrite your property so that TLC will accept it. For example, try
Spec /\ []<><TRUE>_vars => Liveness

[…]

I tried entering just the antecedent

Spec /\ []<><<TRUE>>_AllVars
... as the "Temporal formula" in the "What is the behavior spec?" section. i.e. Instead of just Spec.
And I entered the consequent
Liveness
... as the property to check.

Yes, that's exactly what I meant.

TLC accepts this. However, I first tested this configuration by checking several leads-to properties that I know to be invalid, and it failed to find any of them.

So either I'm using your suggestion incorrectly, or I'm still running into the kind of subtle interaction between liveness and state-constraints that you pointed out.

I'm probably asking the obvious, but does your constrained specification allow for any behaviors where variables keep changing eventually? E.g., if you have some counter that can only increase but for which the constraints impose an upper bound then at some point the model can no longer continue and has to stutter.

In such a case, augmenting the specification by

[]<><<TRUE>>_AllVars

effectively rules out all behaviors, so any property will hold vacuously. (TLC will still report failures of invariants since these are checked as it generates states.)

Perhaps a more appropriate specification of the fairness conditions would be something like

WF_AllVars(Constraints /\ InternalActions)

where InternalActions is the disjunction of all internal actions of your system and Constraints describes the state constraints. This condition requires that internal actions should eventually occur from states that satisfy your state constraints but requires nothing of states that do not satisfy the state constraints. Thus, the formula is not vacuously satisfied over the constrained runs, unlike

WF_AllVars(InternalActions)

which is false of the constrained runs in case the internal actions remain enabled in the states that do not satisfy the constraints.

(Of course, depending on your spec you may want to impose several constraints of the form WF_AllVars(Constraints /\ A) instead of a single constraint for the disjunction of the internal actions.)

Now, you may see counter-examples to your liveness property that are due to some operation that has been started but couldn't finish within the bounds of the constraints. In order to rule out these, you may have to weaken your liveness property to something like

Liveness == \A id \in Ids : MyProperty(id) ~> (\/ SomeOtherProperty(id)
\/ LegitimateTermination)

where LegitimateTermination could be just "~Constraints" or a suitable refinement that expresses that the handling of request "id" couldn't finish within the bounds. (Otherwise, if all your runs end up violating the constraints eventually, you haven't really verified anything.)

Again, understanding what is actually verified by the model checker in such situations can be really subtle.

At this point I decide to change my specification to remove the need for a state-constraint. I changed the definition of several sets from Nat to become CONSTANTS that are (small) finite sets, and added enabling conditions to the 'public API' and 'environment' actions to only enable them if they stay in these limits. This change isn't really polluting the conceptual purity of the specification, as in the real system these sets would be finite -- just very, very large. So this change can be interpreted as an assumption that the system will no longer be used if the sets are exhausted.

Alternatively, you could override Nat in the model that you pass to TLC by specifying something like

Nat <- 0 .. 9

in the "Definition Override" tab in the advanced options. (If you use any operations on naturals, you'll probably have to override these as well.) I agree, though, that the change that you made is nicer.

And I went back to using "Spec" as the 'Temporal formula' in the 'What is the behavior spec?' section.

Again I checked the several known-to-be-incorrect liveness properties.
This time I ran into a new TLC bug, which I think is real -- i.e. not just a user-error entering unacceptable temporal formulas into the wrong part of the toolbox.
The error occurred when TLC was checking liveness.
The exception was a java.io.IOException: Negative seek
offsetjava.io.IOException: Negative seek offset
    at java.io.RandomAccessFile.seek(Native Method)
    at tlc2.util.BufferedRandomAccessFile.seek(BufferedRandomAccessFile.java:212)
    at tlc2.tool.liveness.DiskGraph.getNode(DiskGraph.java:145)
    at tlc2.tool.liveness.DiskGraph.getPath(DiskGraph.java:309)
    at tlc2.tool.liveness.LiveWorker.printTrace(LiveWorker.java:521)
    at tlc2.tool.liveness.LiveWorker.checkComponent(LiveWorker.java:290)
    at tlc2.tool.liveness.LiveWorker.checkSccs(LiveWorker.java:119)
    at tlc2.tool.liveness.LiveWorker.run(LiveWorker.java:613)

Ouch. Looks like TLC can't find its states anymore on the disk. I'm afraid I can't help you here.