Re: [tlaplus] Canonical TLA+ ... how to avoid "imperative brainwash" on KV store

On Wednesday, November 25, 2020 at 1:23:41 PM UTC-5 Stephan Merz wrote:

TLC explores all successors of every state it analyzes. The algorithm for checking invariants can be given as follows:

queue = [s : s is an initial state of the spec] \* queue of states to explore
visited = [] \* set of states already visited
while queue != [] :
s = head(queue)
queue = tail(queue)
if ~(s in visited) :
visited += [s]
if ~(s satisfies invariant) :
return false \* compute and display counter-example
succs = [t : t is a successor of s w.r.t. the next-state relation]
queue += succs
return true

You can visualize the state graph as a .dot file, this works for small state spaces (it might still be OK for your example). In any case, it is a good idea to check properties that you expect to be false and inspect the counter-example to check that it corresponds to the computation that you expect. For example, you could check properties such as

hash[10] ~= 101 (invariant stating that one can not assign value 101 to key 10)

[][hash[10] = 101 => hash'[10] = 101]_vars (temporal property asserting that once key 10 has that value, it will never be assigned a different value)

etc. I am sure that more could be done for visualizing and exploring state graphs.

Stephan

On 25 Nov 2020, at 18:53, recipient <nets...@xxxxxxxxx> wrote:

Stephen, again thank you for taking time to engage questions.

Continuing the sql analogy where one describes what is to be done ... the sql developer always has the option of running EXPLAIN throwing open the curtain to reveal how things work.

I'm finding I need a TLA explain so I can assuredly resolve my anxiety that no errors is really no errors. Otherwise it's black magic. Maybe I need to read up on the tla model checking algo. This would help programmers like me who are too versed in imperative orientation rather then specification orientation.

In the running simplified example the initial state is an empty hash map to which the the following client operations are permitted:

Step 0 = initialize has empty
Step 1 = Op get key 10
Step 2 = Op put key 10 Val 100
Step 3 = Op put key 10 Val 101

So tla starts in the initial state, empty hash. The state paths available to it depending on fairness are:

(0) stop
(0, 1) stop \* 0 \X perms of {1, 2, 3} take 1
(0, 2) stop
(0, 3) stop
+ 0 \X perms of {1, 2, 3} take 2
+ 0 \X permutations of {1, 2, 3} take 3

Here's the critical question: if tla starts in step 0 then transitions to step 1 ... Then so far so good.

It sees step 2 is enabled.

But if it determines step 0 was already visited and simply transitions into step 2 making path (0, 1, 2) --- ultimately one of the paths we want --- it'll never explore (0, 2) and by extension it'll never try any later path starting with 0 and therefore NOT explore all possible paths. The blanket statement then it tries only previously unvisited but enabled steps is incomplete.

So what does tla do? And what tools are available to have tla explain what it does? I find myself in need of understanding this aspect of tla behavior to assure myself it's doing what I think it ought to do.

Regards

On Wed, Nov 25, 2020, 2:42 AM Stephan Merz <stepha...@xxxxxxxxx> wrote:
Think of it the other way round: the model that you described in your previous message has two keys and two values (plus a null value when a key is not assigned), so your store has 9 possible values. I'm assuming that the operation that was performed last is not stored in the state (so it doesn't contribute to the number of states) but you may have one or two more variables, for exemple for displaying the result of an operation or for storing the point of control, and these contribute more states. Let's assume that the result variable has 4 possible values (the two values in your store, a "success" for a PUT and an "error" for a GET or DELETE on an unassigned key), then we get 9*4 = 36 possible states. Your spec may be different, but a state graph of 178 distinct states does not sound wrong to me.

TLC cannot and does not generate executions one by one: as you say, every execution is of infinite size because of the infinite loop, and there are infinitely many of them because operation calls can be interleaved in arbitrary orders. Instead, it computes the graph of reachable states and explores successors only once per state. As you say there are 12 possible successors for each state, but many of them will loop back to states that TLC has already seen before (e.g., a GET or a DELETE for an unassigned key don't change the state of the store, two PUT operations for two different keys executed in one order or the other lead to the same state etc). So there is no need to actually explore all permutations of all subsets of operations, the only thing that matters is that no reachable state is missed.

Regards,
Stephan

On 25 Nov 2020, at 00:54, recepient recepient <nets...@xxxxxxxxx> wrote:

Stephan, Thank you - know how on display.

How does TLA ever escape from 'while (TRUE)'? It does in fact. But any reasonable interpretation would argue TRUE is a constant true condition.

Second, you write: "TLC will explore all possible sequences of operation calls...": Are we really sure? Really? If yes, how ... because it's impressive if true.

The cartesian product k \in HashKey, v \in HashValue, op \in ClientOps is 12 elements big so 2^12 elements in its power set. And for each element in the power set it would have to try all permutations contributing n! calls per each. Altogether that's over 1 billion calls into PerformOp. On the other hand TLA reports:

>278 states generated, 178 distinct states found, 0 states left on queue.

Adding a print in PerformOp (I know, I know!) only shows ~100 calls. I'm only seeing <200 nodes and <300 edges in the .dot graph albeit there are cycles all over the place. Even then I can only find ~100 unique paths from the initial node to final node without revisiting nodes.

But there's even more. If this is to be done right TLA will have to reset the hash to its initial state at the right to time.

Example, suppose we only cared about exploring all possible sequences of operation calls on just two tuples. I'd expect TLA to do the following.

Case 1:
altHash = [ altHashKey \in HashKey |-> Nil ],
call PerformOP( tuple1 );

Case 2:
altHash = [ altHashKey \in HashKey |-> Nil ],
call PerformOP( tuple2 );

Case 3:
altHash = [ altHashKey \in HashKey |-> Nil ],
call PerformOP( tuple1 );
call PerformOP( tuple2 );

Case 4:
altHash = [ altHashKey \in HashKey |-> Nil ],
call PerformOP( tuple2 );
call PerformOP( tuple1 );

Again, if this fragment:

while (TRUE) {
with k \in HashKey, v \in HashValue, op \in ClientOps {
call PerformOp(k,v,op)
}
}

is really running this kind of state exploration ... mark me down as extremely impressed
On Tuesday, November 24, 2020 at 2:43:10 AM UTC-5 Stephan Merz wrote:
Why don't you just write a loop of the form

while (TRUE) {
with k \in HashKey, v \in HashValue, op \in ClientOps {
call PerformOp(k,v,op)
}
}

TLC will explore all possible sequences of operation calls, for all possible combinations of arguments. Whenever it runs into a state that it has seen before (because the store has the same value for every key), it will stop exploring that branch. The power of model checking is that it lets you explore even infinite executions over finite state spaces. There's no need to materialize the search that TLC will perform in the data structures of your specification.

Regards,
Stephan

On 23 Nov 2020, at 18:31, recepient recepient <nets...@xxxxxxxxx> wrote:

Hi -

I'm working on a distributed KV store in Pluscal. Using TLA+ our desire is to ``SQLtise" our Pluscal code everywhere possible: describe what is to be done & checked without proscribing the steps ala imperative code to TLA+. Granted, we can fly closer to that mentality by avoiding Pluscal altogether ... but that's out of scope for the moment.

I am stuck on one basic aspect of this work which seems to exist right at the boundary between allowing TLA+ to explore reachable states, and me telling it what I think reachable states are...

Assume:
key \in {0,1}, value \in {100,101},   {OP_GET, OP_PUT, OP_DELETE}  for {0,1} \X {100,101} \X {OP_GET_PUT,OP_DELETE} or 12 unique possible client requests. (Note: get/delete operations never use the value part).

Goal:
Have TLA+ check correctness properties (not shown) by exploring all possible ways a client can send KV requests. Imperatively all possible ways are given by:

For each subset s in  {0,1} \X {100,101} \X {OP_GET_PUT,OP_DELETE}:
For each permutation p in s:
(re)initialize model
model check operations in p by running p[i] where i \in 1..Len(p)

Candidate Process 1 (elided): BAD: Doesn't do all permutations or subsets
with v \in HashKey, v \in HashValue, op \in ClientOps
do
call PerformOp(k,v.op);
end with;
Enabling DOT visualization & writing some python to print all paths from initial to final states using the DOT data, we can show that TLA+ tries the first rpc tuple, then tries rpc tuples 1 and 2, then 1 and 2 and 3 ... and on up to 1..12. But that's it!

Candidate Process 2 (elided): BAD: Doesn't do permutations or subsets:
  \* Alternative formulation of process 1. Choose approach much the same ...
with i \in HashKey \X HashValue \X ClientOps
do
call PerformOp(i[1],i[2],i[3]);
end with;

At this point TLA+ can't read our minds: it needs to be told to try all permutations of all subsets. But if one writes writes something like the following, JAVA runs out of stack:

Candidate Process 3 (elided): BAD: Won't work; runs out of stack
variables rpcs=Permutations(SUBSET(HashKey, v \in HashValue, op \in ClientOps))

with i \in rpcs
do
// now try to dig out individual RPCs and run them...
call PerformOp(....);
end with;

Moreover, even if this compiled and ran it's not clear if this achieves the goal. Did it check all permutations on subsets of size 5 starting from the initial state or did it run that on top of permutations on subsets size 4?

What does one do here?

--
You received this message because you are subscribed to the Google Groups "tlaplus" group.
To unsubscribe from this group and stop receiving emails from it, send an email to tlaplus+u...@xxxxxxxxxxxxxxxx.
To view this discussion on the web visit https://groups.google.com/d/msgid/tlaplus/e3a99dad-9df9-48a9-b0ad-c9ba95a25d7dn%40googlegroups.com.

--
You received this message because you are subscribed to the Google Groups "tlaplus" group.
To unsubscribe from this group and stop receiving emails from it, send an email to tlaplus+u...@xxxxxxxxxxxxxxxx.
To view this discussion on the web visit https://groups.google.com/d/msgid/tlaplus/ce825f7b-4614-4031-8395-91765f479fe9n%40googlegroups.com.

--
You received this message because you are subscribed to a topic in the Google Groups "tlaplus" group.
To unsubscribe from this topic, visit https://groups.google.com/d/topic/tlaplus/XakPxMJczb8/unsubscribe.
To unsubscribe from this group and all its topics, send an email to tlaplus+u...@xxxxxxxxxxxxxxxx.
To view this discussion on the web visit https://groups.google.com/d/msgid/tlaplus/E92DE534-170B-4D79-B587-6A21FD6D11C5%40gmail.com.

--
You received this message because you are subscribed to the Google Groups "tlaplus" group.
To unsubscribe from this group and stop receiving emails from it, send an email to tlaplus+u...@xxxxxxxxxxxxxxxx.
To view this discussion on the web visit https://groups.google.com/d/msgid/tlaplus/CAPz6axO%2BX6O83z6d6Nj9JZ%2BO-_6eKzugEa%2BcyxGihHQHrrqHZw%40mail.gmail.com.