Re: [tlaplus] Experimenting with PlusCal / TLA+ at Dropbox

Hi Elliott,

thanks very much for sharing your specification. The conclusions that you draw from your experiment are very encouraging to us in the TLA+ community, and I wonder if you would be able to present your experience at the TLA+ community meeting later this year (http://tla2016.loria.fr)?

I played a bit with your spec, and here is some feedback. The modified spec is attached.

I changed the reader and writer processes so that they are cyclic, modeling repeated invocations of read and write operations, rather than your terminating (one-shot) processes. This of course increases the number of states that the model checker has to consider, but it may make the result of verification a little more convincing, and the model checker will stop exploring when it hits the same state for the second time.
In order to reduce the number of reachable states that differ in irrelevant ways, I initialize the local variables to "type-correct" values and reinitialize them after every iteration. Although this change could in general mask some problems, it is quite obvious that your algorithm never reads an uninitialized variable, so this does not change the visible behavior.
It is important to have as large atomic steps as possible because interleaving is the main culprit of state space explosion. I therefore removed your "start_pending" operation, which is a no-op, and combined "get_sentinels" and "check_already_pending", since the latter only reads local variables, so at the level of observable (global) variables no difference of behavior can be observed. Similar for "cas_sentinels" and "check_cas_errors". (However, I added another step for reinitializing the local variables of the writers because a write operation may terminate prematurely if it detects a concurrent pending write.)
The model of the "operators" is clearly incomplete since the spec only models startup (integration of readers and writers) but does not model failure. However, I essentially left this as it was – it's quite obvious that the "operator" processes do not play an essential role for the stated correctness properties for this spec.
I combined your two variables cache_keys_to_values and cache_keys_to_versions in a single variable cache that stores an array of records. This is a purely cosmetic change that doesn't change the size of the state space, but I find it a bit easier to read.

The model checker did find an issue, but it was already a known and tracked issue (something can go wrong if a writer takes a really long time to commit a database transaction while the cache is being restarted).

I understand that this issue is not present in the spec that you posted.

I had to keep the model's parameters really small, or face a tremendous combinatorial explosion of states (though I'm not ruling out the possibility that I could've formulated the spec in a different way that's more conducive to model checking).

Indeed, that's the typical experience, and one invariably underestimates the number of reachable states. Nevertheless, as you noticed, small parameter values are often enough to find useful errors. The interesting question is how big parameters should be chosen in order to be convinced of correctness.

For many cache coherence protocols, it's enough to consider a single cache line (i.e., a single key in the terms of your spec). However, this is not true for your protocol because you consider sets of writes (through multi-CAS). It clearly makes a difference if all, no, or some updates succeed. However, it seems to me that two keys should be enough to see all relevant situations.

Similarly, two values are enough, since we are only interested in checking (dis)equality.

Finally, I believe that checking for two readers, two writers, and one operator is enough in order to detect all possible interferences.

Moreover, keys, values, readers, and writers can be chosen as symmetric sets of model for TLC (reducing the state space that TLC has to explore), since there is no reference to specific values of these sets – except for the initialization of the data base, which uses a CHOOSE _expression_. What I did there is to generalize the model by making the initialization non-deterministic, mapping keys to arbitrary values in the initial state: since all of these states are reachable anyway, they may as well be admitted as initial states.

With these parameter values fixed, it remains to choose a constraint on the version numbers that model checking should consider, expressed through a state constraint. Choosing a bound of 2, TLC explores the state space within approximately 45 minutes on my laptop. It takes about a day for a bound of 3.

These results made me quite confident that the protocol actually satisfies the stated correctness properties. If you want to go beyond fixed instances, you may turn to theorem proving. The main complication is that this requires an inductive invariant (a predicate that is true initially and preserved by every possible transition from any state satisfying the invariant), and in general it may be very complicated to come up with such an invariant.

In your case, the invariant seemed quite obvious to me, so I wrote it down and proved it using TLAPS. The proof is contained in the module ConsistentKV_proofs, in case you are interested. You may find it enlightening to read the invariant because it gives a clear explanation why the protocol works, in contrast to model checking, which "only" assures you that it does. In particular, the invariant may guide modifications and optimizations of the algorithm as it stands. The proof was a breeze because TLAPS essentially did it automatically. The only interaction was to break down the proof into individual steps per instruction (corresponding to the disjuncts of the next-state relation). For a few steps, I helped the prover by separately proving each conjunct in the conclusion, although for all but one it would actually have been enough to raise the timeout a little before the prover gives up ("SMTT(30)" stands for a timeout of 30 seconds to the relevant backend). For some reason, the auxiliary proof of the (full) typing invariant needed a little more interaction although it appears to be even more straightforward. I didn't bother prove the property VersionsOK, which looks obvious.

Thanks again for the interesting experience!

Stephan