thanks very much for sharing your specification. The conclusions that you draw from your experiment are very encouraging to us in the TLA+ community, and I wonder if you would be able to present your experience at the TLA+ community meeting later this year (http://tla2016.loria.fr)?
I played a bit with your spec, and here is some feedback. The modified spec is attached.
I understand that this issue is not present in the spec that you posted.
Indeed, that's the typical experience, and one invariably underestimates the number of reachable states. Nevertheless, as you noticed, small parameter values are often enough to find useful errors. The interesting question is how big parameters should be chosen in order to be convinced of correctness.
For many cache coherence protocols, it's enough to consider a single cache line (i.e., a single key in the terms of your spec). However, this is not true for your protocol because you consider sets of writes (through multi-CAS). It clearly makes a difference if all, no, or some updates succeed. However, it seems to me that two keys should be enough to see all relevant situations.
Similarly, two values are enough, since we are only interested in checking (dis)equality.
Finally, I believe that checking for two readers, two writers, and one operator is enough in order to detect all possible interferences.
Moreover, keys, values, readers, and writers can be chosen as symmetric sets of model for TLC (reducing the state space that TLC has to explore), since there is no reference to specific values of these sets – except for the initialization of the data base, which uses a CHOOSE _expression_. What I did there is to generalize the model by making the initialization non-deterministic, mapping keys to arbitrary values in the initial state: since all of these states are reachable anyway, they may as well be admitted as initial states.
With these parameter values fixed, it remains to choose a constraint on the version numbers that model checking should consider, expressed through a state constraint. Choosing a bound of 2, TLC explores the state space within approximately 45 minutes on my laptop. It takes about a day for a bound of 3.
These results made me quite confident that the protocol actually satisfies the stated correctness properties. If you want to go beyond fixed instances, you may turn to theorem proving. The main complication is that this requires an inductive invariant (a predicate that is true initially and preserved by every possible transition from any state satisfying the invariant), and in general it may be very complicated to come up with such an invariant.
In your case, the invariant seemed quite obvious to me, so I wrote it down and proved it using TLAPS. The proof is contained in the module ConsistentKV_proofs, in case you are interested. You may find it enlightening to read the invariant because it gives a clear explanation why the protocol works, in contrast to model checking, which "only" assures you that it does. In particular, the invariant may guide modifications and optimizations of the algorithm as it stands. The proof was a breeze because TLAPS essentially did it automatically. The only interaction was to break down the proof into individual steps per instruction (corresponding to the disjuncts of the next-state relation). For a few steps, I helped the prover by separately proving each conjunct in the conclusion, although for all but one it would actually have been enough to raise the timeout a little before the prover gives up ("SMTT(30)" stands for a timeout of 30 seconds to the relevant backend). For some reason, the auxiliary proof of the (full) typing invariant needed a little more interaction although it appears to be even more straightforward. I didn't bother prove the property VersionsOK, which looks obvious.
Thanks again for the interesting experience!
Description: Binary data
Description: Binary data