Hi,
I am trying to use TLA+ to reason a fair complex protocol
(https://github.com/NYU-NEWS/janus/blob/master/tla/Janus.tla)
used for transaction and replication. An comparable system
would be Google's Spanner, i.e. 2PC on top of Paxos. The
difference is my protocol is not layered like 2PC and Paxos
so everything is described as one.
I found that it is hard to do model checking with complex models.
It is okay to check with simpler models, e.g. 1 shard + 1 replication,
3 shard + 1 replication, 1 shard+3 replication. The model checker
can finish in a few minutes on my 24-core workstation. However,
if I change to a more complex model, such as 3 shard + 3 replication.
The checker seem never finishes. The longest run I had is a little
less than 2 days with my 24-core workstation.
I realize it might has something to do with how the model checker
does (brutal search?), and perhaps the labels in the PlusCal kind
of makes things worse? Does adding a label in PlusCal severely
affect the time for model checking?
Is there any way to work around this?
-Shuai