Re: [tlaplus] Is it possible to TLA+ (and PlusCal) to check complex models?

Hi Shuai,

I had a quick look at your spec, and I am not surprised that TLC blows up on it. Yes, it does "brutal search" of the state space, enumerating the reachable states, creating successor states by trying to evaluate all actions on every state it has seen so far. What you have seen is the well-known problem of combinatorial explosion (i.e., exponential growth of the state space) as you add more processes. PlusCal labels define the atomic actions of your spec: the more you have, the more fine-grained your transitions are and the more states you'll generate. On the other hand, inserting fewer labels may mask problems due to concurrent execution of actions that access shared variables.

Fortunately, exhaustive search tends to find errors in very small configurations because TLC explores states that are extremely unlikely to occur in actual executions. You should therefore ask yourself what are the smallest meaningful configurations that give you confidence in the correctness of the overall system.

Also, can you tear apart your algorithm and verify pieces independently of the rest? (Your description seems to say that this is difficult.)

Finally, it looks like you "only" verify the liveness property <>SrzAcyclic(srz). Liveness checking takes more time than invariant checking, and I'd try to think of invariant properties (describing relations among variables that you expect to hold) and have TLC check those as well. If exhaustive search is taking too long, you may consider doing random exploration (see Advanced Options -> TLC Options -> Simulation mode) which may find deeper invariant violations more quickly than standard breadth-first exploration.

Regards,

Stephan

On 23 Feb 2017, at 22:50, Shuai Mu <msm...@xxxxxxxxx> wrote:

Hi,

I am trying to use TLA+ to reason a fair complex protocol
(https://github.com/NYU-NEWS/janus/blob/master/tla/Janus.tla)
used for transaction and replication. An comparable system
would be Google's Spanner, i.e. 2PC on top of Paxos. The
difference is my protocol is not layered like 2PC and Paxos
so everything is described as one.

I found that it is hard to do model checking with complex models.
It is okay to check with simpler models, e.g. 1 shard + 1 replication,
3 shard + 1 replication, 1 shard+3 replication. The model checker
can finish in a few minutes on my 24-core workstation. However,
if I change to a more complex model, such as 3 shard + 3 replication.
The checker seem never finishes. The longest run I had is a little
less than 2 days with my 24-core workstation.

I realize it might has something to do with how the model checker
does (brutal search?), and perhaps the labels in the PlusCal kind
of makes things worse? Does adding a label in PlusCal severely
affect the time for model checking?

Is there any way to work around this?

-Shuai

--
You received this message because you are subscribed to the Google Groups "tlaplus" group.
To unsubscribe from this group and stop receiving emails from it, send an email to tlaplus+u...@xxxxxxxxxxxxxxxx.
To post to this group, send email to tla...@xxxxxxxxxxxxxxxx.
Visit this group at https://groups.google.com/group/tlaplus.
For more options, visit https://groups.google.com/d/optout.