Re: Is it possible to TLA+ (and PlusCal) to check complex models?

Hi Hillel,

Thank you for your reply and thanks for catching the unnecessary label!

Do you mean that if I check for liveness, at most one core will be used? So we should not check any properties on the "model overview" page if we want to use more than 1 core? This is certainly news to me!

On Friday, February 24, 2017 at 5:19:42 PM UTC-5, Hillel Wayne wrote:

Hi Shuai,

IIRC liveness checking only uses a single core, because it has to track behaviors. Testing safety conditions is generally faster.

Given N processes, each with M "skip;" labels, and 1 starting state, you'll have approximately (NM)!/(M!)^N possible behaviors. TLC will attempt to exhaustively check all of them. Combine that with multiple starting states and "with" statements and you hit combinatorial explosion pretty quickly. Usually the simplest optimization is to find ways to reduce the number of labels without compromising the integrity of your spec. Here's one example:

procedure AggregateGraphForFastPath(pre_accept_acks, partitions) {
  xxxxdddx: tmp_partitions := partitions;
  yyyyassy: while (tmp_partitions /= {}) {
    with (p \in tmp_partitions) {
      tmp_par := p;
      tmp_partitions := @ \ {p};
    };
    tmp_ack := PickXXX(pre_accept_acks[tmp_par]);
    dep_c := @ \union tmp_ack.dep;
  };
  return;
}

Since procedures automatically reset their local variables and arguments after use, you don't need to use tmp_partitions, which means you don't need the xxxxdddx label.
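
A minimal sketch of what that change could look like, assuming the rest of the spec stays the same (tmp_par, tmp_ack, dep_c and PickXXX are taken from the original procedure and are still declared elsewhere). The loop consumes the partitions argument directly, and the "@" shorthand is spelled out for readability. I have not run this through the translator, so treat it as an illustration rather than a drop-in replacement:

```tla
procedure AggregateGraphForFastPath(pre_accept_acks, partitions) {
  \* The copy into tmp_partitions is gone, and its label with it.
  \* The while loop is now the first (labeled) statement of the procedure.
  yyyyassy: while (partitions /= {}) {
    with (p \in partitions) {
      tmp_par := p;
      \* Shrink the argument itself; this relies on the procedure's
      \* arguments being restored when it returns, as noted above.
      partitions := partitions \ {p};
    };
    tmp_ack := PickXXX(pre_accept_acks[tmp_par]);
    dep_c := dep_c \union tmp_ack.dep;
  };
  return;
}
```

That is one fewer label for every call to the procedure, in every process that calls it.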

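To get a feel for how fast the behavior count above grows, here is the formula worked out for a few small cases. The values of N and M are picked purely for illustration and are not taken from the Janus spec:

```latex
\[
\begin{aligned}
N=2,\ M=2 &: \quad \frac{4!}{(2!)^2} = \frac{24}{4} = 6 \\
N=3,\ M=3 &: \quad \frac{9!}{(3!)^3} = \frac{362880}{216} = 1680 \\
N=3,\ M=4 &: \quad \frac{12!}{(4!)^3} = \frac{479001600}{13824} = 34650 \\
N=4,\ M=4 &: \quad \frac{16!}{(4!)^4} = \frac{20922789888000}{331776} = 63063000
\end{aligned}
\]
```

With 3 processes, going from 4 labels each down to 3 cuts the count by roughly a factor of 20, while adding a fourth process at 4 labels each multiplies it by 1820.
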
On Thursday, 23 February 2017 15:50:32 UTC-6, Shuai Mu wrote:
Hi,

I am trying to use TLA+ to reason about a fairly complex protocol
(https://github.com/NYU-NEWS/janus/blob/master/tla/Janus.tla)
used for transactions and replication. A comparable system
would be Google's Spanner, i.e. 2PC on top of Paxos. The
difference is that my protocol is not layered like 2PC and Paxos,
so everything is described as one.

I found that it is hard to do model checking with complex models.
It is okay to check simpler models, e.g. 1 shard + 1 replication,
3 shard + 1 replication, 1 shard + 3 replication. The model checker
can finish in a few minutes on my 24-core workstation. However,
if I change to a more complex model, such as 3 shard + 3 replication,
the checker never seems to finish. The longest run I had was a little
less than 2 days on my 24-core workstation.

I realize it might have something to do with how the model checker
works (brute-force search?), and perhaps the labels in PlusCal
make things worse? Does adding a label in PlusCal severely
affect the time for model checking?

Is there any way to work around this?

-Shuai