[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: Is it possible to TLA+ (and PlusCal) to check complex models?



Hi Shuai,

IIRC liveness checking only uses a single core, because it has to track behaviors. Testing safety conditions is generally faster.

Given N processes with M skip; labels and 1 starting state, you'll have approx (NM)!/(M!)^N possible behaviors. TLA+ will attempt to exhaustively check all of them. Combined with multiple starting states and with statements you hit combinatorial explosion pretty quickly. Usually the simplest optimizations are finding ways to reduce the number of labels without compromising the integrity of your spec. Here's one example:

procedure AggregateGraphForFastPath(pre_accept_acks, partitions) {

   xxxxdddx:

     tmp_partitions := partitions;  

   yyyyassy: while (tmp_partitions /= {}) {

     with (p \in tmp_partitions) {

       tmp_par := p;

       tmp_partitions := @ \ {p};

     };

     tmp_ack := PickXXX(pre_accept_acks[tmp_par]);

     dep_c := @ \union tmp_ack.dep;

   };

   return;

 }      


Since procedures automatically reset local variables arguments after use, you don't need to use tmp_partitions, which means you don't need the xxxxddxx label.

On Thursday, 23 February 2017 15:50:32 UTC-6, Shuai Mu wrote:
Hi, 

I am trying to use TLA+ to reason a fair complex protocol 
(https://github.com/NYU-NEWS/janus/blob/master/tla/Janus.tla)
used for transaction and replication. An comparable system 
would be Google's Spanner, i.e. 2PC on top of Paxos. The 
difference is my protocol is not layered like 2PC and Paxos 
so everything is described as one. 

I found that it is hard to do model checking with complex models. 
It is okay to check with simpler models, e.g. 1 shard + 1 replication, 
3 shard + 1 replication, 1 shard+3 replication. The model checker 
can finish in a few minutes on my 24-core workstation. However, 
if I change to a more complex model, such as 3 shard + 3 replication. 
The checker seem never finishes. The longest run I had is a little 
less than 2 days with my 24-core workstation. 

I realize it might has something to do with how the model checker 
does (brutal search?), and perhaps the labels in the PlusCal kind 
of makes things worse? Does adding a label in PlusCal severely 
affect the time for model checking? 

Is there any way to work around this?

-Shuai