[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [tlaplus] Model network using set or bag

From: Markus Kuppe <tlaplus-google-group@xxxxxxxxxxx>
Date: Wed, 18 Mar 2026 09:30:41 -0700
References: <fa267fbb-f81c-4739-9801-5c5f9852da41n@googlegroups.com> <CABj=xUU_gJb86jUazO5PKQXZDFuqa8HxtnKNePrUbh8QE5cs6Q@mail.gmail.com> <E50C9B0D-FDB4-4F70-9BC8-672372C9C055@gmail.com>

Building on Stephan's point that there isn't a single “best" way: in many formalisms, this creates a real modeling dilemma, since you have to pick the "right" level of detail up front. In TLA, refinement lets us turn that into a modeling strategy instead.

In particular, approaches like (1), where processes directly read each other's state, are sometimes dismissed as "not really modeling communication". But in TLA, that's actually useful. It lets us start with a simple, high-level spec that captures the essential safety and liveness properties, without committing early to a specific communication mechanism.

From there, we can introduce lower-level specs using any of the more detailed variants you list (queues, sets, lossy networks, etc.), and then formally show that they refine the more abstract specs. In other words, we can show that the more concrete model is a correct implementation of the high-level one.

This is where TLA+ distinguishes itself: the ability to formally connect different points in the abstraction/precision tradeoff, instead of picking one and hoping it was the right choice. A typical flow is:

* start with a simple, readable model (often close to (1)),
* validate the core properties,
* and then refine toward more concrete communication models like (4) or your favored variant.

So you end up not just with multiple models, but with a clear relationship between them, avoiding both a single model at the wrong granularity and a set of disconnected ones.

This kind of abstraction is also how some algorithms were found in the first place. As Lamport himself writes about Paxos, "It would have been very hard to find this algorithm had my mind been distracted by the irrelevant details introduced by having the processes communicate by messages." [1,2]

Suppressing communication details isn't just about keeping the state space small. It can make the problem easier to think about. Refinement then gives you a way to add the details back in a controlled way.

M.

[1] https://github.com/tlaplus/Examples/blob/master/specifications/PaxosHowToWinATuringAward/Voting.tla#L8-L10
[2] https://lamport.azurewebsites.net/tla/paxos-algorithm.html

> On Mar 18, 2026, at 8:54 AM, Stephan Merz <stephan.merz@xxxxxxxxx> wrote:
> 
> Thanks Andrew, great description. In summary, there is not a single "best" way to model communication, one has to balance abstraction (favoring smaller state spaces) and precision. At the extreme end, (1) doesn't really model communication at all but can lead to very readable high-level specifications that can then be refined based on some of the higher-numbered variants.
> 
> A small niggle: I don't really understand what "artificial deadlocks" you refer to for (3): as long as message queues are unbounded, why would sending lead to deadlock?
> 
>> On the PlusCal side there is also Distributed PlusCal [PDF] which as I understand it adds message passing primitives to the language
> 
> 
> It's essentially based on your (3) or (4), depending on how processes use channels, with variants for FIFO or unordered message delivery (mapping to sequences respectively bags in TLA+).
> 
> Stephan
> 
>> On 18 Mar 2026, at 16:27, Andrew Helwer <andrew.helwer@xxxxxxxxx> wrote:
>> 
>> This is a good question and I've been meaning to write a blog post summarizing & comparing all these for basically forever. Off the top of my head, in order of roughly increasing fidelity we have:
>>     • 
>> Processes directly read each others' state; Lamport uses this approach in many of his specs IIRC. I suspect it is more effective at modeling distributed algorithms than we think!
>>     • To send, messages are put in a single buffer value (either on sender, recipient, or shared); if the buffer is currently occupied, messages cannot yet be sent. Recipients pull messages off the buffer value and clear it. More common for modeling communication over a bus within the same computer.
>>     • Each process has a message receive queue or (if out-of-order messages are to be modeled) a message receive unordered set, into which messages are added and then removed.
>>     • A single queue or unordered set for the entire network, into which messages are added and removed; this usually requires wrapping each message in a header identifying the sender & recipient. Strictly superior to 3 in my opinion because there are some weird artificial deadlocks that are avoided.
>>     • A single bag for the entire network, as detailed in your post; this has the drawback of unbounded state, as each counter can be incremented to infinity.
>>     • (My favored option) Basically the same as 4 using a single unordered set for the entire network, but sends can nondeterministically fail to add a message to the set (representing message loss) and receives can nondeterministically fail to remove a message from the set (representing message duplication). The unordered set inherently gives you out-of-order messaging. You can also tune these constraints to make your model larger or smaller using the exact same API for both sends & receives, including using a queue for the network instead of a set. I have a nice implementation of this I've been meaning to add to the community modules for a while.
>> There are probably others but those are the ones I've seen. On the PlusCal side there is also Distributed PlusCal [PDF] which as I understand it adds message passing primitives to the language. It is being worked on by Stephan's group.
>> 
>> Andrew Helwer
>> 
>> On Wed, Mar 18, 2026 at 4:21 AM Alberto Carretero <angelalbertoc.r@xxxxxxxxx> wrote:
>> I see several discussions here on the topic of message passing or atomic communication but I have not found any on which representation is best for message passing and I wonder if there is a standard way, or at least, one that is better for TLC.
>> 
>> In general I see that there are two different possibilities. There are several specs like PaxosMadeSimple[1] that use a set of all the messages ever sent. Duplicates do not need to be modeled specifically as messages are never deleted and message loss is equivalent to never receiving the message. There is even a comment written presumably by Lamport in the spec that states:
>> 
>> > (* The algorithm is easiest to understand in terms of the set msgs of all  *)
>>   (* messages that have ever been sent.  A more accurate model would use one *)
>>   (* or more variables to represent the messages actually in transit, and it *)
>>   (* would include actions representing message loss and duplication as well *)
>>   (* as message receipt.                                                     *)
>> 
>> I read "more accurate" as closer to a real implementation but not necessarily more expressive as I think that all the safety properties can also be modeled using a set. Is that right?
>> 
>> Now, on the other hand we have specs such as Raft[2][3] that use a bag of messages to model the network (correspondence between msg and count). Duplicates are handled by incrementing the count and drops by decrementing it. To me, this representation should be much less performant when model checking in TLC as each duplicate / drop changes the state and can be interleaved with any other action. Even if the network is constrained to disallow duplicates, more states will be generated compared to the set as messages can be received by the destination and then resent by the sender; effectively duplicating the number of generated states.
>> 
>> At this point, it looks to me like using a set would be preferred in general, but then I see most specs preferring a bag. What am I missing? Are there any other advantages? Or maybe sets are more efficient but the effect is not meaningful enough.
>> 
>> [1]: https://github.com/tlaplus/tlaplus/blob/master/general/performance/PaxosMadeSimple/PaxosMadeSimple.tla
>> [2]: https://github.com/ongardie/raft.tla
>> [3]: https://github.com/etcd-io/raft/blob/main/tla/etcdraft.tla

-- 
You received this message because you are subscribed to the Google Groups "tlaplus" group.
To unsubscribe from this group and stop receiving emails from it, send an email to tlaplus+unsubscribe@xxxxxxxxxxxxxxxx.
To view this discussion visit https://groups.google.com/d/msgid/tlaplus/E1725A38-1F7B-4EBA-BD70-B4703FD68565%40lemmster.de.

References:
- [tlaplus] Model network using set or bag
  - From: Alberto Carretero
- Re: [tlaplus] Model network using set or bag
  - From: Andrew Helwer
- Re: [tlaplus] Model network using set or bag
  - From: Stephan Merz

Prev by Date: Re: [tlaplus] Model network using set or bag
Next by Date: Re: [tlaplus] Model network using set or bag
Previous by thread: Re: [tlaplus] Model network using set or bag
Next by thread: Re: [tlaplus] Model network using set or bag
Index(es):
- Date
- Thread