[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [tlaplus] Send/receive message passing or atomic communication in distributed systems

Thanks for the responses, this has all been very helpful (with lots of reading material).

Regarding the logless reconfig spec, I can understand those arguments well, especially considering that the Raft algorithm is so well understood, with preexisting message passing specs and a proof.

In general, my recent work on Kafka protocols (pull-based Raft for the controller, Kafka replication protocol, new replication protocol in Kora and the new consumer group protocol), all are asynchronous message passing, all have state-spaces so large that brute-force model checking is generally unviable and so I have been using a lot of simulation. The first three specs have very large state-spaces because I have modeled most of the protocol, for example, implementing leader election, replication, reconfiguration and pre-vote for the pull-based Raft. The consumer group protocol has a very large state-space simply because it requires a large number of messages to be exchanged to reach convergence. You could say that I've gone the "faithful" route, which has consistently been requested by the engineers I work with. Despite depending on simulation, I have been able to find numerous design flaws, some in the pre-existing protocol designs that have been out there for a number of years. I feel that simulation has worked very well at undercovering issues.

Having said that, I am in the process of evaluating my approach taken for recent work. With the rise of deterministic simulation testing, I am considering whether I should step back from the lower level faithful approach and use TLA+ at a higher abstraction level, then rely on simulation testing to catch bugs (in the implementation) that could be missed by higher level TLA+ specs. In this approach, the TLA+ adds value by helping me reason about the problem and validate the general approach, while the simulation testing catches the low level gotchas.

Any further comments would be welcome.


On Mon, Feb 26, 2024 at 1:30 AM Aman Shaikh <amanshaikh75@xxxxxxxxx> wrote:
Hi Jack

I have written (or in the process of writing) TLA+ specs of three distributed systems. For each of these systems, I specify the system as a set of nodes and channels between the nodes. Each channel is point-to-point as it facilitates communication between a pair of nodes, and hence consists of two queues for each direction of the communication. A TLA+ action (which is essentially an 'atomic action') usually consists of a node "processing" a pending message at one of its channels, updating its own state as a result of the message processing, and enqueuing resulting messages for other nodes via appropriate channels. I then have the 'next' state in the TLA+ spec randomly pick a node that has at least one message to process. For my purpose, the fact that messages are sent point-to-point and because my focus is on what happens when nodes process messages, this way of specifying sending and receiving of messages seems sufficient. That said, I can refine my specs to deal with idiosyncrasies of the actual message transmission. For example, in one of the systems, the channel is a TCP connection which is a fairly involved protocol, but for my purposes, it is enough to assume that the TCP channel provides a loss-free, in-order message delivery.

Overall, I feel that you can write your TLA+ spec to capture as much detail of your (distributed) system as you want, but the more fine-grained your spec is, the longer it's likely to become, and more states the spec will have to grapple with. The latter is of great practical significance in my experience due to the (possibility of) state-space explosion that occurs when you model-check your spec with TLC.


On Saturday, February 24, 2024 at 10:43:53 PM UTC-5 Willy Schultz wrote:

The abstraction level of those specs was influenced both by our intuition and by pragmatic concerns. Overall, I would say that our work was originally driven by a desire to come up with the most abstract model of the underlying algorithm that was also "useful" from an engineering standpoint. In our context, that meant having a model that gave us a good understanding of the abstract behavior of the algorithm, and one for which model checking was mostly feasible for small/medium sized protocol instances, and also efficient enough to provide prompt feedback while iterating on candidate designs.

More concretely, I think our approach was also influenced by the fact that many of the bug scenarios described in the original Raft reconfig algorithm [6] can essentially be represented even in this abstract model. So, we felt it was a good starting place to express our protocol at this abstraction level so that we could clearly understand these types of bugs at a high level, and work towards a design that would avoid similar issues in MongoDB's system/protocol.

As a general design methodology, I think this is a relatively effective/efficient approach, especially when you are using a spec to iterate on the design of a new protocol (as was the case in our situation). It is roughly analogous to a kind of refinement driven approach i.e. designing a protocol at increasingly lower levels of abstraction until you get what you need. In our case, we didn't really do any explicit refinement steps, but I think it was still useful to execute most of our design work at a "well chosen" level of the abstraction hierarchy first, before moving to levels that are more complex and/or may make model checking infeasible.

As an additional side note, formalizing the intuition around how to reduce asynchronous message protocols to equivalent synchronous/sequentialized versions is doable, but in my view is tricky/nontrivial [1,2,3]. The relevant ideas are, arguably, straightforward, and old/well-known [4,5], but I view their formalizations as subtle. I think Giuliano's post [7] presents another good example of this type of reasoning, again based on some amount of intuition to justify a type of reduction/sequentialization.


[1] https://dl.acm.org/doi/10.1145/3385412.3385980
[2] https://www.di.ens.fr/~cezarad/cav19.pdf
[3] https://members.loria.fr/SMerz/papers/rp2009.pdf
[4] https://dl.acm.org/doi/10.1145/361227.361234
[5] https://core.ac.uk/download/pdf/82311765.pdf
[6] https://groups.google.com/g/raft-dev/c/t4xj6dJTP6E/m/d2D9LrWRza8J
[7] https://www.losa.fr/blog/streamlet-in-tla+
[8] https://dl.acm.org/doi/abs/10.1145/3497775.3503688

On Friday, February 23, 2024 at 2:26:40 AM UTC-5 divyanshu ranjan wrote:
Hi Jack,

Lamport has written a paper when such an assumption is valid one in terms of five-six conditions.

[1] https://lamport.azurewebsites.net/pubs/lamport-theorem.pdf

On Fri, Feb 23, 2024 at 12:41 PM Jack Vanlightly <vanli...@xxxxxxxxx> wrote:
Hi all,

I was going over the MongoDB logless reconfiguration spec today (https://github.com/will62794/logless-reconfig) and I saw how much more compact it was compared to my own various specs of Raft-based systems with reconfiguration. The principle reason is that this spec does not model message passing where send and receive are separate actions,  communications between nodes is atomic. I have seen a few other specifications use atomic communication as well.

For my part, I have always modeled send/receive message passing as it seemed possible to miss certain edge cases when communication was atomic. However, message passing does make specifications larger, more complex with a much larger state space so there is a real cost.

I'd love to hear others opinions on whether to model send/receive message passing or use atomic comms in distributed systems. 


You received this message because you are subscribed to the Google Groups "tlaplus" group.
To unsubscribe from this group and stop receiving emails from it, send an email to tlaplus+u...@xxxxxxxxxxxxxxxx.
To view this discussion on the web visit https://groups.google.com/d/msgid/tlaplus/6c76fd7c-6355-48ae-9e0a-480ff8d4a797n%40googlegroups.com.

You received this message because you are subscribed to the Google Groups "tlaplus" group.
To unsubscribe from this group and stop receiving emails from it, send an email to tlaplus+unsubscribe@xxxxxxxxxxxxxxxx.
To view this discussion on the web visit https://groups.google.com/d/msgid/tlaplus/3efd54ac-0280-4355-972a-ab331caa4227n%40googlegroups.com.

You received this message because you are subscribed to the Google Groups "tlaplus" group.
To unsubscribe from this group and stop receiving emails from it, send an email to tlaplus+unsubscribe@xxxxxxxxxxxxxxxx.
To view this discussion on the web visit https://groups.google.com/d/msgid/tlaplus/CAEJcUcEn5VZgEuSSLfoXWOp6Q5iYOixfET4vF70H4PKweFkRMg%40mail.gmail.com.