[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [tlaplus] Send/receive message passing or atomic communication in distributed systems



Jack,

The abstraction level of those specs was influenced both by our intuition and by pragmatic concerns. Overall, I would say that our work was originally driven by a desire to come up with the most abstract model of the underlying algorithm that was also "useful" from an engineering standpoint. In our context, that meant having a model that gave us a good understanding of the abstract behavior of the algorithm, and one for which model checking was mostly feasible for small/medium sized protocol instances, and also efficient enough to provide prompt feedback while iterating on candidate designs.

More concretely, I think our approach was also influenced by the fact that many of the bug scenarios described in the original Raft reconfig algorithm [6] can essentially be represented even in this abstract model. So, we felt it was a good starting place to express our protocol at this abstraction level so that we could clearly understand these types of bugs at a high level, and work towards a design that would avoid similar issues in MongoDB's system/protocol.

As a general design methodology, I think this is a relatively effective/efficient approach, especially when you are using a spec to iterate on the design of a new protocol (as was the case in our situation). It is roughly analogous to a kind of refinement driven approach to modeling and implementation i.e. designing a protocol at increasingly lower levels of abstraction until you get what you want. In our case, we didn't really do any explicit refinement steps, but I think it was still very useful to execute most of our design work at a "well chosen" level of the abstraction hierarchy first, before moving to levels that are more complex and/or may make model checking infeasible.

More broadly, whenever doing these kinds of formal modeling efforts, there are inevitably cost/benefit tradeoffs to be made, which are presumably influenced by what your ultimate goals are. If your goal is a truly "formally verified" protocol, then sure, you probably don't want to make any informal/unjustified reasoning steps in your protocol model. But, as I'm sure you understand well, having a fully faithful model will likely also blow up the state space drastically. Moreover, if you consider fully formal safety proofs of these kinds of protocols, even for our relatively abstract model, we spent several (4-6) human months deriving an inductive invariant and carrying out formal proofs in TLAPS [8]. I expect a similar effort for a fully asynchronous, message passing model would be significantly more time consuming.  

As an additional side note, formalizing the intuition around how to reduce asynchronous message protocols to equivalent synchronous/sequentialized versions is doable, but in my view is tricky/nontrivial [1,2,3]. The relevant ideas are, arguably, straightforward, and old/well-known [4,5], but I view their formalizations as subtle. I think Giuliano's post [7] presents another good example of this type of reasoning, again based on some amount of intuition to justify a type of reduction/sequentialization.

Will

[1] https://dl.acm.org/doi/10.1145/3385412.3385980
[2] https://www.di.ens.fr/~cezarad/cav19.pdf
[3] https://members.loria.fr/SMerz/papers/rp2009.pdf
[4] https://dl.acm.org/doi/10.1145/361227.361234
[5] https://core.ac.uk/download/pdf/82311765.pdf
[6] https://groups.google.com/g/raft-dev/c/t4xj6dJTP6E/m/d2D9LrWRza8J
[7] https://www.losa.fr/blog/streamlet-in-tla+
[8] https://dl.acm.org/doi/abs/10.1145/3497775.3503688
On Friday, February 23, 2024 at 2:26:40 AM UTC-5 divyanshu ranjan wrote:
Hi Jack,

Lamport has written a paper when such an assumption is valid one in terms of five-six conditions.

[1] https://lamport.azurewebsites.net/pubs/lamport-theorem.pdf

On Fri, Feb 23, 2024 at 12:41 PM Jack Vanlightly <vanli...@xxxxxxxxx> wrote:
Hi all,

I was going over the MongoDB logless reconfiguration spec today (https://github.com/will62794/logless-reconfig) and I saw how much more compact it was compared to my own various specs of Raft-based systems with reconfiguration. The principle reason is that this spec does not model message passing where send and receive are separate actions,  communications between nodes is atomic. I have seen a few other specifications use atomic communication as well.

For my part, I have always modeled send/receive message passing as it seemed possible to miss certain edge cases when communication was atomic. However, message passing does make specifications larger, more complex with a much larger state space so there is a real cost.

I'd love to hear others opinions on whether to model send/receive message passing or use atomic comms in distributed systems. 

Thanks
Jack


--
You received this message because you are subscribed to the Google Groups "tlaplus" group.
To unsubscribe from this group and stop receiving emails from it, send an email to tlaplus+u...@xxxxxxxxxxxxxxxx.
To view this discussion on the web visit https://groups.google.com/d/msgid/tlaplus/6c76fd7c-6355-48ae-9e0a-480ff8d4a797n%40googlegroups.com.

--
You received this message because you are subscribed to the Google Groups "tlaplus" group.
To unsubscribe from this group and stop receiving emails from it, send an email to tlaplus+unsubscribe@xxxxxxxxxxxxxxxx.
To view this discussion on the web visit https://groups.google.com/d/msgid/tlaplus/3a2965fc-9e24-4c32-840f-8af3e54a86a1n%40googlegroups.com.