[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

[tlaplus] Re: Send/receive message passing or atomic communication in distributed systems



Hello Jack,

You mentioned above that you worked with the new Kafka  consumer group protocol from KIP 848 [1], developing a specification for it. There is a similar "schema registry protocol" that is used to select leader node in Confluent schema registry; they are both based on top of the old group membership protocol so I wonder if there is a publicly available specification for the schema registry protocol? As far as I looked there is none so I'm looking towards writing my own with the purpose of making a talk out of it, I wonder how hard can it be. 

Any advice would be helpful, thanks in advance

[1] KIP 848: https://cwiki.apache.org/confluence/display/KAFKA/KIP-848%3A+The+Next+Generation+of+the+Consumer+Rebalance+Protocol

Roman

среда, 10 апреля 2024 г. в 11:04:55 UTC+3, Guo Hua:

Hi Jack,


I read the MongoRaftReconfig (https://arxiv.org/pdf/2102.11960.pdf),  and I like this idea. I think this paper is about designing a new protocol and high-level abstraction is preferred and necessary to polish thought. I also agree with you that the asynchronous receive/send style is favorable from the engineering aspect,

The consistency of the spec and TLA+ has value in software engineering; all systems design must ultimately be implemented in source code.

In my view, the appropriate approach is to abstract from a higher level when designing a new protocol, which is more scalable and straightforward. However, when implementing an already well-designed protocol, it is better to implement it as an asynchronous send/receive approach to close it to implementation.


I have some unproven ideas about changing MongoRaftReconfig specification from atomic to asynchronous communication. In the atomic communication style, we assume the system has some global states that can be checked. However, real-world systems obtain these global states through unreliable communication, such as send/receive.


For example, the committed variable stores the committed log index and term. Updating this variable corresponds to the AdvancedCommitIndex in the original Raft. The commit index is implemented by receiving confirmation from followers. Some variables need to be added to transform it to asynchronous.

The atomic communication version of the config variable is divided into two variables:

configCommitted, the current configuration of the cluster application;

configNew, the new configuration to be changed.


Thus, ConfigQuorumCheck, and TermQuorumCheck in spec (corresponding to Q1 and Q2 in the paper), the leader must keep additional variables:

followerConfig, the configCommitted status of other nodes collected by the leader;

followerTerm, the currentTerm status of other nodes collected by the leader.

The committed variable will be removed, and the confirmation of each AppendLogEntries guarantees the OplogCommitment in spec (corresponding to P1 in Paper) property. The AppendLog message flow may be added with additional version and term values of the config. QuorumsOverlap also requires similar processing, such as checking the config version and term when invoking AppendLog and RequestVote RPC. 

It is trivial.


I also want to share my thoughts about using level abstraction to deal with scale by staged model check. First, implement high-level abstractions, such as assuming atomic communication, and use them to generate a relatively small state space. Then, lower-level abstractions will be implemented, with send and receive to describe messages. It is best to make the lower-level abstractions a refinement of the high-level abstractions, although I have little experience ensuring perfect and precise refinement. The states generated by the high-level abstractions will become the initial states of the lower-level abstractions, and then model checking will be done in stages. For example, Raft is divided into AppendLog and LeaderElection. We first generate the initial state from the high-level abstraction, plus the AppendLog operator, and perform model checking. Then, we perform model checking in the next stage by using the initial state from the high-level, plus the LeaderElection operator. I do this stage approach only through intuition. IPA(https://arxiv.org/pdf/2202.11385.pdf) may provide the tools to get it more rigorous.



Guo Hua



On Friday, February 23, 2024 at 3:11:49 PM UTC+8 Jack Vanlightly wrote:
Hi all,

I was going over the MongoDB logless reconfiguration spec today (https://github.com/will62794/logless-reconfig) and I saw how much more compact it was compared to my own various specs of Raft-based systems with reconfiguration. The principle reason is that this spec does not model message passing where send and receive are separate actions,  communications between nodes is atomic. I have seen a few other specifications use atomic communication as well.

For my part, I have always modeled send/receive message passing as it seemed possible to miss certain edge cases when communication was atomic. However, message passing does make specifications larger, more complex with a much larger state space so there is a real cost.

I'd love to hear others opinions on whether to model send/receive message passing or use atomic comms in distributed systems. 

Thanks
Jack


--
You received this message because you are subscribed to the Google Groups "tlaplus" group.
To unsubscribe from this group and stop receiving emails from it, send an email to tlaplus+unsubscribe@xxxxxxxxxxxxxxxx.
To view this discussion on the web visit https://groups.google.com/d/msgid/tlaplus/ba6d4d59-6bb6-4130-b03e-87a6f6c0a91en%40googlegroups.com.