Hi Jack,
I read the MongoRaftReconfig (https://arxiv.org/pdf/2102.11960.pdf), and I like this idea. I think this paper is about designing a new protocol and high-level abstraction is preferred and necessary to polish thought. I also agree with you that the asynchronous receive/send style is favorable from the engineering aspect,
The consistency of the spec and TLA+ has value in software engineering; all systems design must ultimately be implemented in source code.
In my view, the appropriate approach is to abstract from a higher level when designing a new protocol, which is more scalable and straightforward. However, when implementing an already well-designed protocol, it is better to implement it as an asynchronous send/receive approach to close it to implementation.
I have some unproven ideas about changing MongoRaftReconfig specification from atomic to asynchronous communication. In the atomic communication style, we assume the system has some global states that can be checked. However, real-world systems obtain these global states through unreliable communication, such as send/receive.
For example, the committed variable stores the committed log index and term. Updating this variable corresponds to the AdvancedCommitIndex in the original Raft. The commit index is implemented by receiving confirmation from followers. Some variables need to be added to transform it to asynchronous.
The atomic communication version of the config variable is divided into two variables:
configCommitted, the current configuration of the cluster application;
configNew, the new configuration to be changed.
Thus, ConfigQuorumCheck, and TermQuorumCheck in spec (corresponding to Q1 and Q2 in the paper), the leader must keep additional variables:
followerConfig, the configCommitted status of other nodes collected by the leader;
followerTerm, the currentTerm status of other nodes collected by the leader.
The committed variable will be removed, and the confirmation of each AppendLogEntries guarantees the OplogCommitment in spec (corresponding to P1 in Paper) property. The AppendLog message flow may be added with additional version and term values of the config. QuorumsOverlap also requires similar processing, such as checking the config version and term when invoking AppendLog and RequestVote RPC.
It is trivial.
I also want to share my thoughts about using level abstraction to deal with scale by staged model check. First, implement high-level abstractions, such as assuming atomic communication, and use them to generate a relatively small state space. Then, lower-level abstractions will be implemented, with send and receive to describe messages. It is best to make the lower-level abstractions a refinement of the high-level abstractions, although I have little experience ensuring perfect and precise refinement. The states generated by the high-level abstractions will become the initial states of the lower-level abstractions, and then model checking will be done in stages. For example, Raft is divided into AppendLog and LeaderElection. We first generate the initial state from the high-level abstraction, plus the AppendLog operator, and perform model checking. Then, we perform model checking in the next stage by using the initial state from the high-level, plus the LeaderElection operator. I do this stage approach only through intuition. IPA(https://arxiv.org/pdf/2202.11385.pdf) may provide the tools to get it more rigorous.
Guo Hua
Hi all,I was going over the MongoDB logless reconfiguration spec today (https://github.com/will62794/logless-reconfig) and I saw how much more compact it was compared to my own various specs of Raft-based systems with reconfiguration. The principle reason is that this spec does not model message passing where send and receive are separate actions, communications between nodes is atomic. I have seen a few other specifications use atomic communication as well.For my part, I have always modeled send/receive message passing as it seemed possible to miss certain edge cases when communication was atomic. However, message passing does make specifications larger, more complex with a much larger state space so there is a real cost.I'd love to hear others opinions on whether to model send/receive message passing or use atomic comms in distributed systems.ThanksJack