
Re: [tlaplus] Leader backs up followers quickly with persistence



Any good practical tutorial would be great if you’ve got one.
I appreciate your interest, thanks.


On Friday, March 20, 2026 at 8:15:21 PM UTC+2 Andrew Helwer wrote:
I think TLA+ would help narrow the problem down as long as you can accurately model all the things that might be causing this problem. This can be done incrementally, by adding details and subsystems to your spec until behaviour that plausibly matches your failure is found.

Andrew

On Fri, Mar 20, 2026 at 4:25 AM Saif Mohamed <saifelm...@xxxxxxxxx> wrote:

Hi Andrew, thanks for your interest,

I don't think that is the problem, because no follower confirms (acks to the leader) that a log entry has been replicated before it has persisted that entry to its local disk (synced it). So there are no partial answers when replicating a log entry: if the leader receives the confirmation, the entry is already persisted.
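For what it's worth, the follower-side ordering I mean can be sketched like this (a minimal illustration, not my actual lab code; the names `Follower`, `append_entries`, and the in-memory `disk` list are all hypothetical):

```python
# Minimal sketch of the persist-before-ack ordering described above.
# All names here (Follower, append_entries, disk) are hypothetical,
# not taken from the actual lab code.

class Follower:
    def __init__(self):
        self.log = []          # volatile, in-memory log
        self.disk = []         # stands in for the fsynced on-disk log

    def persist(self):
        # In real code this would write the log and fsync before returning.
        self.disk = list(self.log)

    def append_entries(self, prev_index, entries):
        if prev_index > len(self.log):
            return False                      # missing entries: reject
        self.log = self.log[:prev_index] + entries
        self.persist()                        # durable BEFORE the ack
        return True                           # ack implies entries are on disk

f = Follower()
ok = f.append_entries(0, ["e1", "e2"])
assert ok and f.disk == ["e1", "e2"]   # acked entries are already persisted
```

The point is just that the ack is only sent after `persist()` returns, so an ack can never refer to an entry that exists only in memory.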

It could be the case that (and this just popped into my mind as I'm writing this message) the RPC ack message got delayed in the network, so the leader had to retransmit this log entry message again. The follower is up to date (does nothing), sends an ack, which gets delayed again, and so on. That makes the follower retransmit the ack message for the same log entry, for example, 100 times, but ack message number 5 reaches the leader, so the leader moves forward assuming 100 entries were successfully transmitted. Then, at entry number 101, the leader receives a delayed message from the past (message number 6), which makes the leader revert to a past state for that follower.
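That failure mode — a delayed, stale ack regressing the leader's view of a follower — can be guarded against by deriving the match index from the request that the reply answers, and making the update monotonic, rather than trusting the latest reply outright. A toy illustration (all names hypothetical, not from the actual lab code):

```python
# Toy illustration of stale/duplicated acks. Each reply carries the
# (prev_index, num_entries) of the request it answers, so the leader can
# compute exactly what that ack proves, and matchIndex only moves forward.

class Leader:
    def __init__(self):
        self.match_index = 0   # highest entry known replicated on the follower

    def on_ack_buggy(self, reply):
        # BUG: trusts whatever the (possibly stale) reply says outright.
        self.match_index = reply["prev_index"] + reply["num_entries"]

    def on_ack_fixed(self, reply):
        # FIX: the update is monotonic, so an old ack cannot move us backward.
        acked = reply["prev_index"] + reply["num_entries"]
        self.match_index = max(self.match_index, acked)

fresh = {"prev_index": 0, "num_entries": 100}   # ack covering entries 1..100
stale = {"prev_index": 0, "num_entries": 5}     # delayed duplicate for 1..5

buggy = Leader()
buggy.on_ack_buggy(fresh)
buggy.on_ack_buggy(stale)       # stale ack arrives late...
assert buggy.match_index == 5   # ...and the leader regresses to entry 5

fixed = Leader()
fixed.on_ack_fixed(fresh)
fixed.on_ack_fixed(stale)        # same reordering...
assert fixed.match_index == 100  # ...but progress is never undone
```

This reordering is also exactly the kind of interleaving a model checker is good at finding, since it will try delivering the stale message at every possible point.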

This is my mental environment:

  • This problem is intermittent, like a deadlock: the test cases are very randomized, and sometimes they pass and sometimes they fail (no agreement is reached).

  • It could be that the timeout is not long enough to give the follower the chance to fully back up its log.
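On the timeout point: with the naive backup scheme from the paper (the leader decrements nextIndex by one per rejected AppendEntries), a follower that is far behind needs roughly one round trip per missing entry, and with persistence each step also pays a disk sync. A back-of-the-envelope estimate (all numbers purely illustrative):

```python
# Rough catch-up estimate for a lagging follower, purely illustrative.
# With one-entry-at-a-time backup, each missing entry costs roughly one
# network round trip plus one fsync on the follower.

def catchup_seconds(entries_behind, rtt_ms, fsync_ms):
    return entries_behind * (rtt_ms + fsync_ms) / 1000.0

# e.g. 500 entries behind, 1 ms RTT, 10 ms per fsync:
t = catchup_seconds(500, 1, 10)
assert t == 5.5   # ~5.5 s, easily past a short test deadline
```

The extended Raft paper (end of §5.3) sketches an optimization where a rejecting follower also returns the conflicting term and its first index, cutting the backup to roughly one round trip per divergent term instead of one per entry.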

Could TLA+ help narrow down this problem?

Thanks.



On Wednesday, March 18, 2026 at 7:32:08 PM UTC+2 Andrew Helwer wrote:
Hi Saif,

Trying to understand - you're saying that your replicas can die in between confirming/choosing a value and writing it to disk, and so this leads to chosen values being lost to the cluster since the non-persisted values are gone when they come back up?

Andrew

On Mon, Mar 16, 2026 at 9:48 PM Saif Mohamed <saifelm...@xxxxxxxxx> wrote:

Currently I'm trying to build a replicated state machine layer with Raft, following this paper: http://nil.csail.mit.edu/6.5840/2025/papers/raft-extended.pdf
Course: https://pdos.csail.mit.edu/6.824/

The problem appeared when I introduced the persistence layer (disk) into my log replication logic. There are situations where agreement on a specific entry cannot be reached because a time limit is exceeded (the window within which the entry should be replicated).

Mainly, under heavy randomized leader failures, the followers don't get the chance to fully back up their logs. This problem didn't appear before persistence was in the picture.
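One way to see why persistence changes the picture: after a crash, a follower restarts from whatever was on disk, so any un-persisted log tail is gone and the leader has to re-replicate it — each leader failure resets part of the catch-up work. A toy sketch (all names hypothetical, not from my actual code):

```python
# Toy sketch: a follower crash discards the un-persisted log tail, so the
# leader must back the follower up again from the last persisted point.

class FollowerLog:
    def __init__(self):
        self.log = []     # in-memory log
        self.disk = []    # entries that have actually been synced to disk

    def append(self, entries, persist=True):
        self.log += entries
        if persist:
            self.disk = list(self.log)

    def crash_and_restart(self):
        self.log = list(self.disk)   # only durable entries survive a crash

f = FollowerLog()
f.append(["e1", "e2"])                 # replicated and persisted
f.append(["e3", "e4"], persist=False)  # replicated but not yet synced
f.crash_and_restart()
assert f.log == ["e1", "e2"]           # leader must resend e3 and e4
```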

My question is: can the TLA+ tooling help me identify the true problem, in case I'm currently identifying it wrong?

Any thoughts or recommendations will be helpful, thanks.

--
You received this message because you are subscribed to the Google Groups "tlaplus" group.
To unsubscribe from this group and stop receiving emails from it, send an email to tlaplus+u...@xxxxxxxxxxxxxxxx.
To view this discussion visit https://groups.google.com/d/msgid/tlaplus/1b9fd2b0-45c0-43e1-8ab7-cb1b220f0b08n%40googlegroups.com.
