[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

getting started with tla+, are there bakery algo/election with leases examples I could follow?



To help me beginning learning how to translate algorithms into TLA+ for model checking,
I made up a little leased leader-election algorithm. I'll call it ALLCALL, and I'll write it out below.
ALLCALL supposes that there is a multicasting and point-to-point network available, and
that clock drift can be bounded.

My question is: how does one encode such an algorithm into TLA+ (or pluscal)?  If there is a similar encoding (e.g. of the Bakery algorithm or similar) available somewhere, perhaps that could get me started.
Thanks for any and all pointers to examples/guides on how to translate this.

I tried reading the Raft TLA+ spec (https://github.com/ongardie/raft.tla/blob/master/raft.tla), but I was completely lost with what those i and j subscripts mean...

- J

# the ALLCALL leased-leader election algorithm
# Jason E. Aten, February 2017

Let heartBeat = 1 sec.

Let leaseTime = 10 sec.

Let maxClockSkew = 1 sec, where maxClockSkew
is a bound on how far out of sync our local
clocks may drift.

Given: Let each server have a numeric integer rank, that is distinct
and unique to that server. If necessary an extremely long
true random number is used to break ties between server ranks, so
that we may assert, with probability 1, that all ranks are distinct integers,
and that each server, absent an in-force lease, can be put into a strict total order.

Rule: The lower the rank is preferred for being the leader.

Ordering of lease then rank: we order the pair (leaseExpires, rank) first
by largest leaseExpires time, then by lower rank. If
both leases have expired, then neither is considered as a part of the ordering.

ALLCALL Algorithm phases:

I. Init phase: when a server joins the cluster,
 it does not issue allcalls (a ping of all members)
 until after leaseTime + maxClockSkew time has elapsed.

 During init, the server does, however, accept and respond to allcalls() from
 other cluster members. The allcall() ping will contain
 the current (lease, leader-rank) and leader-id
 according to the issuer of the allcall(). Every
 recipient of the allcall updates her local information
 of who she thinks the leader is, so long as the
 received information is monotone in the (lease, leader-rank)
 ordering; so a later unexpired lease will replace an
 earlier unexpired lease, and if both are expired then
 the lower rank will replace the larger rank as winner
 of the current leader.

II. Ping phase: after a server has finished its Init phase, it
 enters its ping phase.

 During ping phase, the server continues to accept and respond
 to allcall requests from other servers. Now in addition,
 the server also issues its own allcall() pings every
 heartbeat seconds.


Properties of the allcall():

 The allcalls() are heard by all active cluster members, and
 contain the sender's computed result of who the current leader is,
 and replies answer back with the recipient's own rank and id. Each
 recipient of an allcall() replies only to the sender.
 So the sending of the allcall is a multicast (publish), but
 the replying to an allcall is a point-to-point operation.


Conjecture: assuming that communication is maintained between
servers, then for the set of live servers at any point in
time, there can be only zero or one leader at a time, from
the point of view of any node in the cluster; and if there is
a leader (rather than not knowing), then there is
agreement between all "knowing" nodes on who that leader is.

This last conjecture is intended to be strong, on purpose, as I would like
to find a counter-example, if one exists.