[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

[tlaplus] Re: Reusing Specs



Hi, Lamport reading your post made me think about some specifications that a friend and I wrote some months ago.

Our goal was to write a specification for a state machine replication (RSM) protocol that uses multiple instances of
the "Paxos-family" protocols to decide on each entry of a replicated log, and we would like to have a way to test it
on various failures scenarios (because we were learning about these protocols).

In that time we also wanted to learn more about TLA+, so we decided to start by getting your specification of Paxos
from the Github (Paxos.tla) and do a refinement mapping of it in our specification.

We noted that your Paxos specification implements a Voting specification which in turn implements the Consensus
spec. So we thought this was the natural way to proceed.

Then we copy all of these three specs, and we apply minor modifications to your original Paxos specification in
order to make it more modular for our needs, but trying to not change its logic. We didn't modify the Voting or the
Consensus spec.

We basically split the formula definition of some steps (like the phase1b bellow) in two different operators:
"condition" and "action", which allowed us to reuse these operators in the RSM spec and also overwriting it using
the configuration files (to tests slight modifications of some operators and simulate failures, for example).

----------------------------Paxos----------------------------------------
VARIABLE state
...
Init == state = [maxBal |-> [a \in Acceptor |-> -1],
                       maxVBal |-> [a \in Acceptor |-> -1],
                       maxVVal |-> [a \in Acceptor |-> None],
                       msgs |-> {}]
...
Phase1bCondition(a, m) == /\ m.type = "1a"
                                         /\ m.bal > state.maxBal[a]

Phase1bAction(a, m) ==
    [state EXCEPT !.maxBal[a] = m.bal,
                           !.msgs = Send([type |-> "1b",
                                                   acc |-> a,
                                                   bal |-> m.bal,
                                                   mbal |-> state.maxVBal[a],
                                                   mval |-> state.maxVVal[a]])]

Phase1b(a) == \E m \in state.msgs :
                                                      /\ Phase1bCondition(a, m)
                                                      /\ state' = Phase1bAction(a, m)
------------------------------------------------------------------------

---------------------------RSM-Paxos---------------------------
VARIABLE states,        \* Array of paxos states
                 log,           \* Array os decided values
                 proposed       \* All instances knows about the proposed values
...
P(i) == INSTANCE Paxos WITH Acceptor <- Replica,
                                                state    <- states[i]
...
Phase1b(i,r) == /\ \E m \in states[i].msgs : /\ P(i)!Phase1bCondition(r, m)
                                                               /\ states' = [states EXCEPT ![i] = P(i)!Phase1bAction(r, m)]
                /\ UNCHANGED <<log, proposed>>
------------------------------------------------------------------------

After played with crash-stop models, testing it using different configuration files and making assertions about the
invariances, we decided to move to the other failure models, crash-recovery and also Byzantine. We ended up
implementing a spec that extends our RSM and works as a "test-suite framework", declaring our test models that
are "instantiated" using the TLC configuration file.

For example, we use your byzantine paxos spec: Byzantine Paxos Refinement, and apply minor modifications to
our RSM spec to have a Byzantine version of it.

---------------------------RSM-BPConProof------------------------------------
BP(i) == INSTANCE BPConProof WITH Acceptor     <- CorrectReplica,
                                                           FakeAcceptor <- FakeReplica,
                                                           state        <- states[i]
...
Phase1b(i,r,b) == /\ BP(i)!Phase1bCondition(r,b)
                          /\ states' = [states EXCEPT ![i] = BP(i)!Phase1bAction(r,b)]
                          /\ UNCHANGED <<log, proposed, bal>>

Phase1c(i,b) == /\ \E S \in Possible1cMsgsIn(b):
                                               states' = [states EXCEPT ![i] = BP(i)!Phase1cAction(S)]
                        /\ UNCHANGED <<log, proposed, bal>>
...
-----------------------------------------------------------------------------

So reuse existing specification in this way shown very valuable for us in that time, and together with the papers,
helped us to understand better the protocols. But now, I'm wondering if our approach was completely wrong and
we misuse TLA+, TLC, and the toolbox. Because we kind of copy/paste/modify the original Paxos and ByzPaxos
specs, but we didn't modify the Voting and Consensus that still used internally by them.


Rodrigo

Em sexta-feira, 14 de junho de 2019 16:19:51 UTC+2, Leslie Lamport escreveu:
I have observed that many new users want to write TLA+ specs so they
can be reused.  I have one word of advice for those users: Don't.
The way to reuse a TLA+ spec is copy, paste, and modify.
This flies in the face of everything you've learned about writing
programs.  It does so for the same reason that TLA+ does not look like
any programming language: specifications are very different from
programs.

The most obvious difference between them is size.  I have the
impression that the specs written by engineers at Amazon and Microsoft
working on cloud systems are a few hundred lines long.  Other than one
outlier, the longest specs I know of were written by engineers at
Intel.  They were about 2000 lines long.  I was told that this was
because that was the largest spec a single person could understand.
If a spec started becoming longer than that, they would remove details
by abstraction to make it smaller.  Programming languages could be a
lot simpler if no one ever wrote a program longer than 2000 lines.

A more important difference appears when you consider what one reuses
when programming.  If you're writing a sorting program, you don't make
it reusable in the sense that you could write the program so that, by
changing the parameters, you could change it from executing quick sort
to executing heap sort.  Reusing means being able to write another
program that uses sorting as a primitive, without caring about how
it's implemented.  In a TLA+ spec of an algorithm that uses sorting,
that use might appear as a subformula like A' = Sort(A), where Sort
has a 3-line definition.  You hardly need a separate module for a 3-line
definition.  A reusable sorting program needs to pass in arguments
specifying where to find the sorting key and what the ordering
relation is.  In TLA+, you just copy a Sort definition used in one
spec and make some trivial modification to those three lines to
reflect the difference in the location of the sorting key and the
ordering relation.  This is easy enough, and it makes the spec a lot
simpler than if you imported a single definition that could be reused
without modification.

A Sort operator is actually one of the most complicated parts of a
TLA+ spec you're likely to reuse.  Consider message passing in a
distributed algorithm.  If you wrote a distributed program and had to
write your own code for sending and receiving messages, you would
certainly not want to have to rewrite it for your next distributed
program.  Now look at the spec of the Paxos consensus algorithm in the
TLA+ GitHub examples repository.  Here is the "procedure" that's used
to send an arbitrary  message m:

   Send(m)  ==  msg' = msg \cup {m}

It was hardly worth writing that definition just to be able to write
"Send(m)" instead of "msg' = msg \cup {m}" throughout the spec.  But I
thought it made the spec a little easier to read.  There isn't even
any explicit receive action.  The presence of a message to be received
is simply an enabling condition of an action.  What can be reused from
the spec is the idea of representing message passing with a variable
containing the set of all messages that have been sent.  Simple ideas
are expressed simply in TLA+; you don't need to reuse the "code" that
embodies them.

If you want to learn to write specs, write a spec of a particular
system or algorithm you're interested in.  Worry about getting that
system or algorithm right.  Don't think about what else you might want
to do with your spec.  That's what I do.  That's what engineers who use 
TLA+ to build real systems do.

Leslie
 

--
You received this message because you are subscribed to the Google Groups "tlaplus" group.
To unsubscribe from this group and stop receiving emails from it, send an email to tlaplus+unsubscribe@xxxxxxxxxxxxxxxx.
To post to this group, send email to tlaplus@xxxxxxxxxxxxxxxx.
Visit this group at https://groups.google.com/group/tlaplus.
To view this discussion on the web visit https://groups.google.com/d/msgid/tlaplus/ea0fba55-fef3-46f8-b72b-004c63b1e388%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.