[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [tlaplus] TLA+ logic

  >>As this implies, Coq is not simple.  Chris Newcombe will tell you 
  >>that one can't expect an engineer to deal with its complexity.  

In the paper "Why Amazon Chose TLA+" [0], I wrote:

  "Engineers already have their hands full with the complexity of the problem they 
  are trying to solve. To help them rather than hinder, a new engineering method 
  must be relatively easy to learn and easy to apply. 
  "Methods such as Coq and PVS involve very complicated type-systems and proof
languages. We found these concepts to be difficult to learn. 
  "We found that Coq, PVS, and other tools based solely on interactive proof
assistants, are too complicated to be practical for our combination of problem
domain and time constraints."

In that paper I also wrote:

   "We did not find any relevant examples of Coq being used in industry."

Since that paper was written, Coq has been used to prove a safety property (linearizability) of the Raft algorithm for replicated state machines [6], which is (a simplified version of) an algorithm that is used in industry.  This proof was not done directly in Coq; it was done in a language and tool built on top of Coq, called Verdi [1].  Verdi is described "as a framework for implementing and formally verifying distributed systems in Coq" [2].  

Verdi is designed for message-passing algorithms. From the few small examples I've looked at [2], Verdi does appear to be less complex than using Coq directly for that class of problem, at least for the specification part. The main improvement (in my opinion) over plain Coq is that Verdi's conceptual model for specification is quite similar to that of TLA+. In Verdi, a system is specified "by describing the possible “steps” that a system can take during execution.  ...  The possible steps are defined formally as a relation on “worlds”. A world is a snapshot of a system’s execution, and consists of the local state of every node of the system, the set of packets that are in flight, and the trace of external events." [5].

Some other parts of the overall Verdi method are also similar to the TLA+ method.  For instance, proofs of safety properties in Verdi require finding an "inductive state invariant" ([2], section 2.3] 

However, Verdi differs significantly from TLA+ in several ways:

  a. Verdi doesn't yet support specification of fairness, or verification of liveness properties.

  b. Safety is specified as a predicate on traces (behaviors), not a state invariant.
       Example from [2] section 2.1:
"This mutual exclusion property can be expressed as a predicate over traces:

  mutex(t) :=    t = t1 ++ <n1, Grant> ++ t2 ++ <n2, Grant> ++ t3  =>  <n1, Unlock> \in t2

To hold on trace t, the mutex predicate requires that whenever Grant
is output on node n1 and then later Grant is output on node n2, there
must first be an intervening Unlock input from n1 releasing the lock."
   c. Verdi doesn't have a model checker, so all verification is done via proofs in Coq.  In my experience, this will make Verdi impractical for all but the most safety-critical algorithms in industry, unless there has been a major breakthrough reduction in the amount of work and complexity involved in writing proofs.  As far as I can tell, there hasn't been a breakthrough yet; see next two points (d, e).  However, Verdi does include a possibly significant improvement in "automatic refinement" that (the authors claim) reduces the work required to specify and prove systems in less reliable environments, and which therefore might help with industrial systems; see point (f) below.

   d. The proof methods are somewhat different.  From section 2.3 of the Verdi paper[2]:
"The proof that mutex holds on all traces of the lock service application consists of three high-level steps: 
   (1) prove an invariant about the reachable node and network states of the lock service application, 
   (2) relate these reachable states to the producible traces, and 
   (3) show that the previous two steps imply mutex holds on all producible traces"
      The paper goes on to describe how part #1 requires finding a state predicate (called mutex_state in the paper), which to me appears to be a conventional safety property, i.e. an invariant of reachable states.  To prove that invariant, the method requires finding an "inductive state invariant".  So part #1 appears to match the main idea of safety proofs in TLA+. But part #2 and #3 are different - at least superficially - to how things are done in TLA+. See the paper for description and example of part #2 and #3 (for brevity I won't quote those sections here).

      I'm not qualified to comment on whether the proof method in Verdi is novel, or whether the differences with the TLA+ proof method are an improvement or a re-casting. As Verdi is based on Coq, I assume that Verdi is sound and relatively complete.  

      I have doubts about how practical the Verdi method is for complex systems (i.e. systems more complex than basic Raft), because Verdi's use of traces (behaviors) to specify correctness properties, to model state ('worlds' always include traces of past events and messages), and in proofs (parts 2 and 3 of the method) appear to introduce a dimension of behavioral reasoning.  Caveat: this is merely a suspicion on my part, not based on any solid evidence.  But if it is the case then, I also suspect that the method would have the problem of combinatorial explosion of cases, due to the many possible permutations and interleavings of events allowed by any real-world specification.  Perhaps Verdi solves that problem, but I did not find it mentioned in the paper [2].

      For this reason, I still prefer the more direct TLA+ method of proof solely by inductive invariance (occasionally using an explicit history variable if necessary).  However, my preference doesn't actually get me anywhere, as I've never successfully proven the correctness of a real industrial system in TLAPS. The main reason I've failed is that it's too difficult and time-consuming for me, and for all software engineers in industry that I've discussed this with, to find the inductive invariant for a real-world system. That central problem appears to be common to both TLA+ and Verdi.

      I don't know whether Verdi/Coq has useful tools to help incrementally find ('debug') an inductive invariant.  This is an area in which TLA+ does need better tools. In TLA+ the standard method is to ask TLC to model-check  Inv /\ Next => Inv'  but TLC usually can't do that for a complex spec, e.g. from industry.  The next best option that I'm aware of is to use the TLA+-to-ProB translator and the ProB constraint solver, but I haven't tried that approach on an industry spec. (Last time I checked, the ProB translator supported a large subset of TLA+ but not the whole language, which is entirely understandable as ProB is typed and TLA+ is not.)  The next best option is to manually convert the spec to Alloy (as the Alloy Analyzer can check inductive invariance), but Alloy is far less expressive than TLA+.  The best option for the future might be the unfinished symbolic model checker for TLA+ that was never released. But someone would have to finish it; a huge project.

  e. Even after an inductive invariant is found, proofs in Verdi still appear to require a lot of expertise with the Coq proof system and manual effort.  I may be wrong about this, as I haven't tried it myself.  My current view is based on the following information:

    i. Proofs in Verdi are fairly large; I'd estimate at least as large as TLAPS proofs of the same system (when using the SMT backend).  
       E.g. [2] reports:
         - 500 lines to prove safety (mutual exclusion) for the 40-line spec of the (very) simple lock server specified in [2].
         - 4144 lines to prove safety (linearizability) of the 170-line spec of basic Raft algorithm for replicated state machine. (I call this the 'basic' algorithm as the spec did not include important and infamously non-trivial features required in industry, such as log compaction and dynamic group membership.)

    ii. Seven people worked on the proof of basic Raft.  The people were the designers of Verdi, so presumably had significant prior experience using Coq.  I can't find any description of how long the proof took in person-hours.

    iii. To me (a non-user of Coq) the proof language still looks extremely complicated.  See the many files in [5].  The proof is much harder to understand than a TLAPS proof (after reading the appropriate short section of the hyperbook on the TLAPS proof language).  Some reasons:
            - The Verdi proof lacks the hierarchical structure that TLAPS uses to manage complexity.  Instead it uses many lemmas, which I find harder to follow.
            - The Verdi proof justifications are invocations of Coq heuristics that are very hard to understand.
         The total lack of explanatory comments doesn't help.

  f.  Verdi has support for a limited form of 'automated refinement' that TLA+ does not have.  I find this to be the most interesting part of Verdi. From [2]:
"Verdi eases the verification burden by enabling the developer to first verify their system under an idealized fault model, then transfer the resulting correctness guarantees to a more realistic fault model without any additional proof burden"
      From the paper, the current automated refinements include i) adding sequence numbers to network messages to eliminate concerns about message loss, duplication and reordering, and ii) using replicated state machines to implement fault tolerant processes. These are a good start, but in real industrial systems such mechanisms often have to co-operate in order to implement a higher-level correctness property (e.g. transaction isolation in a database), so it remains to be seen how far this idea will go in practice. If it becomes sufficiently flexible then it could be very valuable.

  g. Verdi generates executable code in OCAML, which according to results in [2] appears to be fairly efficient for simple benchmarks.  This is obviously a benefit.  However, in order to be used in a real industrial system the code will need to be modified in many ways (security, operational visibility and control, support online deployment & rollback of changes to the code, multi-core scalability etc. etc), so the question arises of re-verification in the presence of constant maintenance.  Still, it's a very welcome direction.

So Verdi is an interesting approach for message passing algorithms, and is particularly impressive because of (f) and (g) above.  But I believe that TLA+ is a significantly better method for use in industry, because TLA+ addresses a much larger class of problems (in addition to message-passing algorithms I've used TLA+ for shared-state concurrent algorithms, sequential algorithms, conceptual modeling & data modeling), and because of the other factors mentioned above and in [0].


[0] "Why Amazon Chose TLA+"  http://research.microsoft.com/en-us/um/people/lamport/tla/amazon.html

[1] Verdi's website: http://verdi.uwplse.org/   

[2] The Verdi paper:  http://verdi.uwplse.org/verdi.pdf

[3] An introduction to Verdi by one of its designers: http://homes.cs.washington.edu/~jrw12/network-semantics.html

[4] A comment from a designer of Verdi:  https://news.ycombinator.com/item?id=10017549

[5] https://github.com/uwplse/verdi/tree/master/raft-proofs 
    and https://github.com/uwplse/verdi/tree/master/raft 

[6] https://raft.github.io/

On Fri, Dec 4, 2015 at 10:11 AM, Leslie Lamport wrote:
Here is my view, based on almost complete ignorance of the
mathematical foundations and a lot of practical experience using math.
I'm told that second-order logic is more powerful--for one reason,
because it can distinguish between standard and non-standard models of
the integers and first-order logic can't.  I have never found this to
be a practical issue--even in pure math.  I'm certainly not worried
about a bug occurring because an engineer implemented a non-standard
model of the integers.

I don't know Coq, but I'm sure that it's expressive enough
for all practical purposes.  There's no good reason to write a set
like {1, {2,3}}, and making it impossible to write seems like a good
idea.  However, I've found that there is no simple way to make it
impossible to write that set without also making it impossible to
write useful sets.  As this implies, Coq is not simple.  Chris
Newcombe will tell you that one can't expect an engineer to deal with
its complexity.  That doesn't meant that there's anything wrong with
Coq; it just means that it's not meant for ordinary engineers.  For
example, if you look at a math text, you might find that the symbol
+ is used in a single paragraph to mean several different things.
To formalize that math in TLA+, you'd have to use a different symbol
for each of those different meanings.  That would drive a
mathematician crazy.  Coq allows you to use the same symbol for all of
them.  So, as George Gonthier will tell you, you need something like
Coq for formalizing serious math.  Since system builders and algorithm
designers don't use that kind of math, they don't need to deal with
the complexity of a language like Coq.