
Re: Any examples of a specification of an S3-like object store API?



Hi Steve, all good questions.

1. Defining now()

Monotonically increasing variables make model-checking difficult: absent restrictions, the system has an infinite number of reachable states, so the model checker will never halt. My advice here is to think carefully about whether a monotonically increasing global clock is really required for you to achieve the desired semantics. If you absolutely need the global clock, define it thus:

VARIABLE GlobalClock

TypeInvariant == GlobalClock \in Nat

Init == GlobalClock = 0

Tick == GlobalClock' = GlobalClock + 1


Model-checking this is difficult. You'll have to override Nat in the model with a finite set like 0 .. 100, which admits behaviors that probably don't correspond to the real-world system, such as an uninterrupted series of Tick steps until you hit GlobalClock = 100, after which other actions take place. You can mitigate this by not allowing Tick while other actions are enabled, but really this whole approach is messy and you're better off without a monotonically increasing global clock. Why do you need one?
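
If you do keep the clock, here is a minimal sketch of that mitigation. The constant MaxClock, the variable lastSeen and the Observe action are placeholders of mine standing in for your real bound and your real system actions; they are not part of your spec. Rather than redefining Nat, the bound is an explicit guard on Tick, and ENABLED keeps the clock from running ahead of everything else:

---- MODULE BoundedClock ----
EXTENDS Naturals

CONSTANT MaxClock                \* hypothetical bound, e.g. 100 in the TLC model
ASSUME MaxClock \in Nat

VARIABLES GlobalClock, lastSeen  \* lastSeen stands in for the rest of the system

vars == <<GlobalClock, lastSeen>>

TypeInvariant ==
  /\ GlobalClock \in 0 .. MaxClock
  /\ lastSeen \in 0 .. MaxClock

Init ==
  /\ GlobalClock = 0
  /\ lastSeen = 0

\* Placeholder for every non-clock action: record the current time once.
Observe ==
  /\ lastSeen /= GlobalClock
  /\ lastSeen' = GlobalClock
  /\ UNCHANGED GlobalClock

\* The clock advances only when no other action is enabled
\* and the bound has not been reached.
Tick ==
  /\ ~ENABLED Observe
  /\ GlobalClock < MaxClock
  /\ GlobalClock' = GlobalClock + 1
  /\ UNCHANGED lastSeen

Next == Observe \/ Tick

Spec == Init /\ [][Next]_vars
====

Note that once GlobalClock reaches MaxClock and Observe has fired, nothing is enabled, so TLC will report a deadlock unless you turn deadlock checking off for this model.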

2. Specifying postconditions

It's best to view primed variable "assignments" and postconditions as one and the same. This is tricky to understand, but when you have an action like this:

ExampleAction ==
/\ x \in {0, 1}
/\ x' = 1 - x

you aren't saying "if x is 0 or 1, then assign 1 - x to x after this action executes". You're saying "the ExampleAction formula is true of a step in a behavior if x is either 0 or 1 in the first state, and the value of x in the second state (written x') equals 1 minus its value in the first state". This isn't pedantry; moving from the state machine paradigm to the behavior paradigm is critical for understanding the temporal logic component of TLA+. The purpose of a spec is to define a set of correct system behaviors, where a behavior is a sequence of states and a state is an assignment of values to variables. So when you see this:

Init == x = 0

Spec == Init /\ [][ExampleAction]_<<x>>


you should read it as follows: Spec defines the set of behaviors in which Init is true in the first state and every step either satisfies the ExampleAction boolean formula or is a stuttering step (one that leaves x unchanged).
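
To make that concrete, here is a sketch that collects the fragments above into one module. The module name Toggle and the invariant TypeOK are my additions, and the comments list a few behaviors that do and do not satisfy Spec:

---- MODULE Toggle ----
EXTENDS Integers

VARIABLE x

Init == x = 0

ExampleAction ==
  /\ x \in {0, 1}
  /\ x' = 1 - x

Spec == Init /\ [][ExampleAction]_<<x>>

\* Satisfies Spec:  x = 0, x = 1, x = 0, x = 1, ...
\* Satisfies Spec:  x = 0, x = 0, x = 1, x = 1, ...  (stuttering steps are allowed)
\* Violates Spec:   x = 1, ...         (Init is false in the first state)
\* Violates Spec:   x = 0, x = 2, ...  (the step is neither ExampleAction nor stuttering)

\* An invariant that holds in every state of every behavior satisfying Spec.
TypeOK == x \in {0, 1}

THEOREM Spec => []TypeOK
====

Nothing here "runs" ExampleAction; TLC just enumerates the steps for which the formula is true.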

How does this all relate back to your question? Simply put, a postcondition of the type you've given is perfectly fine. You can use primed variables in whatever logical statements and checks you want; adding such conjuncts can never create a new outcome. What adding conjuncts can do is remove outcomes. For example, suppose we have the following:

ExampleAction ==
/\ x \in {0, 1}
/\ x' = 1 - x
/\ x' \in {2, 3, 4}

Here we have the "postcondition" check that x' is in the set {2, 3, 4}. It's impossible for ExampleAction to be true of any step, since the conjuncts are clearly contradictory and can never all be true. Thus this step can never be taken. Tying this back to your example, we have these conjuncts:

/\ store' = [p \in (DOMAIN store \ {path}) |-> store[p]]
/\ ~has_entry(store', path)  \* HERE

If the first conjunct is true (which will probably always be the case) and the second conjunct (your postcondition) is not, TLC will simply not take this step. Personally I would use the first conjunct as the postcondition in and of itself.
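
If what you really want is for TLC to check the postcondition rather than silently filter on it, one option is to state it as a separate action property. The following is only a sketch under assumptions: the sets Paths and Data, the Put action, and the definition of has_entry are my guesses at the shape of your spec, and I've written the set difference with {path}, assuming path is a single path rather than a set of them.

---- MODULE DeleteCheck ----
CONSTANTS Paths, Data   \* hypothetical sets of paths and object contents

VARIABLE store

\* Assumed definition: a path has an entry if it is in the store's domain.
has_entry(s, p) == p \in DOMAIN s

\* The store starts out as the function with an empty domain.
Init == store = [p \in {} |-> p]

Put(path, d) ==
  store' = [p \in (DOMAIN store \cup {path}) |->
              IF p = path THEN d ELSE store[p]]

\* No postcondition conjunct here: the new value of store is the whole effect.
Delete(path) ==
  /\ has_entry(store, path)
  /\ store' = [p \in (DOMAIN store \ {path}) |-> store[p]]

Next ==
  \E path \in Paths :
    \/ Delete(path)
    \/ \E d \in Data : Put(path, d)

Spec == Init /\ [][Next]_store

\* The postcondition as a checked property: every Delete step
\* leaves the path without an entry.
DeleteRemoves ==
  [][\A path \in Paths : Delete(path) => ~has_entry(store', path)]_store
====

Listing DeleteRemoves as a property in the TLC model makes a wrong Delete definition show up as an error trace, instead of as a step that quietly never happens.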

A minor stylistic note: rather than modifying the domain of your store variable, consider mapping undefined paths to a placeholder null value instead.
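
As a rough sketch of what I mean (Paths, Data and Null are hypothetical constants, with Null assigned a model value in TLC; none of this comes from your spec):

---- MODULE NullStore ----
CONSTANTS Paths, Data, Null   \* Null is a placeholder meaning "no object here"

ASSUME Null \notin Data

VARIABLE store

\* store is always a total function on Paths, so its domain never changes.
TypeInvariant == store \in [Paths -> Data \cup {Null}]

has_entry(s, p) == s[p] /= Null

Init == store = [p \in Paths |-> Null]

Put(path, d) == store' = [store EXCEPT ![path] = d]

Delete(path) ==
  /\ has_entry(store, path)
  /\ store' = [store EXCEPT ![path] = Null]
====

With a fixed domain you can use EXCEPT everywhere, and the type invariant becomes a single function-set membership.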

3. Invariants

I'm not sure I understand this question. Could you specify what doGet and doHead do, and what it means for them to be consistent?

Andrew

On Wednesday, July 13, 2016 at 5:22:56 AM UTC-7, Steve Loughran wrote:

I'm looking for some example specifications of an eventually consistent object store, such as amazon S3.

This isn't because I plan to implement one; it's because I want to do some things against such an object store, specifically using a consistent database to address the inconsistency problems (similar to Netflix's s3mper - http://techblog.netflix.com/2014/01/s3mper-consistency-in-cloud.html ), then implement an O(1) output committer for Hadoop/Tez/Spark which handles race conditions in multiple (speculative) competing executors and is resilient to failures: precisely the features that you don't get when working with S3 today. And of course, to show that such a mechanism works, a bit of formality can only help.

I know AWS and perhaps the Azure team have been using TLA+ internally: are there any specifications of the exposed behaviours of the S3 & Azure Data Lake stores around? Something from the service owners themselves would be best. Otherwise: has anyone done some examples of object stores with create consistency, queued metadata updates for object listings, and asynchronous delete/update operations?

For extra fun, S3 appears to briefly cache negative lookups, so that while the creation sequence always holds

~ exists(path)
PUT(path)
GET(path)

an initial GET could leave a cached negative result which the next GET would then return, so the following sequence is not guaranteed to hold, at least if there is "not enough" delay between the PUT and the subsequent GET.

~GET(path)
PUT(path)
GET(path)

While I don't want to go near that problem, it exists, so I'd better write it down and code for it.

Right now I'm not even sure how best to define that "eventually" concept, except to say that after some time t the observed state of updated/deleted objects will change, nor what the values of t are for different infrastructure instances.


-Steve