Re: Differences between TLA+ Specification and Property Based Testing

Hi Andrew,

I like to think I'm both reasonably familiar with both Hypothesis and TLA+, although TLA+ moreso, so hopefully I can speak to some of your questions. I think the main thing to understand here is that the diehard problem is (in my opinion) not quite the problem domain for Hypothesis.

Hypothesis is a kind of one level above standard unit testing. Instead of testing that an input matches an output, you test that for a range of inputs, all of the outputs have a property. For example, given imagine we had a function badsort that sorts a list.

def badsort(array):

if not array:

return []

lesser = badsort([x for x in array if x < array[0]])

greater = badsort([x for x in array if x > array[0]])

return lesser + [array[0]] + greater

We could write some unit tests that happily pass:

assert badsort([]) == []

assert badsort([1,3,2]) == [1,2,3]

When we use property-based testing, though, we ask "what are properties of the outputs that are true for all inputs?" One of them, for example, is that sorting a sorted list returns the original list. Another is that a sorted list has the same size as the original list. Here's how we'd write them in Hypothesis:

from hypothesis import given, assume

import hypothesis.strategies as st

int_list = st.lists(st.integers())

@given(int_list)

def test_idempotent(numbers):

assert badsort(badsort(numbers)) == badsort(numbers)

@given(int_list)

def test_preserves_length(numbers):

assert len(badsort(numbers)) == len(numbers)

The first property test passes, but the second test finds a counterexample: our invariant fails for [0, 0]! So the property found a bug in our code which corresponded to an edge case we didn't realize we should check.

On the other hand, TLA+ is a specification language. It's designed for building a model of your system outside of any particular code. You can't directly test your function with it, but you can test the algorithm your function implements. It's a little harder to give a toy example that clearly demonstrates why TLA+ is so powerful, but if you're interested I wrote an essay on how I've used it in production. And even that example is just at the PlusCal level; you can do much cooler stuff than I've done.

Hope that helps! I'd personally recommend using both TLA+ and PBT, because they're both really good at what they do.

Hillel

On Wednesday, 12 April 2017 04:04:03 UTC-5, Andrew Gwozdziewycz wrote:

Hi Folks,

First time poster, and newbie^1 to both TLA+ and property based testing (PBT from here on out), generally. Recently, a blog post was published (http://nchammas.com/writing/how-not-to-die-hard-with-hypothesis) about solving the Die Hard 3 Jug problem in Hypothesis (http://hypothesis.works)--a PBT library for Python. The fact that this was so straightforward was rather shocking to me, I have to admit!

It seems to me that the main advantage TLA+ / Pluscal has over PBT is the high-level mathematical pseudocode... But a few questions about this assumption:

1. The Jug implementation in Hypothesis shows that for *some* specifications, TLA+ and PBT are equivalent in what can be expressed / specified / assured. Is this *generally* true?

2. I'm thinking that the concurrency support in TLA+ might also be a huge advantage, but I think it's likely possible to simulate this in PBT, as well. Is there anything special about TLA+'s concurrency modeling that couldn't be replicated via PBT?

3. Generally, why should someone choose TLA+ over PBT? What other advantages does TLA+ have over PBT?

4. I think it's the case that intuition on creating invariants and properties for specifications / PBT tests, comes with experience. But, what tips do y'all have for recognizing (learning to intuit) that these types of specifications can have meaningful impact on code that I'm writing every day (mostly around distributed systems monitoring, REST APIs)?

Thanks for your time!

Andrew "Overjoyed by the potential of this stuff to make everything I touch better" Gwozdziewycz

[^1]: Meaning, I've read the AWS paper, watched the Leslie Lamport lecture series (1-3), the DrTLA Paxos lecture, and read through some specifications in the tlaplus github repo. For both TLA+, and PBT, I have some clarity in how I'd use it for testing certain types of algorithms, but have no clue yet how to apply either to work I generally do.