[tlaplus] Yet more code conformance checking
Hi folks,
I thought I'd post about some recent work I did on checking the correspondence between TLA models and code using trace validation (thanks to Jesse Jiryu Davis for the nudge to post this here!). I know that others have done similar things, but the motivation, the details of my approach, and some code artifacts (a Rust logging library for TLA) may still be interesting.
We've used TLA relatively extensively to verify different parts of the blockchain stack (the Internet Computer) that we're working on. TLA modeling went pretty well in the beginning. The process was that I'd normally create the model myself based on the code and then sit down with the engineers who own the code to check that the model and code match (both to verify any findings and to make sure I didn't miss any interesting behaviors by overabstracting).
One particularly fruitful area was checking for reentrancy bugs in smart contracts. Reentrancy bugs are a form of concurrency bug that has historically been responsible for some of the highest-profile hacks in blockchains, and our highly concurrent blockchain makes these bugs more likely to happen (we recently published a paper explaining this, for anyone interested [1]). We found multiple such bugs; one in particular, in our wrapped Bitcoin token, was potentially worth millions if exploited (the nice thing about blockchains is that it can be easy to put a price tag on a bug).
But Lady Fortuna wasn't quite in the mood. Literally a couple of weeks after we had found and fixed this bug, successfully verified the model, and gone through the model with the lead engineer, someone accidentally reintroduced a variation of the bug while implementing a new feature. As I was done with the model by that point, I didn't catch the change. Luckily we caught it at the last minute via a manual security review, but it left a bitter taste in our mouths and raised questions about the utility of the TLA exercise.
Since we don't really have resources to do code-level verification, I started looking for a cheaper solution that could still give us some hope of keeping models and code in sync. The idea of a runtime verification approach (aka trace validation) popped up. I looked around and found the MongoDB paper on extreme modeling, with the recommendation to not do it :) But our situation was different, as our models were built after the code and align with it pretty closely (though not perfectly). So I started working on this on the side and have recently wrapped up the first project, where we check the traces of the smart contracts governing the platform (e.g., deciding on software upgrades or token minting) against their TLA models.
We do this by instrumenting our smart contract code (written in Rust) with certain logging macros that, during testing, log the TLA traces in the background. The macros are only enabled in test builds, not in production. We then observe all the state changes exercised in tests, and, at the end of each test, we check the logs for conformance against the model. You can find the (slightly marketingy) writeup here:
https://medium.com/p/c7854eb9458d
The Rust logging library is available here:
https://github.com/dfinity/ic/tree/57a4ecdad8167ab17508a37195279bc2ce30b12d/rs/tla_instrumentation
A few technical points that may be interesting for this crowd:
1. I ended up using Apalache instead of TLC for the conformance check, even though I used TLC for the actual model checking. The main reason is that the models often have guards of the form `\E num \in 1..MY_CONSTANT` (a small sketch of such a guard follows after this list), where MY_CONSTANT would be very small to make TLC model checking feasible, but might be huge in practice. IIUC, TLC would enumerate everything explicitly, whereas Apalache would turn this into an SMT formula that would be trivial to check (since `num` is effectively constrained by the pre- or post-state). Using Apalache also had the side benefit that it comes with a type system. I know this is a bit controversial here ;) but in my experience it made it easier to update the models when the code changed, compared to the "vanilla TLA" experience.
2. Instead of checking the entire trace, I ended up checking just that each individual recorded step conforms to the model's transition relation. This way I could ignore the "environment" steps that are required to obtain a full legal trace of the system, and just focus on the "system under test" instead. I also effectively ignored checking the initial states, as I just stipulated the initial state to be equal to the recorded pre-state (points 2 and 3 are sketched together after this list). This was helpful for tests, which would often start from some predefined fixture state rather than a "clean" initial state. Obviously this isn't really sound, but hey, the entire approach is just best-effort.
3. I also ended up with a quirky way to phrase the check of "does this recorded step conform to the transition relation?". Given the transition relation `Next(x, y)` (explicitly parameterized here for clarity) and the recorded pre- and post-states s and s', I ended up creating a new model with `Init(x) == x = s` and `Next2(x, y) == y = s' /\ Next(x, y)`, and asking whether this new model deadlocks or not. If it does, the step doesn't conform to the transition relation. I don't really remember anymore why I took this exact approach. I *think* I was worried about accidentally allowing stuttering where none was expected (i.e., I wanted to disallow having s = s' unless the Next relation really specifies that this is possible), but maybe there are better ways of handling this concern.
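To make point 1 concrete, here is a minimal sketch of the kind of guard I mean; the module, action, and variable names are made up for illustration and aren't taken from our actual models:

```tla
---- MODULE GuardSketch ----
EXTENDS Naturals

CONSTANT
    \* @type: Int;
    MY_CONSTANT  \* tiny for TLC model checking, potentially huge in reality

VARIABLE
    \* @type: Int;
    minted

\* A hypothetical action that mints a nondeterministically chosen amount.
\* TLC evaluates the existential by enumerating all of 1..MY_CONSTANT,
\* whereas Apalache hands it to the SMT solver, where the recorded pre-
\* and post-states pin `num` down, so the size of the range barely matters.
Mint ==
    \E num \in 1..MY_CONSTANT:
        minted' = minted + num
====
```

The `\* @type: ...;` comments are Apalache's type annotations, i.e., the type system mentioned above.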
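And here's a rough sketch of how points 2 and 3 combine into the per-step model generated for each recorded state pair. Again, the names and values are made up: the real models have many variables, so the recorded states are records rather than single integers:

```tla
---- MODULE StepCheck ----
EXTENDS Naturals

VARIABLE
    \* @type: Int;
    minted

\* Hypothetical recorded pre- and post-states for one step
RecordedPre  == 100
RecordedPost == 142

\* Point 2: stipulate the initial state to be the recorded pre-state,
\* instead of using the model's real initial-state predicate.
Init == minted = RecordedPre

\* Stand-in for the model's original transition relation
Next ==
    \E num \in 1..1000000000:
        minted' = minted + num

\* Point 3: additionally constrain the post-state to the recorded one.
\* Next2 is enabled in the initial state iff Next admits the recorded
\* step, so the model deadlocks iff the step does not conform. A
\* recorded step with s = s' only passes if Next itself allows such a
\* transition; plain stuttering doesn't count.
Next2 ==
    /\ minted' = RecordedPost
    /\ Next
====
```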
If anyone has any ideas on improving this, I'm happy to get any feedback. In particular, one moderate pain point of my current approach is that it's slow, though it's just about bearable to still include in the CI. The combination of points 2 and 3 above means that I end up with one new model (to ask the "deadlock question") for each state pair collected. So checking each pair incurs the JVM startup cost, Apalache pre-processing costs, etc. This could be mitigated by caching the state pairs and not rerunning the checks for cached pairs, but I haven't gotten around to twisting Bazel (our build system) into submission yet.
Cheers,
Ognjen
[1] https://arxiv.org/abs/2506.05932