[tlaplus] Experience report: teaching TLA+ proofs to ChatGPT

Disclaimer: in this post I'll use the words "knows", "understands", "learns", "reads" etc. although that isn't quite accurate for what's happening here, but it works well enough.

I spent 1.5 hours trying to teach ChatGPT4 how to write TLA+ proofs this morning. If you fork out $20/month you can access ChatGPT4 with quite a large memory/token window, so you can get it to customize its understanding on top of its base-level understanding it presumably got from reading all the TLA+ specs on github. Thanks to Jesse Jiryu Davis for suggesting this teaching approach as opposed to just using pre-existing understanding. Report:

The base-level understanding of TLA+ and the model checker is quite good. For example, I was able to paste in a spec and ask ChatGPT4 to derive a type invariant. It came up with a pretty good one, maybe not restricting variable values as much as it could have but acceptable. Not bad for something that takes humans a few days in a TLA+ workshop to figure out how to do reliably!
It understands the high-level approach of how to write proofs of the form Spec => []Invariant. It will output a basic skeleton of this proof, more or less accurately.
It really struggles a lot with actual proof language syntax (relatable tbh lol), especially when to use terminal vs. non-terminal proofs, when to use SUFFICES/ASSUME/PROVE and how that affects subsequent proof steps, when to use CASE etc.
It struggles a lot with correct comma usage in BY/DEF terminal proofs, often having too many commas or too few commas between elements. In retrospect I should have focused less on fixing this very minor issue, burned up too much token/memory window on it when I should have focused on other things.
It does quite well on proofs with 3 or fewer levels. If you zoom in on a specific subproof it can usually handle it, figuring out which definitions it needs to expand in BY/DEF terminal proofs and such. But then if you ask it to print out the entire proof it will struggle to synthesize all the subproofs.
You do need to keep an eye on your token/memory window. If you start out by giving it the spec and then asking it to write a proof and correct it as you go, the spec will eventually fall outside the window and it can only rely on parroting/modifying the proof itself rather than reasoning about the proof in relation to the spec.

All in all, it shows great promise for basic TLA+ but will probably wait another six months before trying it out on proofs again. Ideally the experience would be that ChatGPT version X could teach me TLA+ proofs, rather than vice-versa.

Really what's happening is that it feeds the entire preceding conversation that fits inside the token/memory window into the inference engine for every token, which is how it seems to learn as you explain things to it. If you switch to a fresh chat instance it will have "forgotten" everything you explained. To ensure I would be able to easily rehydrate its understanding I asked it to summarize what it learned so it could easily re-acquire that knowledge; here's a prompt you can paste into your own instance of ChatGPT4 if you want to try it:

Here is a list of rules about TLA+ proofs that will augment your current understanding; you generated this list after a long conversation, with the intention that a fresh version of yourself could read this list and get the essence of the conversation:
- Use SUFFICES ASSUME ... PROVE ... OBVIOUS to make assumptions available to all subsequent proof steps in that level or subproofs of those steps.
- Terminal proofs can either be the keyword OBVIOUS or a BY ... DEF ... construct.
- In a terminal proof, BY ... DEF ... means the prover expects the elements before DEF to be proof steps or theorems and the elements after DEF to be definitions which must be expanded so the prover can look at their structure to prove the step correct.
- Theorems should be located between BY and DEF, not after DEF.
- Never put an extra or unnecessary comma in a BY ... DEF terminal proof.
- CASE statements can be used to consider different conditions within the proof.
- All non-terminal proofs that consist of sequences of steps must have a QED step as the last one.
- When using the Proof of Temporal Logic (PTL) theorem, it should be placed between BY and DEF in a proof step.

Andrew Helwer