Hi, and thank you for the great work!
I was wondering whether there are any TLA+ datasets (coding problems with problem statements and their solutions). These could be used for a variety of things, for example by language models or by benchmarks; Aider uses Exercism for its benchmarks, for instance, but TLA+ is not supported there. If we had such a dataset, I could run Aider against TLA+ exercises with different models, collect the results, and pick, based on the benchmark, which models perform best for TLA+. Of course, I understand that benchmarks do not really measure "objectively" how good an LLM is at TLA+, but they at least give an idea when choosing a tool or model. TLA+ could even have its own "independent" benchmark across different LLMs; it could be shared with the community, and people could run it against all sorts of LLMs and publish the results. I think the CommunityModules are already a good starting point for such a dataset.
Also, I was thinking that this great repo (https://github.com/tlaplus/awesome-tlaplus) could have a dedicated LLM/AI section that eventually points to the MCP server PR (I personally didn't know this existed until I saw it on LinkedIn).
Y.
I'm afraid I'm not eligible to participate in the challenge, but I've just contributed a TLA+ MCP server that integrates SANY and TLC into agentic workflows in VSCode, Cursor, or any other environment that supports MCP. The PR [1] shows how to set up the server.
Markus
[1] https://github.com/tlaplus/vscode-tlaplus/pull/379
> On May 14, 2025, at 1:40 PM, Federico Ponzi <fpon...@xxxxxxxxx> wrote:
>
> Hi all,
>
> In reference to the challenge https://foundation.tlapl.us/challenge/index.html, what are you planning to work on? If you're not planning to participate, are there any tools you wish existed and wouldn’t mind sharing, in case someone else wants to implement them?