Hi Everyone,
I'm currently teaching a class of people how to use TLA+. One thing we ran into is that TLC has poor performance on some machines. Running
VARIABLE x
Init == x = 1
Next == UNCHANGED x
Spec == Init /\ [][Next]_x
will take either 0 seconds or around 10 seconds, depending on the computer. Times are consistent: if a computer takes 0 seconds, it takes 0 seconds on every run. I had everybody send me TLC logs, and the only obvious difference I noticed is that the fast machines all had 12 cores while the slow machines all had 8 cores. However, setting the number of workers to 1 on the 12-core machines does not slow them down, so I don't think core count is the cause, and I've hit a dead end.
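In case anyone wants to reproduce this outside the Toolbox, here is the snippet wrapped up as a complete module plus a minimal config file. The module and file names are just placeholders I picked; the command-line invocations are sketched after my notes below.

\* Minimal.tla -- the snippet above as a standalone module
---- MODULE Minimal ----
VARIABLE x
Init == x = 1
Next == UNCHANGED x
Spec == Init /\ [][Next]_x
====

\* Minimal.cfg -- tells TLC which formula to check
SPECIFICATION Spec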
I don't think I have enough data to diagnose the likely issue or file an informative bug report, so my goal here is to figure out what else I should be looking for. Since these are other people's computers, I can't do anything complicated or test anything for more than about five minutes at a time. The workshop runs the whole week, though, so I can try a few different short experiments. How can I get more data on what could be wrong here?
Some notes I have so far:
- Doesn't seem OS-specific: the two fast runs were on Linux and Mac, and the slow runs were on Mac or Windows.
- Doesn't seem to be JVM startup related. TLC tells me the slow machines are evaluating fewer states per second.
- I got all the logs by clearing the TLA+ Toolbox console and running TLC once.
- Switching from breadth-first to depth-first search seems to fix the slowdown (a command-line sketch for toggling this is after these notes).
- The slowdown seems to be a constant offset: I tried a few other (small) specs and the difference was always around 10 seconds. So if the fast computers finished a spec in about 2 seconds, the slow computers would take about 12 seconds.
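One five-minute experiment I'm considering having people run is invoking TLC directly from the command line, so we can compare total wall-clock time against the duration TLC itself reports (to separate JVM/Toolbox overhead from state-space exploration) and flip between breadth-first and depth-first without touching the Toolbox. This is just a sketch; the path to tla2tools.jar and the depth bound are placeholders:

    # Default breadth-first search, single worker
    time java -cp tla2tools.jar tlc2.TLC -workers 1 -config Minimal.cfg Minimal.tla

    # Depth-first iterative deepening (up to depth 10) instead of BFS
    time java -cp tla2tools.jar tlc2.TLC -workers 1 -dfid 10 -config Minimal.cfg Minimal.tla

If the slow machines still show the ~10 second gap when run this way, that would at least rule out the Toolbox itself.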
H