Thanks for the quick response. The default command looks something like this:
java -cp $(TLATOOLS_JAR) -XX:+UseParallelGC -Xmx24G tlc2.TLC -config FOO.cfg -workers 8 -difftrace FOO.tla | tee foo.out
On the bigger machines I started with `workers` at 48 and 64 (on the 48-thread and the 64-thread machine respectively), then tried 24, 16, and 8 for both. Ultimately, 8 got the highest throughput.
For 'simulation', the command looks similar:
java -cp tla2tools.jar -XX:+UseParallelGC tlc2.TLC -config FOO.cfg -workers 48 -simulate -depth 200 -difftrace FOO.tla
I tried 'simulation' mode with 2 workers, and with htop saw two threads each taking about 60% of one cpu; I tried with 4, and two threads got about 40% each, and the other two got about 20% each (still totaling 120%). So probably there's a severe bottleneck somewhere.
ATM on the 64-core box I have 60 different simulators with 2 workers each going, and that seems to be lighting things up; but it's a lot harder to monitor the results.