Re: [tlaplus] Scaling up model checking

On Tuesday, August 27, 2019 at 5:40:53 PM UTC+1, Markus Alexander Kuppe wrote:

On 27.08.19 09:37, dun...@xxxxxxxxx wrote:
> So I got my model to a point where it catches a lot of the things I want
> to catch. Modeling a single "slot" and two processes (never mind what
> that means) takes about 1 minute 20 seconds. Making it 2 slots and 3
> processes has so far taken my 8-thread workstation about 12 hours; and
> to be sure the model is race-free, I really want to run 3 slots and 3
> processes.
>
> So I started looking into how to increase throughput. Among the bits of
> kit we have lying around for testing is a slightly older 48-thread box,
> and a more recent 400+ thread box (but Linux only knows how to use 64 of
> them).
>
> Unfortunately, I'm having trouble getting tlc to scale up to using these
> systems. Basically, if I go past 8 worker threads, the number of states
> generated actually goes down. And since the maximum clock rate of my
> desktop is actually higher than the maximum clock rate of the two
> massive-cpu boxen, the total throughput on the big boxes is lower than
> on my workstation.
>
> I've also tried running in -simluation mode. Allegedly this has been
> implemented (https://github.com/tlaplus/tlaplus/issues/147); but running
> with 24 worker threads, the java process never takes more than 120%
> (i.e., no more than one worker is actually active).
>
> It's more or less the same for two different JVMs: CentOS's OpenJDK
> 11.0.4, and Debian's OpenJDK 1.8.0_171 (from jessie-backports).
>
> Any suggestisons? Am I missing something obvious?
>

Hi George,

please share the command-line parameters you run TLC with.

Thanks for the quick response. The default command looks something like this:

java -cp $(TLATOOLS_JAR) -XX:+UseParallelGC -Xmx24G tlc2.TLC -config FOO.cfg -workers 8 -difftrace FOO.tla | tee foo.out

On the bigger machines I started with `workers` at 48 and 64 (on the 48-thread and the 64-thread machine respectively), then tried 24, 16, and 8 for both. Ultimately, 8 got the highest throughput.

For 'simulation', the command looks similar:

java -cp tla2tools.jar -XX:+UseParallelGC tlc2.TLC -config FOO.cfg -workers 48 -simulate -depth 200 -difftrace FOO.tla

I tried 'simulation' mode with 2 workers, and with htop saw two threads each taking about 60% of one cpu; I tried with 4, and two threads got about 40% each, and the other two got about 20% each (still totaling 120%). So probably there's a severe bottleneck somewhere.

ATM on the 64-core box I have 60 different simulators with 2 workers each going, and that seems to be lighting things up; but it's a lot harder to monitor the results.

-George