[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [tlaplus] Scaling up model checking

On Tuesday, August 27, 2019 at 5:40:53 PM UTC+1, Markus Alexander Kuppe wrote:
On 27.08.19 09:37, dun...@xxxxxxxxx wrote:
> So I got my model to a point where it catches a lot of the things I want
> to catch.  Modeling a single "slot" and two processes (never mind what
> that means) takes about 1 minute 20 seconds.  Making it 2 slots and 3
> processes has so far taken my 8-thread workstation about 12 hours; and
> to be sure the model is race-free, I really want to run 3 slots and 3
> processes.
> So I started looking into how to increase throughput.  Among the bits of
> kit we have lying around for testing is a slightly older 48-thread box,
> and a more recent 400+ thread box (but Linux only knows how to use 64 of
> them).
> Unfortunately, I'm having trouble getting tlc to scale up to using these
> systems.  Basically, if I go past 8 worker threads, the number of states
> generated actually goes down.  And since the maximum clock rate of my
> desktop is actually higher than the maximum clock rate of the two
> massive-cpu boxen, the total throughput on the big boxes is lower than
> on my workstation.
> I've also tried running in -simluation mode.  Allegedly this has been
> implemented (https://github.com/tlaplus/tlaplus/issues/147); but running
> with 24 worker threads, the java process never takes more than 120%
> (i.e., no more than one worker is actually active).
> It's more or less the same for two different JVMs: CentOS's OpenJDK
> 11.0.4, and Debian's OpenJDK 1.8.0_171 (from jessie-backports).
> Any suggestisons? Am I missing something obvious?

Hi George,

please share the command-line parameters you run TLC with.

Thanks for the quick response.  The default command looks something like this:

    java -cp $(TLATOOLS_JAR) -XX:+UseParallelGC -Xmx24G tlc2.TLC -config FOO.cfg -workers 8 -difftrace FOO.tla | tee foo.out

On the bigger machines I started with `workers` at 48 and 64 (on the 48-thread and the 64-thread machine respectively), then tried 24, 16, and 8 for both.  Ultimately, 8 got the highest throughput.

For 'simulation', the command looks similar:

    java -cp tla2tools.jar -XX:+UseParallelGC tlc2.TLC -config FOO.cfg -workers 48 -simulate -depth 200 -difftrace FOO.tla

I tried 'simulation' mode with 2 workers, and with htop saw two threads each taking about 60% of one cpu; I tried with 4, and two threads got about 40% each, and the other two got about 20% each (still totaling 120%).  So probably there's a severe bottleneck somewhere.

ATM on the 64-core box I have 60 different simulators with 2 workers each going, and that seems to be lighting things up; but it's a lot harder to monitor the results.


You received this message because you are subscribed to the Google Groups "tlaplus" group.
To unsubscribe from this group and stop receiving emails from it, send an email to tlaplus+unsubscribe@xxxxxxxxxxxxxxxx.
To view this discussion on the web visit https://groups.google.com/d/msgid/tlaplus/76c9ec44-4c3f-4d1d-b3f1-2edefc445074%40googlegroups.com.