So I got my model to a point where it catches a lot of the
things I want to catch. Modeling a single "slot" and two
processes (never mind what that means) takes about 1 minute 20
seconds. Making it 2 slots and 3 processes has so far taken
my 8-thread workstation about 12 hours; and to be sure the
model is race-free, I really want to run 3 slots and 3
processes.
So I started looking into how to increase throughput.
Among the bits of kit we have lying around for testing is a
slightly older 48-thread box, and a more recent 400+ thread
box (but Linux only knows how to use 64 of them).
Unfortunately, I'm having trouble getting tlc to scale up
to using these systems. Basically, if I go past 8 worker
threads, the number of states generated actually goes down.
And since the maximum clock rate of my desktop is actually
higher than the maximum clock rate of the two massive-cpu
boxen, the total throughput on the big boxes is lower than on
my workstation.
I've also tried running in -simluation mode. Allegedly
this has been implemented
(
https://github.com/tlaplus/tlaplus/issues/147); but running
with 24 worker threads, the java process never takes more than
120% (i.e., no more than one worker is actually active).
It's more or less the same for two different JVMs:
CentOS's OpenJDK 11.0.4, and Debian's OpenJDK 1.8.0_171
(from jessie-backports).
Any suggestisons? Am I missing something obvious?
Thanks,
-George
--