[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Some distributed TLC errors not displayed in TLA+ Toolbox



Hello!

I'm running TLC (2.07) in distributed ad hoc mode from the TLA+ Toolbox
(1.5.1) and I encountered the following issue. In some cases the TLC
master process run from the TLA+ Toolbox allocates a lot of memory and
starts swapping. I run the FPSets on slave nodes (using
TLCWorkerAndFPSet) so I have no idea why the master sometimes uses more
than 8G of memory. Creating and model-checking a new identical model
might be a workaround (tried once).

Anyway when the master starts swapping then some slave threads time out
during reads in "JRMP connection establishment" with
java.net.SocketTimeoutException. However, the TLA+ Toolbox eventually
displays "Errors detected: No errors" even though model checking has not
successfully been completed. Sometimes one can see TLC printouts on the
slave terminals AFTER the TLA+ Toolbox has displayed a finished/success
status to the user.


Best Regards,

Jaak Ristioja
Cybernetica AS