[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Distributed TLC: Failed to checkpoint the fingerprint server



Hello!

Running a nightly from last week, I found that the distributed (ad-hoc)
model check I left running for the night has hanged. The toolbox is
still in "Computing reachable states" mode, the servers seem to be up
and have output no errors. Servers still output progress messages about
the number of fingerprints stored, which do not increase. The TLC java
process is still present on the toolbox machine. The slave servers as
well as the toolbox machine seem to be idle (no CPU, disk or network
usage) and just sit on the heap of memory allocated for them. The MC.out
file contains:

@!@!@STARTMSG 2200:0 @!@!@
Progress(90) at 2015-12-09 06:23:05: 2754111084 states generated
(1,440,989 s/min), 452951326 distinct states found (215,859 ds/min),
87756011 states left on queue.
@!@!@ENDMSG 2200 @!@!@
@!@!@STARTMSG 2195:0 @!@!@
Checkpointing of run -- Checkpointing of run 15-12-02-13-53-09/ compl
@!@!@ENDMSG 2195 @!@!@
Error: Failed to checkpoint the fingerprint server at s3. This server
might be down.
@!@!@STARTMSG 2196:0 @!@!@
Checkpointing completed at (2015-12-09 06:26:04)
@!@!@ENDMSG 2196 @!@!@
@!@!@STARTMSG 2200:0 @!@!@
Progress(90) at 2015-12-09 06:26:04: 2758088637 states generated
(3,977,553 s/min), 453573098 distinct states found (621,772 ds/min),
88271243 states left on queue.
@!@!@ENDMSG 2200 @!@!@
@!@!@STARTMSG 2200:0 @!@!@
Progress(90) at 2015-12-09 06:27:04: 2760215909 states generated
(2,127,272 s/min), 453682938 distinct states found (109,840 ds/min),
87607691 states left on queue.
@!@!@ENDMSG 2200 @!@!@
@!@!@STARTMSG 2200:0 @!@!@
Progress(90) at 2015-12-09 06:28:04: 2761764541 states generated
(1,548,632 s/min), 453682938 distinct states found (0 ds/min), 87313205
states left on queue.
@!@!@ENDMSG 2200 @!@!@
@!@!@STARTMSG 2200:0 @!@!@
Progress(90) at 2015-12-09 06:29:04: 2761764541 states generated (0
s/min), 453682938 distinct states found (0 ds/min), 87313205 states left
on queue.
@!@!@ENDMSG 2200 @!@!@
@!@!@STARTMSG 2200:0 @!@!@
Progress(90) at 2015-12-09 06:30:04: 2761764541 states generated (0
s/min), 453682938 distinct states found (0 ds/min), 87313205 states left
on queue.
@!@!@ENDMSG 2200 @!@!@
@!@!@STARTMSG 2200:0 @!@!@
Progress(90) at 2015-12-09 06:31:04: 2761764541 states generated (0
s/min), 453682938 distinct states found (0 ds/min), 87313205 states left
on queue.
@!@!@ENDMSG 2200 @!@!@

...and so on.

I stopped TLC from the toolbox and hit CTRL+C for the servers. I hope
that doesn't invalidate the model checking results?

Best regards,
Jaak