[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [tlaplus] Re: Scaling up model checking



On 13.11.19 17:44, william.schultz via tlaplus wrote:
> To follow up on this, I am observing some apparent scalability issues
> with TLC when running on a single machine. I am using an
> EC2 c5d.24xlarge instance type, which has 96 cores and 192GB of memory.
> When I run a particular model with 64 workers, I see the following:
> 
> ubuntu@ip-172-31-1-228:/data/tla$ java -XX:MaxDirectMemorySize=120000m
> -Xmx120185m -Dtlc2.tool.fp.FPSet.impl=tlc2.tool.fp.OffHeapDiskFPSet
> -XX:+UseParallelGC -cp tla2tools.jar tlc2.TLC -workers 64 -config mc.cfg
> mc.tla
> TLC2 Version 2.14 of 10 July 2019 (rev: 0cae24f)
> Running breadth-first search Model-Checking with fp 117 and seed
> 1389448420162930971 with 64 workers on 96 cores with 106832MB heap and
> 120000MB offheap memory [pid: 7730] (Linux 4.15.0-1051-aws amd64, Ubuntu
> 11.0.4 x86_64, OffHeapDiskFPSet, DiskStateQueue).
> Parsing file /data/tla/mc.tla
> Parsing file /tmp/Naturals.tla
> Parsing file /tmp/Integers.tla
> Parsing file /tmp/FiniteSets.tla
> Parsing file /tmp/Sequences.tla
> Parsing file /tmp/TLC.tla
> Semantic processing of module Naturals
> Semantic processing of module Integers
> Semantic processing of module Sequences
> Semantic processing of module FiniteSets
> Semantic processing of module TLC
> Semantic processing of module mc
> Starting... (2019-11-14 01:05:32)
> Computing initial states...
> Finished computing initial states: 1 distinct state generated at
> 2019-11-14 01:05:43.
> Progress(6) at 2019-11-14 01:05:46: 147,528 states generated (147,528
> s/min), 9,917 distinct states found (9,917 ds/min), 8,452 states left on
> queue.
> Progress(9) at 2019-11-14 01:06:46: 4,363,650 states generated
> (4,216,122 s/min), 382,868 distinct states found (372,951 ds/min),
> 290,240 states left on queue.
> Progress(9) at 2019-11-14 01:07:46: 8,448,010 states generated
> (4,084,360 s/min), 731,818 distinct states found (348,950 ds/min),
> 533,823 states left on queue.
> Progress(10) at 2019-11-14 01:08:46: 12,452,538 states generated
> (4,004,528 s/min), 1,093,250 distinct states found (361,432 ds/min),
> 790,535 states left on queue.
> Progress(10) at 2019-11-14 01:09:46: 16,336,391 states generated
> (3,883,853 s/min), 1,436,130 distinct states found (342,880 ds/min),
> 1,044,868 states left on queue.
> Progress(10) at 2019-11-14 01:10:46: 20,274,326 states generated
> (3,937,935 s/min), 1,787,099 distinct states found (350,969 ds/min),
> 1,295,292 states left on queue.
> Progress(10) at 2019-11-14 01:11:46: 24,372,652 states generated
> (4,098,326 s/min), 2,177,834 distinct states found (390,735 ds/min),
> 1,556,071 states left on queue.
> Progress(10) at 2019-11-14 01:12:46: 28,343,121 states generated
> (3,970,469 s/min), 2,552,098 distinct states found (374,264 ds/min),
> 1,815,312 states left on queue.
> Progress(10) at 2019-11-14 01:13:46: 32,267,652 states generated
> (3,924,531 s/min), 2,902,935 distinct states found (350,837 ds/min),
> 2,057,542 states left on queue.
> Progress(10) at 2019-11-14 01:14:46: 36,234,857 states generated
> (3,967,205 s/min), 3,282,234 distinct states found (379,299 ds/min),
> 2,314,932 states left on queue.
> Progress(11) at 2019-11-14 01:15:46: 40,186,365 states generated
> (3,951,508 s/min), 3,679,592 distinct states found (397,358 ds/min),
> 2,582,686 states left on queue.
> Progress(11) at 2019-11-14 01:16:46: 44,118,806 states generated
> (3,932,441 s/min), 4,034,988 distinct states found (355,396 ds/min),
> 2,825,454 states left on queue.
> 
> The throughput (s/min) appears to top out at around 3.9 million
> states/minute. These are the reported numbers when I run with just 8
> workers, on the same machine:
> 
> ubuntu@ip-172-31-1-228:/data/tla$ java -XX:MaxDirectMemorySize=120000m
> -Xmx120185m -Dtlc2.tool.fp.FPSet.impl=tlc2.tool.fp.OffHeapDiskFPSet
> -XX:+UseParallelGC -cp tla2tools.jar c2.TLC -workers 8 -config mc.cfg mc.tla
> TLC2 Version 2.14 of 10 July 2019 (rev: 0cae24f)
> Running breadth-first search Model-Checking with fp 43 and seed
> 6871599717738624516 with 8 workers on 96 cores with 106832MB heap and
> 120000MB offheap memory [pid: 8128] (Linux 4.15.0-1051-aws amd64, Ubuntu
> 11.0.4 x86_64, OffHeapDiskFPSet, DiskStateQueue).
> Parsing file /data/tla/mc.tla
> Parsing file /tmp/Naturals.tla
> Parsing file /tmp/Integers.tla
> Parsing file /tmp/FiniteSets.tla
> Parsing file /tmp/Sequences.tla
> Parsing file /tmp/TLC.tla
> Semantic processing of module Naturals
> Semantic processing of module Integers
> Semantic processing of module Sequences
> Semantic processing of module FiniteSets
> Semantic processing of module TLC
> Semantic processing of module mc
> Starting... (2019-11-14 01:17:58)
> Computing initial states...
> Finished computing initial states: 1 distinct state generated at
> 2019-11-14 01:18:07.
> Progress(6) at 2019-11-14 01:18:10: 136,991 states generated (136,991
> s/min), 8,542 distinct states found (8,542 ds/min), 7,332 states left on
> queue.
> Progress(8) at 2019-11-14 01:19:10: 3,344,141 states generated
> (3,207,150 s/min), 295,218 distinct states found (286,676 ds/min),
> 229,350 states left on queue.
> Progress(9) at 2019-11-14 01:20:10: 6,591,286 states generated
> (3,247,145 s/min), 572,490 distinct states found (277,272 ds/min),
> 430,834 states left on queue.
> Progress(9) at 2019-11-14 01:21:10: 9,852,114 states generated
> (3,260,828 s/min), 846,787 distinct states found (274,297 ds/min),
> 621,471 states left on queue.
> Progress(9) at 2019-11-14 01:22:10: 12,984,599 states generated
> (3,132,485 s/min), 1,124,563 distinct states found (277,776 ds/min),
> 827,090 states left on queue.
> Progress(9) at 2019-11-14 01:23:10: 16,140,694 states generated
> (3,156,095 s/min), 1,398,002 distinct states found (273,439 ds/min),
> 1,030,079 states left on queue.
> Progress(10) at 2019-11-14 01:24:10: 19,356,420 states generated
> (3,215,726 s/min), 1,688,862 distinct states found (290,860 ds/min),
> 1,228,268 states left on queue.
> Progress(10) at 2019-11-14 01:25:10: 22,544,906 states generated
> (3,188,486 s/min), 1,981,062 distinct states found (292,200 ds/min),
> 1,428,619 states left on queue.
> Progress(10) at 2019-11-14 01:26:10: 25,777,526 states generated
> (3,232,620 s/min), 2,284,525 distinct states found (303,463 ds/min),
> 1,633,791 states left on queue.
> Progress(10) at 2019-11-14 01:27:10: 28,980,735 states generated
> (3,203,209 s/min), 2,577,732 distinct states found (293,207 ds/min),
> 1,831,318 states left on queue.
> Progress(10) at 2019-11-14 01:28:10: 32,199,852 states generated
> (3,219,117 s/min), 2,872,788 distinct states found (295,056 ds/min),
> 2,033,065 states left on queue.
> 
> which shows a max throughput of around 3.2 million states/minute. This
> would imply that TLC is scaling very badly i.e. a scale factor of
> (3.9/3.2) = 1.21x when there are 8 times as many cores. I don't think
> TLC should be disk bound yet since it has a ton of memory (120GB)
> allocated to it and I can observe nearly maximal core utilization while
> executing these runs. Is there something obvious I may be overlooking here?



Hi Will,

the current implementation of the queue of unexplored states
(StateQueue) does not scale well.  PageQueue [1] is supposed to become
its replacement but it is still in its early stages.

Best
Markus

[1] https://github.com/lemmy/PageQueue


-- 
You received this message because you are subscribed to the Google Groups "tlaplus" group.
To unsubscribe from this group and stop receiving emails from it, send an email to tlaplus+unsubscribe@xxxxxxxxxxxxxxxx.
To view this discussion on the web visit https://groups.google.com/d/msgid/tlaplus/0f218cca-c756-4565-dbb8-452d75dbc752%40lemmster.de.