States Time Diameter Found Distinct Queue 00:00:00 0 2 2 2 00:00:03 12 373 773 80 532 39 897 00:01:03 17 11 409 991 1 730 911 650 391 00:02:03 19 22 763 536 3 230 184 1 123 685 00:03:03 20 34 124 771 4 653 357 1 528 520 00:04:03 20 45 515 221 6 030 076 1 914 174 00:05:03 21 56 993 967 7 381 016 2 261 254 00:06:03 21 68 303 213 8 686 644 2 578 860 00:07:03 22 79 637 963 9 979 068 2 898 506 00:08:03 22 90 914 422 11 224 929 3 143 886 00:09:03 22 102 262 975 12 496 690 3 455 732 00:10:03 23 113 723 997 13 740 892 3 727 154 00:11:03 23 125 147 018 14 962 231 3 937 627 00:12:03 23 136 675 178 16 200 961 4 210 156 00:13:03 24 148 115 992 17 420 927 4 427 002 00:14:03 24 159 595 885 18 620 234 4 683 289 00:15:03 24 170 985 420 19 792 862 4 850 702 00:16:03 24 182 526 842 20 995 286 5 107 492 00:17:03 24 193 700 817 22 153 555 5 298 514 00:18:03 25 204 984 084 23 298 651 5 497 201 00:19:03 25 216 009 397 24 400 438 5 656 809 00:20:03 25 227 106 833 25 515 676 5 812 490 00:21:03 25 238 359 175 26 644 930 6 013 833 00:22:03 25 249 569 969 27 766 444 6 186 311 00:23:03 25 260 734 338 28 874 577 6 290 077 00:24:03 26 272 001 237 29 985 680 6 516 761 00:25:03 26 283 153 368 31 061 666 6 629 888 00:26:03 26 294 410 128 32 160 080 6 766 310 00:27:03 26 305 672 779 33 253 834 6 924 320 00:28:03 26 316 922 323 34 347 480 7 087 609 00:29:03 26 328 063 646 35 430 080 7 206 904 00:30:03 27 339 293 311 36 501 951 7 316 872 00:31:03 27 350 641 518 37 573 848 7 462 407 00:32:03 27 361 762 024 38 622 782 7 556 111 00:33:03 27 372 957 115 39 681 670 7 664 655 00:34:03 27 384 230 947 40 739 338 7 778 788 00:35:03 27 395 493 403 41 801 105 7 921 614 00:36:03 27 406 745 835 42 866 056 8 021 215 00:37:03 27 417 922 316 43 914 700 8 086 017 00:38:03 28 429 227 513 44 952 149 8 217 793 00:39:03 28 440 408 053 45 990 326 8 321 943 00:40:03 28 451 599 765 47 007 672 8 372 374 00:41:03 28 462 824 574 48 036 880 8 443 463 00:42:03 28 473 936 478 49 042 645 8 517 575 00:43:03 28 484 604 347 50 031 004 8 646 419 00:44:03 28 495 524 420 51 029 013 8 725 648 ... Coverage Module Action Total Distinct KafkaRoller Init 2 2 KafkaRoller ObserveNode 206 922 950 15 340 974 KafkaRoller ReconfigureNode 35 128 411 8 280 KafkaRoller NodeServing 87 000 108 8 282 KafkaRoller RetryNode 155 553 002 34 673 480 Seems it's working somehow :D. Very excited with first real model...[1]. But it's funny how like 20 minutes could generate 20GB of data and it's increasing very quickly. [1] - https://github.com/see-quick/verification/blob/main/formal_verification/TLA%2B/strimzi/kafka_roller/KafkaRoller.tla
Hi Paul,
Thanks, for your answer. I think that's a solid piece of work, which could be used as a baseline. I appreciate that. The possible state changes are described in [1] under states. i.e.,
and thus I have refined the spec you have provided a little bit based on these rules and here is the implementation with detailed changes [2]. But this implementation reached a deadlock so I had to change a bit ObserveNode(n) action. And I am now in a problem where Invariant ActiveControllerNotRestarted is violated.
Here is a trace:
```Invariant ActiveControllerNotRestarted is violated.The behavior up to this point is:1: <Initial predicate>/\ activeController = "controller-0"/\ nodeState = ( "controller-0" :>[ state |-> "UNKNOWN",roles |-> {"controller"},restartAttempts |-> 0,retries |-> 0 ] @@"controller-1" :>[ state |-> "UNKNOWN",roles |-> {"controller"},restartAttempts |-> 0,retries |-> 0 ] @@"broker-2" :>[ state |-> "UNKNOWN",roles |-> {"broker"},restartAttempts |-> 0,retries |-> 0 ] @@"broker-3" :>[ state |-> "UNKNOWN",roles |-> {"broker"},restartAttempts |-> 0,retries |-> 0 ] @@"broker-4" :>[ state |-> "UNKNOWN",roles |-> {"broker"},restartAttempts |-> 0,retries |-> 0 ] )/\ reconfiguringNodes = {}/\ restartingNodes = {}2: <ObserveNode line 60, col 3 to line 62, col 72 of module KafkaRoller>/\ activeController = "controller-0"/\ nodeState = ( "controller-0" :>[ state |-> "NOT_RUNNING",roles |-> {"controller"},restartAttempts |-> 0,retries |-> 0 ] @@"controller-1" :>[ state |-> "UNKNOWN",roles |-> {"controller"},restartAttempts |-> 0,retries |-> 0 ] @@"broker-2" :>[ state |-> "UNKNOWN",roles |-> {"broker"},restartAttempts |-> 0,retries |-> 0 ] @@"broker-3" :>[ state |-> "UNKNOWN",roles |-> {"broker"},restartAttempts |-> 0,retries |-> 0 ] @@"broker-4" :>[ state |-> "UNKNOWN",roles |-> {"broker"},restartAttempts |-> 0,retries |-> 0 ] )/\ reconfiguringNodes = {}/\ restartingNodes = {}3: <RestartNode line 66, col 3 to line 72, col 55 of module KafkaRoller>/\ activeController = "controller-0"/\ nodeState = ( "controller-0" :>[ state |-> "RESTARTED",roles |-> {"controller"},restartAttempts |-> 1,retries |-> 0 ] @@"controller-1" :>[ state |-> "UNKNOWN",roles |-> {"controller"},restartAttempts |-> 0,retries |-> 0 ] @@"broker-2" :>[ state |-> "UNKNOWN",roles |-> {"broker"},restartAttempts |-> 0,retries |-> 0 ] @@"broker-3" :>[ state |-> "UNKNOWN",roles |-> {"broker"},restartAttempts |-> 0,retries |-> 0 ] @@"broker-4" :>[ state |-> "UNKNOWN",roles |-> {"broker"},restartAttempts |-> 0,retries |-> 0 ] )/\ reconfiguringNodes = {}/\ restartingNodes = {"controller-0"}
[1] - https://github.com/strimzi/proposals/blob/9a777c320c56e3edd473f088a89864fa643ac63a/06x-new-kafka-roller.md#proposal
[2] - https://github.com/see-quick/verification/pull/3pi 26. 4. 2024 o 20:19 Paul Eipper <lkraider@xxxxxxxxx> napísal(a):Hello,Maybe this could be a first start on the definition rules:There is missing logic on the possible state changes and running this model will fail as there are no restraint rules on them, as I would need to understand the possible state changes and model them accordingly.So for example running this model:SPECIFICATION Spec
INVARIANTS
NodesWithinRestartLimit
NodesWithinRetryLimit
ActiveControllerNotRestarted
ControllerQuorumMaintained
PROPERTIES
AllNodesServing
NodesNotStuckRecovering
CONSTANTS
ControllerNodes = {"controller-0", "controller-1"}
BrokerNodes = {"broker-2", "broker-3", "broker-4"}
CombinedNodes = {}
MaxRestartAttempts = 3
MaxRetries = 5Will hit a failed state fairly quickly because there is no rule that a RESTART can't happen on initialization:/\ activeController = "controller-0"
/\ nodeState = ( "controller-0" :>
[ state |-> "RESTARTED",
roles |-> {"controller"},
restartAttempts |-> 1,
retries |-> 0 ] @@
"controller-1" :>
[ state |-> "UNKNOWN",
roles |-> {"controller"},
restartAttempts |-> 0,
retries |-> 0 ] @@
"broker-2" :>
[ state |-> "UNKNOWN",
roles |-> {"broker"},
restartAttempts |-> 0,
retries |-> 0 ] @@
"broker-3" :>
[ state |-> "UNKNOWN",
roles |-> {"broker"},
restartAttempts |-> 0,
retries |-> 0 ] @@
"broker-4" :>
[ state |-> "UNKNOWN",
roles |-> {"broker"},
restartAttempts |-> 0,
retries |-> 0 ] )
/\ reconfiguringNodes = {}
/\ restartingNodes = {"controller-0"}My suggestion on the first step on evolving this spec would be to fix the possible state transitions.Hope this helps.att,--
Paul Eipper--On Fri, Apr 26, 2024 at 7:34 AM Maroš Orsák <maros.orsak159@xxxxxxxxx> wrote:Hello,--
I am writing to seek assistance with developing a model for the KafkaRoller within the Strimzi open-source project, which I maintain. For context, Strimzi is a Cloud Native Computing Foundation (CNCF) project that simplifies the complexities of managing Apache Kafka, a distributed system. The project provides a suite of operators that facilitate various tasks, including updating Kafka clusters, scaling nodes, and integrating multiple components such as tracing and OAuth2, among others.
A critical aspect of managing Kafka clusters is determining when a Kafka node should be rolled. To address this, we have developed a KafkaRoller class. Recently, we proposed a new design featuring an advanced algorithm intended to enhance this functionality. To ensure the safety and correctness of this algorithm, I aim to create a model in TLA+ that will allow us to formally verify it using safety and liveness properties. These properties include but are not limited to, ensuring that no node exceeds maximum retries (safety), preventing the simultaneous restart of all controller nodes (safety), and guaranteeing that all nodes will eventually be in a SERVING state (liveness).
However, I am unfamiliar with how to properly construct such a model and specify these properties in TLA+. Moreover, I lack time to do such a thing...so any guidance or support you could provide would be greatly appreciated. Thank you.
You received this message because you are subscribed to the Google Groups "tlaplus" group.
To unsubscribe from this group and stop receiving emails from it, send an email to tlaplus+unsubscribe@xxxxxxxxxxxxxxxx.
To view this discussion on the web visit https://groups.google.com/d/msgid/tlaplus/CAMd-2QuocAToKY7HoorcvM8szpMSUu81UzgjRRs9gxROVkUhSg%40mail.gmail.com.
You received this message because you are subscribed to the Google Groups "tlaplus" group.
To unsubscribe from this group and stop receiving emails from it, send an email to tlaplus+unsubscribe@xxxxxxxxxxxxxxxx.
To view this discussion on the web visit https://groups.google.com/d/msgid/tlaplus/CAO8R-xguX_TmFr2zQ%2BMvzkqUys-r5pv08SL%3DVXGT%3D-m5bnkYeA%40mail.gmail.com.