[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [tlaplus] TLA+ model of OneThirdRule algorithm from "A Reduction Theorem" paper



Hello Stephan!

I am sorry not being precise enough to clearly communicate my idea.
Not to let you think of my previous post as of a complete nonsense I will try to describe the idea with greater clarity.

Just to put as on the same page, as I understand the HO-model among other things gives us:
* an ability to abstract away process crashes and message losses, concentrating on control flow of an algorithm using only
HO(p,r), round number and received messages set as a parameter.
* specify communication predicate under which an algorithm is going to operate correctly (in both safety and liveness sense)
  This predicate is defined on HO(p,r) and fixed number of participants N.

Number N is not changing during system lifetime. If some process dies it will never be put into any of "future" HO sets, but N stays the same.
An algorithm in the HO-model is expressed in such a way that having a 'HO' set on a given round is sufficient to make a control decision.

This HO-set abstraction removes distinction between process crashes and late messages, and generally seems like helpful.
But there are some cases when one might wish to have alternative models:

* By "squeezing" node failures, message loses and trivially process silence phenomena into HO-set on a given round we actually reducing a scope of "HO-compatible" algorithms.

For OneThirdRule it is enough to receive replies from 2*N/3 and you do not care about all the others.
But at least for some distributed mutex algorithms to make them fault-tolerant it is not the case since each process have to know why someone is keeping silence:
is it because quiet process still executes critical section or because it fail-stopped?

* An algorithm expressed in HO is not able to adapt to decreasing number of alive participants N.

For OneThirdRule to reach a consensus we have to have not less than 2*N/3 replies. In some moment set of alive processes can become smaller than that, and
then the algorithm will not be able to reach a consensus despite the fact that actually it is achievable if only N could be changed "dynamically". We have kind of
N/3-fault-tolerance, but this fault-tolerance is not up to theoretically achievable (I guess to reach a consensus we need at least 3 alive nodes, right?)

To overcome stated difficulties, one might want to consider some different round-based models.

One might try to extend HO model with N(p,r), the number of processes for which p have not received a failure notice from reliable fault detector (RFD).
So we take a move from HO(p,r) to {HO(p,r), N(p,r)}. Henceforth N(p,r) is non-increasing function. Also note that it is definitely possible to have |HO(p,r)| <= N(p,r).

If N=const then we have a standard HO. If N=f(p,r) then hopefully such model will give more expressive power
(having a chance to express distributed mutual exclusion, for example? It is just what I came to at the moment, but I believe there are more species to consider),

One could argue that proposed hypothetical model is just an RRFD, but I do not agree, because:
 * using reliable fault detector instead of unreliable makes this model more practical, since, as far as I know,
FDs used in real systems are trying to mimic RFD semantics
 * D(p,r) is a set, so algorithm is obliged to work with particular process identifiers while proposed model abstracts it away.
 * N(p,r) is non-increasing for all p,r, while D(p,r) could be increasing on some interval, possibly making verification harder.

Considering that we squeezed D(p,r) into a natural number N(p,r) and that fault detector is reliable there is a hope that an algorithm expressed in such a model
could be model-checked/verified rather efficiently by similar reduction principle (we have to prove corresponding 'extended' Reduction Theorem).

Do you see any fallacies in my reasoning? :-)


среда, 19 апреля 2017 г., 19:31:42 UTC+3 пользователь Stephan Merz написал:
Hello again, Evgeniy,

On 19 Apr 2017, at 11:24, Evgeniy Shishkin <evgen...@xxxxxxxxx> wrote:

Hello Stephan,

Thanks so much for your clarification! After excluding heardof from state view I got exactly those results stated in the table.

> Anyway, we only included these experiments in order to demonstrate the interest of the theorem shown earlier in the paper
Yeah.
As far as I understand, the main idea of the paper is that an algorithm expressed in HO model can be model-checked with respect to 
'local property' in coarse-grained model without loosing any meaningful states compared to fine-grained model.

Exactly (but the theorem is independent of the verification technology and applies just as well to model checking as to deductive verification, where it leads to simpler proofs).

Not sure if this forum is a right place for such a side note but anyway:

As far as I understand HO model by some reasons is especially well-suited for consensus algorithms, but as for some mutual exclusion algorithms it is not so.

Indeed, the Heard-Of computational model is intended for representing fault-tolerant distributed algorithms, it will not be appropriate for mutual exclusion algorithms.

I was wondering if HO model can be extended with N, number of alive processes as seen by environment fault detector on the current round, as a parameter giving
more expressive power to the model. 

I don't understand this: first, HO has no notion of fault detector, but of course the Heard-Of set HO(p,r) can be seen as the complement of the set D(p,r) of faulty processes signaled to process p in round r in the RRFD model by Gafni. Thus, the number of processes that p considers alive in round r is precisely the cardinality of HO(p,r), so "extending" the model by this information would not add any information?


It may resemble RRFD by Gafni, but actually it still abstracts concrete set of suspected/alive processes into just a number. 
Potentially this could lead to a bigger class of algorithms checkable/provable in a reasonable time with a help of (extended) reduction theorem.
It is then takes its place somewhere in the middle between rather concrete RRFD and rather abstract HO.

Maybe you have investigated something similar towards this line of thought?

No, I haven't. We restricted attention to the HO model of algorithms.

Regards,
Stephan



вторник, 18 апреля 2017 г., 19:59:11 UTC+3 пользователь Stephan Merz написал:
Hi Evgeniy,

thanks for your interest in this paper. The modules that I used are attached to this message, and I think they are equivalent to what you have, modulo comments, renaming, and some auxiliary definitions. The difference in performance comes from the fact that I used a VIEW specification so that TLC identifies any two states that only differ in the value of the auxiliary (history) variable heardof, as indicated in the last sentence of section 4 of the paper (p.13): this variable is useful essentially for interpreting counter-examples displayed by TLC but does not affect the behavior of the algorithm. In particular, it is not referred to by any predicate that is used in the specification. I am sorry if the explanation/hint in the paper is a little cryptic.

Anyway, we only included these experiments in order to demonstrate the interest of the theorem shown earlier in the paper: they are extremely naive as far as model checking fault-tolerant algorithms goes. If you really want to learn about model checking this kind of algorithm, I suggest that you look at the recent work by Igor Konnov et al. [1].

Best regards,

Stephan



-- 
You received this message because you are subscribed to the Google Groups "tlaplus" group.
To unsubscribe from this group and stop receiving emails from it, send an email to tlaplus+u...@googlegroups.com.
To post to this group, send email to tla...@xxxxxxxxxxxxxxxx.
Visit this group at https://groups.google.com/group/tlaplus.
For more options, visit https://groups.google.com/d/optout.