# Re: [tlaplus] Fingerprint collision probability?

```On 22.02.19 08:28, Sean McCauliff wrote:
> Fingerprint collision probability is displayed in the model checking
> results, but I haven't found a good explanation about what this means
> other than it's used to mark already explored states. Is it something I
> should worry about?  Is there a document somewhere that describes the
> collision probability?

Hi Sean,

explicit state model checking (with TLC) amounts to depth-first or
breadth-first search over the (directed) state graph described by the
state machine encoded as a TLA+ specification.  For the search to
terminate (assuming a finite model), TLC maintains a set S of already
explored states.  For efficiency reasons, S does not contain the actual
states but state hashes (fingerprints).  In other words, TLC computes
the hash value h of each state in the state graph and adds h to S unless
h is already in S.  If h is in S, the corresponding state is assumed to
be fully explored (see  for a PlusCal spec of TLC's safety checking
algorithm).

The assumption above can be wrong because TLC maps a state of arbitrary
size to the fixed size h (represented by a 64 bit integer).  This can
lead to hash collisions such that different states map to the same h.
The larger the state graph, the higher is the probability of hash
collisions.  The probability shown by TLC is exactly this; the
probability of a hash collision for the current search over the state
graph.  Note that if hash collisions occur, the search might fail to
explore the complete state graph which in turn can cause TLC to miss
violations of the safety and liveness properties.

TLC reports two values for the probability of a hash collision:  The
"calculated/optimistic" probability and the "observed" probability.

The formula to calculate "optimistic" is (d * ((g - d) / 2^64) where d
is the the cardinality of S and g the order of the state graph.  Both d
and g are reported by TLC as "distinct states" and "generated states".

The "observed" probability is defined as the "closest pair" of hashes in S.

Deciding if the hash collision probability is sufficiently low requires
engineering judgment.  If you have reason to believe that TLC ran into
hash collisions and want to further reduce the probability, you can have
TLC check your spec multiple times and select a different hash function
seed (see TLC's "-fp num" parameter ).  Note that in the upcoming
release TLC will automatically select a different seed for each run .

Thanks
Markus

 https://gist.github.com/lemmy/32ffa17ef471cb7f6dd42fe89c8a230c

https://github.com/tlaplus/tlaplus/blob/d62e9f46ffa05803ec223cf78b0991c3b9fa785e/tlatools/src/tlc2/TLC.java#L178-L179
 https://github.com/tlaplus/tlaplus/issues/212

--
You received this message because you are subscribed to the Google Groups "tlaplus" group.
To unsubscribe from this group and stop receiving emails from it, send an email to tlaplus+unsubscribe@xxxxxxxxxxxxxxxx.
To post to this group, send email to tlaplus@xxxxxxxxxxxxxxxx.