[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

[tlaplus] Collision probability of TLC state computation

Hi All,
I have some difficulty understanding the Collision Probability of TLC.

Let me explain my understanding.
The generated hash-code of a state is a 64 Bit number, i.e. there are n=2^64 different hash-codes possible.
Imagine a spec consists of k different states, e.g. numbered from 1 to k.
The hash-code is just a mapping from 1..k to 1..n. There are n^k such mappings.

We have no collision if the mapping is injective, i.e. different states are mapped to different hash-codes.
There are n!/(n-k)! such injective mappings.

Thus, for a random choice having no collision the probability is
   n!/((n-k)!*n^k) = n/n * (n-1)/n * ...* (n-k+1)/n = 1 * (1-1/n) * ...* (1-(k-1)/n)

To compare this with a constant probability c we take the natural logarithm of the equality yielding
   ln(1) + ln(1-1/n) + ...+ ln(1-(k-1)/n) = ln(c)

The left sum can be approximated by the tangent of ln(x) at 1, which is x-1. This gives an arithmetic sequence:
   0 -1/n - ... -(k-1)/n = - 1/n * (1 + 2 ... + (k-1)) = k*(1-k)/(2*n) <= ln(c)
The last unequality follows from ln being convex to below, thus the tangent is above the graph.

If we assume equality, we have a quadratic equation in k with the positive solution
   k1 = 1/2 + sqrt(1/4 - 2*n*ln(c)).

We can use perl to compute k for a collision probability of about 1% (thus c=99/100):

perl -le 'print 0.5 + sqrt(.25 - 2**65 *log(99/100))'  

This yields

This is just above 600 Millions.
Is this sufficient for large specs?
Is there a flaw in my argumentation?


You received this message because you are subscribed to the Google Groups "tlaplus" group.
To unsubscribe from this group and stop receiving emails from it, send an email to tlaplus+unsubscribe@xxxxxxxxxxxxxxxx.
To view this discussion on the web visit https://groups.google.com/d/msgid/tlaplus/f205d9c4-34ab-49ca-9696-7bc1b0d836a5n%40googlegroups.com.