[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

[tlaplus] Collision probability of TLC state computation



Hi All,
I have some difficulty understanding the Collision Probability of TLC.

Let me explain my understanding.
The generated hash-code of a state is a 64 Bit number, i.e. there are n=2^64 different hash-codes possible.
Imagine a spec consists of k different states, e.g. numbered from 1 to k.
The hash-code is just a mapping from 1..k to 1..n. There are n^k such mappings.

We have no collision if the mapping is injective, i.e. different states are mapped to different hash-codes.
There are n!/(n-k)! such injective mappings.

Thus, for a random choice having no collision the probability is
   n!/((n-k)!*n^k) = n/n * (n-1)/n * ...* (n-k+1)/n = 1 * (1-1/n) * ...* (1-(k-1)/n)

To compare this with a constant probability c we take the natural logarithm of the equality yielding
   ln(1) + ln(1-1/n) + ...+ ln(1-(k-1)/n) = ln(c)

The left sum can be approximated by the tangent of ln(x) at 1, which is x-1. This gives an arithmetic sequence:
   0 -1/n - ... -(k-1)/n = - 1/n * (1 + 2 ... + (k-1)) = k*(1-k)/(2*n) <= ln(c)
The last unequality follows from ln being convex to below, thus the tangent is above the graph.

If we assume equality, we have a quadratic equation in k with the positive solution
   k1 = 1/2 + sqrt(1/4 - 2*n*ln(c)).

We can use perl to compute k for a collision probability of about 1% (thus c=99/100):


perl -le 'print 0.5 + sqrt(.25 - 2**65 *log(99/100))'  

This yields
608_926_881.733485

This is just above 600 Millions.
Is this sufficient for large specs?
Is there a flaw in my argumentation?


Best,
Siegfried

--
You received this message because you are subscribed to the Google Groups "tlaplus" group.
To unsubscribe from this group and stop receiving emails from it, send an email to tlaplus+unsubscribe@xxxxxxxxxxxxxxxx.
To view this discussion on the web visit https://groups.google.com/d/msgid/tlaplus/f205d9c4-34ab-49ca-9696-7bc1b0d836a5n%40googlegroups.com.