Hash table collisions and the "birthday paradox"
-
Suppose there are 365 slots in the hash table: M=365
-
What is the probability that there will be a collision when inserting N keys, assuming each key is equally likely to hash to any of the M slots?
-
For N = 10, prob_{N,M}(collision) ≈ 12%
-
For N = 20, prob_{N,M}(collision) ≈ 41%
-
For N = 30, prob_{N,M}(collision) ≈ 71%
-
For N = 40, prob_{N,M}(collision) ≈ 89%
-
For N = 50, prob_{N,M}(collision) ≈ 97%
-
For N = 60, prob_{N,M}(collision) > 99%
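
The figures above can be reproduced with a short calculation. The following is a minimal sketch, assuming every key hashes independently and uniformly to one of the M slots; the function name `collision_probability` is chosen here for illustration and does not come from the original notes. It multiplies out the probability that each successive key misses all slots already in use, then takes the complement.

```python
def collision_probability(n: int, m: int = 365) -> float:
    """Probability that at least two of n keys land in the same one of m slots.

    Assumes each key is equally likely to hash to any slot (an idealization).
    """
    if n > m:
        return 1.0  # pigeonhole: more keys than slots forces a collision
    p_no_collision = 1.0
    for i in range(n):
        # The (i+1)-th key must avoid the i slots that are already occupied.
        p_no_collision *= (m - i) / m
    return 1.0 - p_no_collision


for n in (10, 20, 30, 40, 50, 60):
    print(f"N = {n}: {collision_probability(n):.1%}")
```

Rounded to whole percentages, the printed values should agree with the list above.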
-
So, among 60 randomly selected people, it is almost certain that at least one pair of them have the same birthday
-
And, on average, one pair of people will share a birthday in a group of about √(2·365) ≈ 27 people
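
The "one pair on average" claim can be checked with linearity of expectation: N keys form N(N-1)/2 pairs, and under the uniform-hashing assumption each pair collides with probability 1/M, so the expected number of colliding pairs is N(N-1)/(2M). Setting this to 1 and approximating N(N-1) by N² gives N ≈ √(2M). The sketch below is only an illustration; the function names are hypothetical.

```python
import math


def expected_colliding_pairs(n: int, m: int = 365) -> float:
    """Expected number of key pairs sharing a slot, assuming uniform hashing."""
    # n*(n-1)/2 distinct pairs, each colliding with probability 1/m.
    return n * (n - 1) / (2 * m)


def group_size_estimate(m: int = 365) -> float:
    """Approximate N at which one colliding pair is expected: sqrt(2*m)."""
    return math.sqrt(2 * m)


print(group_size_estimate())          # a little over 27
print(expected_colliding_pairs(27))   # just under 1
print(expected_colliding_pairs(28))   # just over 1
```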
-
In general: collisions are likely to happen, unless the hash table is quite sparsely filled
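
To see how sparse "quite sparsely filled" is, one can ask how many keys can be inserted before a collision becomes more likely than not. The sketch below makes the same uniform-hashing assumption as before; `keys_until_likely_collision` is a hypothetical helper name, not something from the original notes. For M = 365 it returns 23, which means the table is only about 6% full when a collision is already the likelier outcome.

```python
def keys_until_likely_collision(m: int, threshold: float = 0.5) -> int:
    """Smallest number of keys whose collision probability reaches threshold.

    Assumes uniform hashing into m slots and 0 < threshold <= 1.
    """
    p_no_collision = 1.0
    n = 0
    while 1.0 - p_no_collision < threshold:
        # Insert one more key: it must miss the n slots already in use.
        p_no_collision *= (m - n) / m
        n += 1
    return n


for m in (365, 10_000, 1_000_000):
    n = keys_until_likely_collision(m)
    print(f"M = {m}: N = {n}, load factor N/M = {n / m:.4f}")
```

The load factor at which a collision becomes likely shrinks as the table grows, because the required N scales roughly with the square root of M rather than with M itself.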
-
So, if you want to use hashing, but cannot use perfect hashing because you don’t know the keys in advance, and you don’t want to waste huge amounts of storage space, you need a strategy for dealing with collisions