Hashtable collisions and the "birthday paradox"

Suppose the hash table has M = 365 slots, and each key is equally likely to hash to any slot, independently of the other keys.

What is the probability of at least one collision when inserting N keys?

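For N = 10, $\text{prob}_{N,M}(\text{collision}) \approx 12\%$

For N = 20, $\text{prob}_{N,M}(\text{collision}) \approx 41\%$

For N = 30, $\text{prob}_{N,M}(\text{collision}) \approx 71\%$

For N = 40, $\text{prob}_{N,M}(\text{collision}) \approx 89\%$

For N = 50, $\text{prob}_{N,M}(\text{collision}) \approx 97\%$

For N = 60, $\text{prob}_{N,M}(\text{collision}) > 99\%$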
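These probabilities follow from a simple product, assuming each key hashes independently and uniformly at random. The first key cannot collide, the second must avoid 1 occupied slot, the third must avoid 2, and so on, so $\text{prob}_{N,M}(\text{collision}) = 1 - \prod_{i=0}^{N-1} \frac{M-i}{M}$. A minimal Python sketch of that formula; the function name `collision_probability` is illustrative, not from any library:

```python
def collision_probability(n: int, m: int) -> float:
    """Probability of at least one collision when n keys are hashed
    independently and uniformly at random into m slots."""
    if n > m:
        return 1.0  # pigeonhole principle: a collision is certain
    p_no_collision = 1.0
    for i in range(n):
        # the (i+1)-th key must avoid the i slots already occupied
        p_no_collision *= (m - i) / m
    return 1.0 - p_no_collision


# Recompute the M = 365 values for the listed group sizes
for n in (10, 20, 30, 40, 50, 60):
    print(f"N = {n}: {collision_probability(n, 365):.1%}")
```

Rounded to whole percentages, the printed values agree with the figures listed above.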

So, treating people's birthdays as keys and the 365 days of the year as slots: among 60 randomly selected people, it is almost certain that at least one pair of them shares a birthday.

And, on average, a group of only about $\sqrt{2 \cdot 365} \approx 27$ people contains one pair who share a birthday.
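The "one pair on average" estimate comes from linearity of expectation. A group of N has $N(N-1)/2$ pairs, and under the same uniform-hashing assumption each pair lands in the same slot with probability $1/M$, so the expected number of colliding pairs is $\frac{N(N-1)}{2M}$. Setting this equal to 1 gives $N \approx \sqrt{2M}$. A small sketch that checks both the closed-form estimate and the exact integer threshold (function names are hypothetical):

```python
import math


def expected_colliding_pairs(n: int, m: int) -> float:
    """Expected number of key pairs that hash to the same slot, assuming
    n keys hashed independently and uniformly into m slots."""
    # n*(n-1)/2 pairs, each colliding with probability 1/m
    return n * (n - 1) / (2 * m)


def group_size_for_one_expected_pair(m: int) -> int:
    """Smallest n whose expected number of colliding pairs is at least 1."""
    n = 1
    while expected_colliding_pairs(n, m) < 1:
        n += 1
    return n


print(math.sqrt(2 * 365))                     # closed-form estimate, about 27
print(group_size_for_one_expected_pair(365))  # exact integer threshold
```

The exact threshold is 28: with 27 people the expectation is $351/365 \approx 0.96$, just short of one pair, which is why the estimate is stated as "about" 27.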

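In general, collisions are likely to happen unless the hash table is quite sparsely filled: with M slots, a collision becomes likely once the number of keys N is on the order of $\sqrt{M}$, far below M itself.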
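To make "quite sparsely filled" concrete, a sketch can search for the smallest N that makes a collision more likely than not, and report the corresponding load factor N/M. It keeps the uniform-hashing assumption; the function name `keys_for_even_odds` is hypothetical:

```python
def keys_for_even_odds(m: int) -> int:
    """Smallest n for which n keys hashed uniformly into m slots
    collide with probability at least 1/2."""
    p_no_collision = 1.0
    n = 0
    while 1.0 - p_no_collision < 0.5:
        # add one more key: it must avoid the n slots already occupied
        p_no_collision *= (m - n) / m
        n += 1
    return n


for m in (365, 10_000, 1_000_000):
    n = keys_for_even_odds(m)
    print(f"M = {m:>9,}: N = {n:>5,}, load factor N/M = {n / m:.2%}")
```

For M = 365 this returns 23, the classic birthday-paradox threshold. As M grows, the even-odds point behaves roughly like $\sqrt{2M \ln 2} \approx 1.18\sqrt{M}$, so the load factor at which collisions become more likely than not keeps shrinking: bigger tables do not make collisions go away unless they are kept proportionally emptier.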

So, if you want to use hashing but cannot use perfect hashing (because you do not know the keys in advance), and you do not want to waste huge amounts of storage space, you need a strategy for dealing with collisions.