This is another math-oriented puzzle, this time with probabilities. The answer to the birthday paradox is well known, but it’s fun to derive it.
Puzzle: How many people do you need before the odds are good (greater than 50%) that at least two of them share a birthday?
Hint: It’s easier to consider the chance that no one shares a birthday, and invert that.
Answer: 23 people
Solution: Let’s figure the odds that no one shares a birthday and invert that. The odds are calculated by counting all the ways that N people won’t share a birthday and dividing by the number of possible birthdays they could have.
For example, two people could have 365×365 birthday combinations. That’s the denominator. To count the numerator, imagine that the first person gets to choose their birthday. They can pick from 365 days. The second person can also pick their birthday, but can’t share a birthday with the first person. They’ve got 364 days to choose from. So the chance that two people don’t share a birthday is (365×364)/365². Subtract that from 1 and you get what you expect: that there’s a 1 in 365 chance that two people share a birthday.
For three people, the denominator is 365³ and the numerator is 365×364×363. The formula for N people is:
P(N) = [365 × 364 × · · · × (365−N+1)] / 365N
I could have used factorials to express the numerator, but this way works better. I’ll explain later.
We’re trying to solve this equation:
1 - P(N) = 0.5
I don’t know how to do that. There’s probably a way to solve equations that involve factorials, but I couldn’t find it on the Net, so I wrote a quicky Python program to try every value of N:
for N in range(1, 365 + 1): p = 1.0 for i in range(1, N + 1): p = p * (365 - i + 1) / 365 p = 1 - p print N, p
This goes over 50% at 23 people. Using double precision floats, the output says “1” after 134 people, but the probability doesn’t really reach 100% until 366 people. If anyone knows how to solve this equation algebraically, please let me know.
When I first put up this page, I had the right answer but the wrong explanation. Martin Ziegler pointed that out to me (thanks!), and proved it by showing that my solution gave the wrong answer for N = 366. You should get a probability of 1, but my answer gave a probability just below that. This is my original solution, followed by a comparison of the probabilities you get from each.
If you have two people, the chance that they share a birthday is 1/365. If you have three people (A, B, and C), then you’ve got three ways (AB, BC, and AC) that birthdays could be shared. Since you want any of the three pairs to share, you can’t just add or multiply the probabilities. Instead you consider the chance that they don’t share a birthday (1 - 1/365 = 364/365) and multiply that by itself three times. If you have four people, then they could pair up in six ways, so the probability that none of those pairs shares a birthday is (364/365)6. The probability that any do share a birthday is 1 minus that. We want to keep increasing N, the number of people, until that probability reaches 50%.
Given N you can calculate the number of pairs with N-choose-2, meaning given N items, the number of ways to pick any two of them. It’s usually written as the letter N sitting over the number 2 with large parentheses around both. (When MathML becomes better supported these pages will become clearer. Or at least better typeset.)
The formula for A-choose-B is A! / [B! (A - B)!]. I remember that formula like this: N! is the number of permutations of N items. Imagine all the permutations of the A items we’re trying to choose from, each permutation on one line. There are A! of these lines. For each line, keep the first B items and throw away the other A - B items. You’ll be left with a lot of duplicate lines, so get rid of duplicates. For each unique line left over you threw away (A - B)! lines. Now ignore the position of the items (since we don’t care about the order of the B items we’re choosing), and get rid of duplicate lines again. For each unique line left over you threw away B! lines. That’s how you get that formula.
We have N people and want to find out how many pairs of people there are. That’s N-choose-2, or N! / (2! (N - 2)!). That simplifies to N(N - 1)/2. The probability that no two people share a birthday is:
(364/365)N(N - 1)/2
So the probability that any two people do share one is:
P(N) = 1 - (364/365)N(N - 1)/2
We want to find out the break-even point for P(N) = 0.50, so we plug that in and use the quadratic formula to solve for N, giving us 22.98, or 23 people.
I modified my Python program to dump both values, and this is the graph of the first 50 values of N:
They’re amazingly similar considering that the approaches seem so different. The wrong solution assumes that all birthday pairs are independent, but that isn’t true. If persons A and B don’t share a birthday and B and C don’t either, then the chance that A and C share a birthday is affected by that information. (Think through the case where there are only three days in the year to choose from.)
In the formula at the top I avoided using factorials in the probability function because when N = 366 I wanted the numerator to be equal to 0, but 0! = 1, which messes up that corner case.
It’s not really a paradox, it’s just that most people expect the answer to be much higher, like 365/2 or something.