TA's note on the birthday problem (section 3.2 of Dekking's book)

We'll see that it isn't difficult at all to find the probability that no two students in a class of size n share a birthday (here n is about 25-30). The probability of the complementary event (i.e., that the birthdays of at least two people here coincide) may surprise you. We'll give some intuition later for why this is less surprising than it may at first seem.

A way of generating random birthdays

Let's ask MATLAB to pick 23 birth dates at random. We'll imagine the days of the year are numbered 1 through 365 (no Feb 29). For example, Feb 8, the day your next assignment is due, is represented by the number 39, since it is the 39th day of the year.

days = 365;
num_of_students = 23;
birth_dates = ceil(rand(1, num_of_students) * days) % pressing the F1 key with the cursor over
                                                    % 'ceil' or 'rand' pops up a little help
                                                    % window telling you more about
                                                    % these functions
birth_dates =

  Columns 1 through 5

     138             44            279            191            247       

  Columns 6 through 10

     102            236            345            170             78       

  Columns 11 through 15

     265             19            191            161            119       

  Columns 16 through 20

     196            129             62            363             89       

  Columns 21 through 23

     198            201             46       
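The same draw can be written with randi, which returns uniformly distributed integers directly. This is only an alternative sketch, not the method used in the rest of this note; the checks at the end are our own additions.

```matlab
days = 365;
num_of_students = 23;

% randi(imax, m, n) returns an m-by-n matrix of integers drawn
% uniformly from 1, 2, ..., imax
birth_dates = randi(days, 1, num_of_students);

% sanity checks: one date per student, every date a whole day number in 1..365
assert(numel(birth_dates) == num_of_students);
assert(all(birth_dates >= 1 & birth_dates <= days));
assert(all(birth_dates == round(birth_dates)));
```

Both versions give each of the 365 days the same chance; ceil(rand(...)*days) works because rand returns values strictly between 0 and 1.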

A little simulation

Imagine a world in which CIS 2033 is taught all over the place, in 10 000 places in fact (if that sounds like a lot, allow for different semesters). Furthermore, imagine that each of these CIS 2033 classes has 23 students brave enough to take it. Let's count in how many of these 10 000 classes at least two of the 23 students share a birthday; in the remaining classes all 23 birthdays fall on different days. What fraction do you think we'll get? (Remember the frequentist approach to defining probability that we mentioned? If not, it's OK.)

format rat
reps = 10000;        % number of repetitions (num of times CIS 2033 is taught)
rec = NaN(1,reps);   % for each repetition we'll keep a record (see loop below)
days = 365;
num_of_students = 23;
for k = 1:reps
    birth_dates = ceil(rand(1, num_of_students)*days);
    rec(k) = length(unique(birth_dates));
end
% how often (out of 10000) do we get to see coincidences in the birthdays
% of the 23 students? freq gives the answer:
freq = sum(rec < num_of_students)/reps
freq =

    2483/5000  
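For comparison, here is a short sketch that computes the exact probability of at least one coincidence among 23 students, using the product formula from page 28 of the book (the one worked through in the next section). The variable names p_all_differ and p_coincidence are ours.

```matlab
days = 365;
num_of_students = 23;

k = 1:(num_of_students - 1);          % k = 1, ..., 22
p_all_differ = prod(1 - k / days);    % probability that all 23 birthdays differ
p_coincidence = 1 - p_all_differ;     % probability of at least one shared birthday

% fprintf prints a decimal regardless of the 'format rat' setting
fprintf('exact P(at least one shared birthday) = %.4f\n', p_coincidence);   % 0.5073
```

The simulated frequency 2483/5000 (about 0.497) lands in the same neighborhood; a different run of the simulation will give a slightly different number.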

So how could we compute B_n of page 28 of the book?

Let's see if we can get MATLAB to plot results similar to those you see in Chapter 3, Fig 3.1. Look at page 28 of the book. First, how does one compute B_n? Let's start small with B_5: what's the probability that the birthdays of 5 randomly chosen students all differ?

num_of_students = 5;
days = 365;
ones_over_num_of_days = ones(1, num_of_students - 1) * 1/days   % a vector of 1/365s
numers = (num_of_students - 1: -1: 1)                           % vector of numerators - see last equation on page 28 of Dekking
B_n = ones(1, num_of_students - 1) - numers.*ones_over_num_of_days   % the individual factors (1 - k/365); despite its name, not yet B_n itself
prob_student_birthdays_differ = prod(B_n)                       % B_5: the product of the factors above

% Try varying num_of_students. Do the results make sense to you?
ones_over_num_of_days =

       1/365          1/365          1/365          1/365   


numers =

       4              3              2              1       


B_n =

     361/365        362/365        363/365        364/365   


prob_student_birthdays_differ =

     968/995   
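The same product can be written more compactly. The sketch below wraps it in an anonymous function (the name B is ours, chosen to match the book's B_n) and prints decimals. That is handy because under format rat MATLAB displays a rational approximation (the 968/995 above) rather than the exact fraction, whose denominator is 365^4.

```matlab
days = 365;

% B(n): probability that n birthdays are all different,
% i.e. the product of (1 - k/365) for k = 1, ..., n-1.
% For n = 1 the range 1:0 is empty and prod of an empty vector is 1.
B = @(n) prod(1 - (1:n-1) / days);

fprintf('B_5  = %.4f\n', B(5));    % 0.9729
fprintf('B_23 = %.4f\n', B(23));   % 0.4927
```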

Can I see a picture?

Yes, we'd like one too; some people say a picture is worth a lot of words. Let's compute B_n for a range of n and plot the results:

students = 1:100;
days = 365;
prob_B_n = ones(1, length(students));        % preallocate; B_1 = 1 (a lone student cannot share a birthday)
for k = 2:length(students)                   % starting at k = 1 would also work: prod of an empty vector is 1
    numers = (k - 1: -1: 1);                 % numerators k-1, ..., 1
    prob_B_n(k) = prod(1 - numers / days);   % B_k = (1 - 1/365) * ... * (1 - (k-1)/365)
end
plot(students, prob_B_n, 'b.')
xlabel('n (number of students)')
ylabel('B_n')
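One thing the plot lets you read off is the class size at which a coincidence becomes more likely than not. The sketch below finds it numerically; it uses cumprod instead of a loop, and the names B and n_star are ours.

```matlab
days = 365;
n = 1:100;

% B(n) = (1 - 0/365) * (1 - 1/365) * ... * (1 - (n-1)/365);
% the running product gives every B_n in one call
B = cumprod(1 - (n - 1) / days);

% first class size for which P(at least one shared birthday) = 1 - B_n exceeds 1/2
n_star = find(1 - B > 0.5, 1);
fprintf('smallest n with P(coincidence) > 1/2: %d\n', n_star);   % 23
```

So with 23 students a shared birthday is already slightly more likely than not, which is why the examples above use a class of that size.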