Puzzle: probability problem


Here is an interesting probability problem who recently generated long discussions in our team:


Say that you have an array of N boolean values, with all values initially set to FALSE. At each iteration step, you arbitrary pick an element in the array and set it to TRUE. What is the average number of elements set to TRUE after K iteration steps?


By the way, as an anecdotic fact, this problem came from analyzing paged pool memory consumption when you randomly access a large volume over time. I will not go in more details, but I just wanted to mention this problem as a particular application of theoretical statistics in the design of operating system components…

Comments (6)

  1. Well, I’d try and attack a problem like this with a recurrence relation.

    Fix N.

    Then, define a function f(k,x) that is the probability that after k steps, x of the N elements are true.

    f(0,x)=1 if x=0 and 0 otherwise is a boundary condition.

    f(k,0)=1 if k=0 and 0 otherwise is another boundary condition.

    f(k,x)=f(k-1,x)x/N+f(k-1,x-1)(N-x+1)/N is the recurrence.

    The average number of true elements after K steps is then

    A(K)=sum from x=0 to N of xf(K,x).

    Plugging the recurrence into the average and iterating a few times, it looks like A(K)=N(1-(1-1/N)^K).

  2. AdiOltean says:

    Interesting (both the recurrence and the final formula are indeed correct). So what formula did you find for f(k,x)?

    Thanks, Adi

  3. ReuvenLax says:

    The different approaches to this problem in our group were interesting. Adi approached it with the recurrence that Nicholas Allen used. I used an inclusion-exclusion argument to show that f(k,x) is defined as follow:

    f(k,x) = 1/n^k (n choose x)sum from j = 0 to x-1 (-1)^j(x choose j)(i-j)^k

    This gives you the formula for the probability, but it’s a pain to simplify it down to n*(1-(1-1/n)^k).

    The simplest solution that another member of our group came up with is to consider the expected number of elements set for each element of the array (sounds weird). This is equal to the probability that a single element is 1, i.e. 1 – (1-1/n)^k. The expected number of elements set in the whole array is the sum of the expectations, i.e. n(1-(1/n)^k)

  4. I worked in A instead of f so I never needed an explicit solution to f(k,x). Normally this results in a mess since a sum over a recurrence relation with multiple terms typically leaves unpaired terms along the boundaries. However, in this case, the coefficients for f(k-j,0) and f(k-j,N) in the sum for A after applying the recurrence j times were the same formula as the general case.

    The (simple) solution using expectation is rather clever. That is quite a direct method that uses almost no math at all but instead a lot of creativity looking at the problem.

Skip to main content