Enumerating the ways of distributing n balls into k boxes


Suppose you had n indistinguishable balls and k distinguishable boxes. Enumerate the ways of distributing the balls into boxes. Some boxes may be empty.

We can represent each distribution in the form of n stars and k − 1 vertical lines. The stars represent balls, and the vertical lines divide the balls into boxes. For example, here are the possible distributions for n = 3, k = 3:

***|| 3+0+0
**|*| 2+1+0
**||* 2+0+1
*|**| 1+2+0
*|*|* 1+1+1
*||** 1+0+2
|***| 0+3+0
|**|* 0+2+1
|*|** 0+1+2
||*** 0+0+3

This visualization is known in combinatorics circles as stars and bars.

From this visualization, we see that what we are doing is taking n + k − 1 slots, and in each slot placing a star or a bar, subject to the constraint that there be n stars and k − 1 bars. Another way of looking at this is that we are choosing a subset of size k − 1 from a set of size n + k − 1 (the subset specifying where the bars go).

Now we can fire up our subset-generating machine.

function Distributions(n, k, f) {
 Subsets(n + k - 1, k - 1, function(s) {
  s.push(n + k);
  f(s.map(function(v, i) { return v - (s[i-1]||0) - 1; }));
  s.pop();
 });
}

We ask to generate subsets of size k − 1 from a set of size n + k − 1. For each such subset, we draw an artificial bar at the end (slot n + k), then calculate the number of stars between the bars. The number of stars between two bars is the distance between the two bars, minus 1 because the bar takes up space, too.

Another solution is to reduce this to a problem we already know how to solve: enumerating integer compositions. After distributing the balls into boxes, we go around like Santa Claus and give each box one extra ball, which produces a composition. Conversely, for any composition, remove one ball from each box, and you get a distribution.

function Distributions(n, k, f)
{
 Compositions(n + k, k, function(s) {
  f(s.map(function(v) { return v - 1; }));
 });
}

We added k extra balls, so we need to generate compositions of n + k. When we get each composition, we take one ball away from each box and call that the distribution.

Comments (16)
  1. Henke37 says:

    This feels like a clear task for Prolog.

  2. Joshua says:

    Gee I was expecting another comment from "Al go" but I guess he lived up to his name and went.

  3. Danny says:

    Huh? This is well known in math, is called arrangement. And the formula is in form : A(k,n) = n(n−1)⋯(n−k+1) = n!/((n−k)!). See wiki, is simple and clear. And to not be confused with combinations of (k,n) where the formula is: C(k,n) = n!/(k!*((n-k)!))

    [Not sure which arrangement you are talking about. -Raymond]
  4. Katie says:

    There is this two sentence summary: mathworld.wolfram.com/Arrangement.html

    It mentions that when order isn't important (indistinguishable balls, like you mentioned) that you use a combination (like you described) to calculate the number of "arrangements."

  5. Katie says:

    Here's their coverage of the subject, including star and bars: mathworld.wolfram.com/Multichoose.html

    It does use the term arrangement, but there's no A(k,n) notation involved. It uses the same method as Raymond. I am interested in the recipe from the example at the end.

  6. Danny says:

    "Not sure which arrangement you are talking about." - No? a typo? on a typo? then let me google that for you :D => lmgtfy.com - and then the 1st link is...tadaaaa..(insert suspense  music here) ... cbpowell.wordpress.com/.../permutations-vs-combinations-how-to-calculate-arrangements  Poor wiki, lol, for english version does not have the math formula article, while the rest of the languages does, like French one: fr.wikipedia.org/.../Arrangement  or the Romanian one (my native language): ro.wikipedia.org/.../Aranjament  both using the A(k, n) notation Katie.

    [That's not fair. You said "wiki", so obviously I ignored links on wordpress.com. And the wordpress page you linked to is a different problem (permutations, not distribution of indistinguishable balls in distinguishable boxes). Consider n = k = 2. There is only one way to distribute 2 balls into 2 boxes (namely, put one ball in box A, and one ball in box B). But your formula gives 2!/(2-2)! = 2, which corresponds to the two ways of arranging two distinguishable balls from a set of two (namely, AB and BA). -Raymond]
  7. Ben Voigt says:

    For n = k = 2, there are three distributions: 0+2, 1+1, 2+0

    The formula, based on stars and bars, is (n+k-1)!/n!/(k-1)!

    The dissymmetry between n and k is produced because the n balls are indistinguishable, but the k boxes are tagged.

    [Oops, you're right. I got this confused with compositions, which requires each box to have at least one ball. -Raymond]
  8. Al Go says:

    Enough of this series.  It's your least popular one. It's the same thing each time, with a different telling of the problem.  Plus, I never received my refund from last time you promised me one.

    [I removed the charge from your credit card. I like combinatorial enumeration. (And I'm not the only one.) Sorry you don't. -Raymond]
  9. Wear says:

    @Al Go, Oh no, the ratings are slipping? That's not good, Raymond you need to keep those key demographics interested or else they'll start reading other blogs. I hear middle age women are what all the networks are after these days. Obviously that means this series should be replaced by one about knitting.

  10. Al Go says:

    How about a compromise:  come up with real world examples instead of balls and boxes. That way it would seem more practical. I've had it up to here with balls.

    [This is not intended to be practical. We're just doing some algorithmic exercise. -Raymond]
  11. Joshua says:

    I've a better idea. How about you live up to your name Mr Go.

  12. Crescens2k says:

    @Al Go:

    The thing is, these things while not seemingly practical can be used in other, more advanced algorithms.

    For this particular combinatorial problem, you can use the principle in hashing algorithms. There are other interesting things, like the pigeonhole principle and the birthday problem which can appear. While not seemingly useful immediately, they all give a certain insight into situations that can arise in algorithm design.

    So if I was to hazard a guess at what Raymond is trying to do. He is writing posts on how to solve these problems. If you don't know what the problem is though, you would go out and search for it, find the reason why you should know it in CS and feel enriched. That is speculation on my part though.

  13. rs says:

    MathWorld articles often tend to be idiosyncratic.

  14. Anon says:

    @Crescens2k

    Many of us can't learn a concept without some practical application presented, so merely presenting the theory isn't terribly educational.

    Same way you can't learn to be a developer by merely reading some listings, really. You have to actually write code for practical purposes.

  15. Crescens2k says:

    @Anon:

    And that is where the other part of someone working in CS comes into it. If you don't know something, research. These posts don't seem to be about really teaching the theory behind the problem either. Which is why I speculate that the thing that Raymond is doing is bringing to light these problems, so people become aware of them and go ahead and look them up.

    Also remember, if you were to present this as a practical application, in such a small post it would most likely lose a lot of information. Also, as I said, they give insight into a problem and not end up being directly applicable. In fact, if you look at it another way, these are the practical application of the problem.

    So, where do I get off on saying that. Imagine you have say 10 bins side by side, and you throw 10 balls into the bins, the probability of a bin containing more than 1 ball is not 0. The little code snippet is a way to generate different combinations at runtime. So how does this apply to an algorithm? Well, if you hash something with a finite set of possible output values (i.e. int32), the probability of a collision is not 0, and should be uniformly random if you have a good uniform hashing algorithm.

    Theoretical problems usually use a real life example to study, but then apply it somehow to an algorithm. Like for hashing, you wouldn't really care too much about going around throwing balls at bins (or boxes) at all.

  16. Chris says:

    Raymond, why feed the trolls esp this Danny character and AI Go? Some people are just never happy.

Comments are closed.

Skip to main content