# Mathematical formulas are designed to be pretty, not to be suitable for computation

When you ask a mathematician to come up with a formula to solve a problem, you will get something that looks pretty, but that doesn't mean that it lends itself well to computation.

For example, consider the binomial coefficient, traditionally written nCk or C(n, k), and in more modern notation as (nk). If you ask a mathematician for a formula for the binomial coefficient, you will get the elegant reply

 ( n ) = n! k k!(n − k)!

(That took forever to format. I will use the traditional notation from now on purely for typographical expediency.)

This is a very beautiful formula, but it's horrible for actual computation because the factorials will be very expensive and are likely to overflow your integer data type even at low values of n. (So you may as well just use a lookup table.) And the k! in the denominator exactly cancels the first k factors in n!, so most of your work in multiplying the numbers together is just going to be undone by the division.

For computation, you're much better off using the recurrence C(n, k) = C(n − 1, k − 1) × nk. This is the recurrence you learned in high school when you had to calculate binomial coefficients by hand: You start with 1 · xⁿ and then to get the next coefficient, you multiply by the exponent on the x and divide by the current position (starting at 1), then decrement the exponent. For example, let's calculate the binomial coefficients C(8, k).

 1 · x⁸ bring down the 8 and divide by 1 (resulting in 1 × 8 ÷ 1 = 8), then decrement the exponent 8 · x⁷ bring down the 7 and divide by 2 (resulting in 8 × 7 ÷ 2 = 28), then decrement the exponent 28 · x⁶ bring down the 6 and divide by 3 (resulting in 28 × 6 ÷ 3 = 56), then decrement the exponent 56 · x⁵ bring down the 5 and divide by 4 (resulting in 56 × 5 ÷ 4 = 70), then decrement the exponent 70 · x⁴ bring down the 4 and divide by 5 (resulting in 70 × 4 ÷ 5 = 56), then decrement the exponent 56 · x³ bring down the 3 and divide by 6 (resulting in 56 × 3 ÷ 6 = 28), then decrement the exponent 28 · x² bring down the 2 and divide by 7 (resulting in 28 × 2 ÷ 7 = 8), then decrement the exponent 8 · x¹ bring down the 1 and divide by 8 (resulting in 8 × 1 ÷ 8 = 1), then decrement the exponent 1 · x⁰ bring down the 0, which makes everything zero

(Am I the only person who calculated binomial coefficients by hand?) Notice that the calculations in the second half are the exact inverse of the calculations of the first half, so you only have to do the computations halfway, and then you can just mirror the rest. This is just another way of seeing that C(n, k) = C(n, nk).

This technique lets you evaluate C(50, 7) = 99884400 without overflowing a 32-bit integer.

Often people will ask for an efficient way of calculating factorials, when in fact they don't really need factorials (which is a good thing, because that would require a bignum package); they are really just trying to evaluate a formula that happens to be expressed mathematically with factorials (because factorials are pretty).

Another place pretty formulas prove unsuitable for computation is in Taylor series. The denominator of a Taylor series is typically a factorial, and the numerator can get quite large, too. For example, exp(x) = Σ xⁿn!. Instead of calculating the power and factorial at each term, use the recurrence

 xn = x xn−1 n! n (n − 1)!

In compiler-terms, you're strength-reducing the loop.

Of course, another problem is that you are adding large numbers first, and then adding smaller numbers later. From a numerical analysis point of view, you should add the smaller numbers first so that they can retain significance longer.

As an example, consider that you have to add the following numbers: 999, and ten 0.1's, and suppose your floating point format is good to only three significant digits. If you added them largest to smallest, you would get this:

 999 999 + 0.1 = 999 (three sigificant digits) 999 + 0.1 = 999 (three sigificant digits) ... and so on ...

Your final total will be 999. But if you added the smaller numbers first, then you would get

 0.1 0.1 + 0.1 = 0.2 0.2 + 0.1 = 0.3 ... and so on ... 0.9 + 0.1 = 1 1 + 999 = 1000

By adding the small numbers first, you gave them a chance to accumulate to something meaningful before the big number came along and swamped them.

Remember, the way a formula is written on paper is not necessarily the best way of computing it. (And if the formula was written by a mathematician, it is almost certainly not!)

Tags

Does the blog software allow you to use MathML?

[Sure, but a lot of browsers don't support MathML. -Raymond]
2. parkrrrr says:

There's a whole class mathematicians take that deals with this and similar issues, called Numerical Analysis. We just assume programmers know this stuff, too.

3. Veltas says:

@parkrrrr: Which was mentioned in the article.

4. Veltas says:

I really like this article; informative, and with no real prerequisites other than maths.

5. BZ says:

@Veltas "other than maths"? That's a big deal. I've learned all these formulas and how to use them and I don't remember half of it. On the other hand, I program every day, and Win32/C/.NET stuff is much easier for me to read.

One of my professors (I majored in computer science and engineering) even said "you will probably never use calculus in your career, but we make you take it anyway".  He was right.

6. Joshua says:

Old news, but needs to be retaught every 10 years because the bloody idiots who design CS classes don't teach this stuff!

7. Another way to calculate the binomial coefficients C(n, k) without doing any divisions at all is to start with C(0, 0) = 1 and then use the recursion C(n, k) = C(n – 1, k – 1) + C(n – 1, k).

[Exercise: Discuss why this is a bad way to calculate binomial coefficents. -Raymond]
8. John Costello says:

Back when I interviewed SDETs for Microsoft, I would ask for an implementation and test cases for the binomial coefficient (and give the mathematical definition as the "PM Spec").

No-hires would jump into implementation.

Marginal candidates would write some test cases first.

Good candidates would observe in the process of writing test cases that C(n, 0) = C(n, n) = 1, C(n, 1) = C(n, n-1) = n, and C(n, k) = C(n, n-k) and thus get a whole bunch of automatically generated test cases.

Great candidates would demolish the PM spec before ever writing a single test case.

9. John Costello says:

Maurits — that doesn't work unless you memoize, because it has exponential running time.

[Even if you memoize, it requires quadratic space and time. -Raymond]
10. GWO says:

[Exercise: Discuss why this is a bad way to calculate binomial coefficents. -Raymond]

Because the number of calculations grows quadratically with n.  It is, however the way I do them by hand when n < 10 (i.e. write down Pascal's triangle).  Of course, it helps that 1,5,10,10,5,1 is easy to remember, so you can start half way down.

11. Yes, the time is quadratic, but the memory requirements are only O(n); you can get away with storing only two rows of the triangle.

12. Veltas says:

I feel like I should ask now if it's faster for small n, because the Pascal's triangle identity allows you to find the coefficients through addition, whereas the better complexity method uses multiplication and division.

I personally wouldn't know, I don't know much about performance at all.

Also, wouldn't it be more appropriate to measure algorithmic time complexity with respect to bitsize of n and k, rather than the actual values?

[Not sure what the benefit of measuring in bit size is. -Raymond]
13. Veltas says:

I was just taught to do it like that.

I suppose it means you're considering the 'size' of the input, rather than the magnitude of a single number given.

[It depends on the nature of the problem. If you used bit sizes, then you would conclude that calculating C(n,1024) and C(n,2047) were roughly equivalent in cost, when in fact the second costs nearly twice as much. -Raymond]
14. Veltas says:

I'm not sure I understand, as the time complexity would probably be of the form O(2^k) or something similar, so if one were to roughly add 1 to k (i.e. go from 1024 to 2047), the time should roughly double.

Although I'm starting to rethink this convention, it doesn't really make it easier to understand.

[While it's true that it tells you that doubling the second parameter to C(n,k) also doubles cost, it misleads you into thinking that all values of the second parameter from 1024 to 2047 cost the same. -Raymond]
15. Weiqi Gao says:

Take a look at MathJax (http://www.mathjax.org/) for serving mathmatics on the Web.  Detailed instructions for enabling it in various blogging software/sites can be found here: checkmyworking.com/…/how-to-get-beautifully-typeset-maths-on-your-blog.

[Great, now I have to go talk to a lawyer. -Raymond]
16. Anoniempje says:

Adam Rosenfield, Google seems to intend to drop any plans for future MathML support in their ‘new’ Blink engine.

Weiqi Goa, I just looked at the demoes on the web page you mention and something goes horribly wrong for me. I just see unformatted Tex (using Chromium 28.something).

17. Sven2 says:

I'd say formulas aren't necessarily designed to look pretty, but they're designed to work well for stuff that mathematicians do (e.g.: Derive other formulas). And for that, the definition as a series is usually less usable than the definition using factorials.

18. meangrape says:

You need to ask the mathematician for an analytic form of the equation. en.wikipedia.org/…/Analytic_expression

[The article says that the gamma function (generalized factorial) and infinite series are permitted in analytic expressions, so the formulas given in the article are already analytic. But they are not suitable for computation. -Raymond]
19. foo says:

That took forever to format. I will use the traditional notation from now on purely for typographical expediency

Seems like pretty mathematical formulas aren't designed to be suitable for general-purpose blog entries either.

20. Jon says:

Talking about the gamma function is a good point – if you're happy with using floating point numbers and don't need an exact result, you can calculate binomial coefficients as follows:

n_c_k = exp(lgamma(n+1)-lgamma(k+1)-lgamma(n-k+1));

Not sure how the performance compares to the integer implementation. It's definitely slower for small inputs and maybe faster for large inputs. The lgamma calls can cause extreme numerical errors depending on n and k, so if anyone wants to use this, be careful.

21. Craig says:

"(Am I the only person who calculated binomial coefficients by hand?)"

I have a degree in computer engineering and had to go look up what a binomial coefficient *is*, let alone calculate them by hand.

22. Ah, I remember from high school, when the task was to "simplify the formula". The "simplified" form usually only looked more pretty, but required MORE computation. And some of the transformations were completely pointless (at least their point was never convincingly explained), like "rationalizing the denominator".

23. Neil says:

@Veltas I always thought that the standard way to describe a computation whose time depends on the number of bits in n was O(log n).

24. Pomax says:

Just use MathJax and typeset it as [{n choose k} = frac{n!}{k!(n-k)!]], and magic: now your formulae look pretty in pretty much any browser. It's somewhat the defacto library for actually putting math, rather than horrible ascii, on a page these days =)

[Great, now I have to hire a lawyer to read the licensing agreement. -Raymond]
25. Dem says:

Re: Great, now I have to hire a lawyer to read the licensing agreement. -Raymond

Err, it's open source and licensed under the Apache 2.0 license, why would you need a lawyer to read that?

[Because I don't trust myself to read the license and understand its consequences fully. -Raymond]
26. Pomax says:

Then it's time to lawyer up, you're using other "free to use on a website" libraries already ;) MathJax is just as free as jQuery etc. it's a CDN-hosted library for the web, not something you pay for to put on your website.

[And I'll have to add MathJax support to my content management system. Given that my content management system doesn't even support images, you can imagine how likely MathJax support is going to show up. -Raymond]
27. Anonymous Coward says:

The Apache 2.0 licence is strange (in my opinion) but also short, devoid of hidden gotchas and well understood.

However, I personally wouldn't bet on a math system that results in broken output on the #1 rendering engine while at the same time claiming that it brings beautiful math to all browsers.

Plus, this isn't the kind of site where formulæ are a big necessity. I don't think it's going to be worth it.

28. Neil says:

C(n, k) = C(n – 1, k – 1) × n ÷ k, although true, isn't the relation C(n, k) = C(n, k – 1) × (n + 1 – k) ÷ k that you actually demonstrated.

29. mikeb says:

Way back when I was in college, I had to take a 1 credit numerical analysis course (taught in Basic, no less) that was supposed to be really easy.  It was only one credit, after all.

But it was the hardest course I had that semester.

Math is generally elegant and beautiful, as Raymond says. Arithmetic on computers is an ugly hack.

30. Raphael says:

MathJax? Isn't that the thing on Wikipedia you can use to increase the loading time of a page by about 10 seconds?

31. Calculate C(50, 7) using C(n, k) = C(n – 1, k – 1) * n / k

Subtract k from both n and k to get the starting point: C(50 – 7, 7 – 7) = C(43, 0)

C(43, 0) = 1 by definition

C(44, 1) = C(43, 0) * 44 / 1 = 1 * 44 / 1 = 44

C(45, 2) = C(44, 1) * 45 / 2 = 44 * 45 / 2 = 990

C(46, 3) = C(45, 2) * 46 / 3 = 990 * 46 / 3 = 15180

C(47, 4) = C(46, 3) * 47 / 4 = 15180 * 47 / 4 = 178365

C(48, 5) = C(47, 4) * 48 / 5 = 178365 * 48 / 5 = 1712304

C(49, 6) = C(48, 5) * 49 / 6 = 1712304 * 49 / 6 = 13983816

C(50, 7) = C(49, 6) * 50 / 7 = 13983816 * 50 / 7 = 99884400

32. Stallman Torvalds says:

I bet if it were under MsPL you'd be all over it. Stop spreading FUD, you shill.

[Presumably the Microsoft legal department reviewed the MS-PL license. But my content management system doesn't support it so even if all the legal hurdles were cleared, I probably still wouldn't use it. -Raymond]
33. James says:

About changing the order of addition: first make sure the series is absolutely convergent. There's a theorem: given a series that's convergent but not absolutely convergent, you can shuffle it to make it converge to anything you want.

[You can freely shuffle a finite series. -Raymond]