A puzzle involving dynamic programming, or maybe it doesn’t


Here's a programming challenge:

Evaluate the following recurrence relation efficiently for a given array [x0, …, xn−1] and integer k.

f ([x0, x1],  k) =  (x0 + x1) / 2  for all k, n = 2. 
f ([x0, …, xn−1],  k) =  (x0 + xn−1) / 2 +  f ([x0, …, xn−2], k + 1) / 2 +  f ([x1, …, xn−1], k + 1) / 2  for even k, n ≥ 3. 
f ([x0, …, xn−1],  k) =    f ([x0, …, xn−2], k + 1) / 2 +  f ([x1, …, xn−1], k + 1) / 2  for odd k, n ≥ 3. 

Hint: Use dynamic programming.

In words:

  • If the array has only two elements, then the result is the average of the two elements.

  • If the array has more than two elements, then the result is the sum of the following:

    • Half the value of the function evaluated on the array with the first element deleted and the second parameter incremented by one.

    • Half the value of the function evaluated on the array with the last element deleted and the second parameter incremented by one.

    • If the second parameter is even, then also add the average of the first and last elements of the original array.

The traditional approach with dynamic programming is to cache intermediate results in anticipation that the values will be needed again later. The naïve solution, therefore, would have a cache keyed by the vector and k.
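For concreteness, here's a sketch of what that naïve cached version might look like (the string cache key and the name f_naive are my choices for illustration, not part of the problem):

```javascript
// Naive dynamic programming: cache results keyed by the vector contents and k.
// (f_naive and the key format are illustrative, not from the problem statement.)
var cache = {};
function f_naive(x, k) {
 var key = x.join(",") + "|" + k;
 if (key in cache) return cache[key];
 var result;
 if (x.length == 2) {
  result = (x[0] + x[1]) / 2;
 } else {
  result = f_naive(x.slice(0, -1), k + 1) / 2 +
           f_naive(x.slice(1), k + 1) / 2;
  if (k % 2 == 0) result += (x[0] + x[x.length - 1]) / 2;
 }
 cache[key] = result;
 return result;
}
```

Note that the cache hit rate is poor: distinct values of k produce distinct keys even when the answers coincide.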

My habit when trying to solve these sorts of recurrence relations is to solve the first few low-valued cases by hand. That gives me a better insight into the problem and may reveal some patterns or tricks.

f ([x0, x1, x2], keven)  = (x0 + x2) / 2 + f ([x0, x1], keven + 1) / 2 + f ([x1, x2], keven + 1) / 2
  = (x0 + x2) / 2 + f ([x0, x1], kodd) / 2 + f ([x1, x2], kodd) / 2
  = (x0 + x2) / 2 + (x0 + x1) / 4 + (x1 + x2) / 4
  = ¾x0 + ½x1 + ¾x2

Even just doing this one computation, we learned a lot. (Each of the following observations can be proved by induction if you are new to this sort of thing, or is evident by inspection if you're handy with math.)

First observation: The function is linear in the array elements. In other words,

f (x + y, k) = f (x, k) + f (y, k),
f (ax, k) = a f (x, k).

To save space and improve readability, I'm using vector notation, where adding two vectors adds the elements componentwise, and multiplying a vector by a constant multiplies each component. The long-form version of the first of the above equations would be

f ([x0 + y0, …, xn−1 + yn−1], k) = f ([x0, …, xn−1], k) + f ([y0, …, yn−1], k)

Since the result is a linear combination of the vector elements, we can work just with the coefficients and save ourselves some typing. ("Move to the dual space.") For notational convenience, we will write

⟨a0, …, an−1⟩ = a0 x0 + ⋯ + an−1 xn−1

Second observation: The specific value of k is not important. All that matters is whether it is even or odd, and each time we recurse to the next level, the parity flips. So the second parameter is really just a boolean. That greatly reduces the amount of stuff we need to cache, as well as increasing the likelihood of a cache hit. (The naïve version would not have realized that f (x, k) can steal the cached result from f (x, k + 2).)

Our previous hand computation can now be written as

f ([x0, x1, x2], even)  = ⟨½, 0, ½⟩ + f ([x0, x1], odd) / 2 + f ([x1, x2], odd) / 2
  = ⟨½, 0, ½⟩ + ⟨½, ½, 0⟩ / 2 + ⟨0, ½, ½⟩ / 2
  = ⟨½, 0, ½⟩ + ⟨¼, ¼, 0⟩ + ⟨0, ¼, ¼⟩
  = ⟨¾, ½, ¾⟩

Now that we are working with coefficients, we don't need to deal with x any more! All that matters is the length of the vector. This makes our recurrences much simpler:

f (2, k)  =  ⟨½, ½⟩  for all k.
f (n, even)  =  ⟨½, 0, …, 0, ½⟩ +  ⟨f (n−1, odd) / 2, 0⟩ +  ⟨0, f (n−1, odd) / 2⟩  for n ≥ 3.
f (n, odd)  =  ⟨f (n−1, even) / 2, 0⟩ +  ⟨0, f (n−1, even) / 2⟩  for n ≥ 3.

Now we can carry out a few more hand computations.

f (3, odd)  = ⟨f (2, even) / 2, 0⟩ + ⟨0, f (2, even) / 2⟩
  = ⟨¼, ¼, 0⟩ + ⟨0, ¼, ¼⟩
  = ⟨¼, ½, ¼⟩
f (4, even)  = ⟨½, 0, 0, ½⟩ + ⟨f (3, odd) / 2, 0⟩ + ⟨0, f (3, odd) / 2⟩
  = ⟨½, 0, 0, ½⟩ + ⟨⅛, ¼, ⅛, 0⟩ + ⟨0, ⅛, ¼, ⅛⟩
  = ⟨⅝, ⅜, ⅜, ⅝⟩
f (4, odd)  = ⟨f (3, even) / 2, 0⟩ + ⟨0, f (3, even) / 2⟩
  = ⟨⅜, ¼, ⅜, 0⟩ + ⟨0, ⅜, ¼, ⅜⟩
  = ⟨⅜, ⅝, ⅝, ⅜⟩

The interesting thing to observe here is that the recursion does not branch. When we reduce the size of the vector by one element, the recursive calls are basically identical. We have to shift the coefficients differently in order to build up the results, but the recursive call itself is unchanged. This means that we need to perform only n−2 recursive steps to compute f (n, k).

Okay, now that we've studied the problem a bit, we can write the code. I'll write three versions of the function. The first computes according to the recurrence relation as originally written. We use this to verify our calculations.

function f1(x, k) {
 if (x.length == 2) {
  return (x[0] + x[1]) / 2;
 }
 var term = 0;
 if (k % 2 == 0) {
  term = (x[0] + x[x.length - 1]) / 2;
 }
 return term +
        f1(x.slice(0,-1), k + 1) / 2 +
        f1(x.slice(1), k + 1) / 2;
}

Immediate window:

f1([1,2,3], 0)
= 4.0

Okay, that matches our hand calculations, ¾·1 + ½·2 + ¾·3 = 4.

Now let's do the straight recursive version.

function dotproduct(a, x)
{
 var total = 0;
 for (var i = 0; i < a.length; i++) total += a[i] * x[i];
 return total;
}

function f2a(n, k)
{
 if (n == 2) return [1/2, 1/2];

 // Start with the extra term: ⟨½, 0, …, 0, ½⟩ in the even case, zeroes otherwise.
 var c = new Array(n);
 for (var i = 0; i < n; i++) c[i] = 0;

 if (k % 2 == 0) {
  c[0] = 1/2;
  c[n-1] = 1/2;
 }

 // Add ⟨inner / 2, 0⟩ + ⟨0, inner / 2⟩: each inner coefficient contributes
 // half to its own position and half to the position one to the right.
 var inner = f2a(n-1, k+1);
 for (var i = 0; i < n - 1; i++) {
  c[i] += inner[i] / 2;
  c[i+1] += inner[i] / 2;
 }

 return c;
}

function f2(x, k)
{
 return dotproduct(f2a(x.length, k), x);
}

Immediate window:

f2([1,2,3], 0)
= 4.0

Notice that there is no dynamic programming involved. The hint in the problem statement was a red herring!

Finally, we can eliminate the recursion by iterating n up from 2.

function f3(x, k)
{
 // Start with the n = 2 coefficients ⟨½, ½⟩, padded with zeroes.
 var c = new Array(x.length);
 for (var i = 0; i < x.length; i++) c[i] = 0;
 c[0] = 1/2;
 c[1] = 1/2;

 for (var n = 3; n <= x.length; n++) {
  // Average the coefficients against a shifted copy of themselves.
  var carry = 0;
  for (var i = 0; i < n; i++) {
   var nextcarry = c[i];
   c[i] = (carry + c[i]) / 2;
   carry = nextcarry;
  }
  // In the even case, add the extra ⟨½, 0, …, 0, ½⟩ term.
  if ((k + n + x.length) % 2 == 0) {
   c[0] += 1/2;
   c[n-1] += 1/2;
  }
 }
 return dotproduct(c, x);
}

I pulled a sneaky trick here in the place where we test whether we are in the even or odd case. Officially, the test should be

  if ((k + (x.length - n)) % 2 == 0) {

but since we are interested only in whether the result is even or odd, we can just add the components together, because subtraction and addition have the same effect on even-ness and odd-ness. (If I really felt like micro-optimizing, I could fold x.length into k.)

Okay, now that we have our code, let's interpret the original problem.

The expression ⟨f (n, k) / 2, 0⟩ + ⟨0, f (n, k) / 2⟩ takes the vector f (n, k) and averages it against a shifted copy of itself. (The word convolution could be invoked here.) If you think of the coefficients as describing the distribution of a chemical, the expression calculates the effect of diffusion after one time step according to the simple model "At each time step, each molecule has a 50% chance of moving to the left and a 50% chance of moving to the right." (Since the length of the vector varies with n, I'll visualize the vector drawn with center-alignment.)

The base case ⟨½, ½⟩ describes the initial conditions of the diffusion, where half of the chemicals are on the left and half are on the right. This is one time step after one unit of the chemical was placed in the center. Let's get rid of the extra term in the recurrence and focus just on the diffusion aspect:

d(2) = ⟨½, ½⟩ 
d(n) = ⟨d(n−1) / 2, 0⟩ + ⟨0, d(n−1) / 2⟩  for n ≥ 3.
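A direct transcription of this recurrence (the function name d is mine) lets the computer do the tabulating:

```javascript
// Compute d(n) by iterating the recurrence up from the base case d(2) = ⟨½, ½⟩.
function d(n) {
 var c = [1/2, 1/2];
 for (var m = 3; m <= n; m++) {
  var next = new Array(m);
  for (var i = 0; i < m; i++) {
   // ⟨c / 2, 0⟩ + ⟨0, c / 2⟩: average each entry with its left neighbor,
   // treating entries outside the previous vector as zero.
   next[i] = ((i > 0 ? c[i - 1] : 0) + (i < m - 1 ? c[i] : 0)) / 2;
  }
  c = next;
 }
 return c;
}
```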

If you compute these values, you'll see that the results are awfully familiar. I've pulled out the common denominator to avoid the ugly fractions.

1 1             entire row divided by 2
1 2 1           entire row divided by 4
1 3 3 1         entire row divided by 8
1 4 6 4 1       entire row divided by 16
1 5 10 10 5 1   entire row divided by 32

Yup, it's Pascal's Triangle.

Notice that the sum across the row equals the amount we are dividing by, so that the sum of the row is always 1. (This is easy to see from the recurrence relation, since the base case establishes the property that the sum of the coordinates is 1, and the recurrence preserves it.)

This means that we can calculate the coefficients of d(n) for any value of n directly, without having to calculate any of the coefficients for smaller values of n. The coefficients are just row n of Pascal's triangle, which we know how to compute in O(n) time.
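One way to generate the row is with the multiplicative identity C(m, i+1) = C(m, i) · (m − i) / (i + 1). Here's a sketch (pascalRow is my name; under the convention that pascalRow(m) has m + 1 entries, the vector d(n) comes out as pascalRow(n − 1)):

```javascript
// Row m of Pascal's triangle, scaled by 1 / 2^m so the entries sum to 1,
// computed in O(m) time via C(m, i+1) = C(m, i) * (m - i) / (i + 1).
function pascalRow(m) {
 var row = new Array(m + 1);
 row[0] = 1;
 for (var i = 0; i < m; i++) {
  row[i + 1] = row[i] * (m - i) / (i + 1);
 }
 for (var i = 0; i <= m; i++) row[i] /= Math.pow(2, m);
 return row;
}
```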

Now we can also interpret the extra term we removed at the even steps. That term of the recurrence simulates adding a unit of chemical to the solution at every other time step, half a unit at the left edge and half at the right edge. And we can calculate these directly in the same way that we calculated the diffusion coefficients, since they basically are the diffusion coefficients, just with a time and location adjustment.

I pulled a fast one here. Maybe you didn't pick up on it: I'm assuming that the two parts of the recurrence unrelated to the diffusion behavior (the base condition and the extra term at the even steps) are independent and can simply be added together. You can show this by noting that given the generalized recurrence

fG(2, k) = G(2, k)
fG(n, k) = G(n, k) + ⟨fG(n−1, k + 1) / 2, 0⟩ + ⟨0, fG(n−1, k + 1) / 2⟩ for n ≥ 3.

then fG+H = fG + fH. (As before, induct on the recurrence relations.) Therefore, we can solve each of the pieces separately and just add the results together.
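Here's a quick numeric spot check of that superposition claim (solveG and the sample source terms G and H are names I made up for the sketch):

```javascript
// Solve the generalized recurrence fG for an arbitrary source term G(n, k),
// which must return a fresh length-n vector on each call.
function solveG(n, k, G) {
 if (n == 2) return G(2, k);
 var c = G(n, k);
 var inner = solveG(n - 1, k + 1, G);
 for (var i = 0; i < n - 1; i++) {
  c[i] += inner[i] / 2;
  c[i + 1] += inner[i] / 2;
 }
 return c;
}

// Two arbitrary source terms and their pointwise sum.
function G(n, k) { var v = []; for (var i = 0; i < n; i++) v.push(n + 2 * k); return v; }
function H(n, k) { var v = []; for (var i = 0; i < n; i++) v.push(i * k + 1); return v; }
function GplusH(n, k) {
 var v = G(n, k), w = H(n, k);
 for (var i = 0; i < n; i++) v[i] += w[i];
 return v;
}
```

Evaluating solveG(5, 0, GplusH) gives the same vector as adding solveG(5, 0, G) and solveG(5, 0, H) componentwise.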

If I had the time and inclination, I could work out the solution for

C(n, i) + Σk even, 2 < k ≤ n C(k, i) / 2ᵏ

or something similar to that. Like I said, I ran out of time and inclination. But if I could come up with an efficient way to compute that value for all values of i for fixed n (and I believe there is, I'm just too lazy to work it out), then I would have an O(n) solution to the original recurrence.

(Okay, so the "If I had the time" bit is a cop-out, but I sort of lost interest in the problem.)

Comments (25)
  1. Mungo says:

    Should I consider a career change if I fall asleep after reading the first few lines?

  2. Crescens2k says:

    @Mungo:

    If you were like me and started feeling drowsy, but able to understand what was going on, then don't worry about it. Mathematics does it to me all the time.

    If on the other hand you don't understand, then the I would urge you at the very least to learn. While you can get things done as a programmer by throwing code together, this post is all about gaining optimisation opportunities through understanding mathematically what is going on. It is surprising how much mathematics is involved with computing, which is why all of the decent computer science degree courses I know involve discrete mathematics, complexity and automata as modules. Having a pretty decent foundation in other areas is also rather important.

  3. Smithers says:

    (In the f3() solution)

    What do you actually gain by the "sneaky trick" of using (k + n + x.length) instead of (k + (x.length - n))? Is subtraction somehow less efficient than addition?

    Also, I would have instinctively removed the need for carry and nextcarry in this section

     var carry = 0;

     for (var i = 0; i < n; i++) {

      var nextcarry = c[i];

      c[i] = (carry + c[i]) / 2;

      carry = nextcarry;

     }

    by going through the list in reverse order:

     for (var i = n-1; i > 0; i++) {

      c[i] = (c[i] + c[i-1]) / 2;

     }

     c[0] /= 2;

    [As a mathematician, I perform the sneaky trick instinctively. It eliminates parentheses, and since addition is commutative, it opens future opportunities for simplification. -Raymond]
  4. Smithers says:

    Except that obviously that should be i-- in my version of the loop. I always get that wrong the first time.

  5. Anon says:

    @Crescens2k

    People often say these things, but don't provide concrete examples outside of gaming, data analysis, and specialised scientific/mathematics/development (e.g. writing my own compiler) applications.

    The core question is "What relevance is the conceptual example given, specifically, going to have to my software development career? I can only learn so many things, so why should I spend my time on Thing X rather than Thing Y?"

  6. Anon says:

    @Anon

    I should add that the example may have no relevance at all, and simply be trivia or a brainteaser — in which case, the question is irrelevant.

    But for my comment, assume the larger issue of "Why do people ask 'Why do I need to know this?'"

  7. Myria says:

    Here's a cute one someone taught me years ago: Simplify the following expression:

    (x-a)(x-b)(x-c)…(x-z)

  8. Mark says:

    Myria: am I right in thinking you want the answer 0?

  9. Myria says:

    @Smithers: "Is subtraction somehow less efficient than addition?"

    Actually, yes, sometimes it is.  The x86 "lea" instruction is often used to do three-operand addition.  The x86 can't normally do three-operand anything, but lea lets you use the specialized addressing modes.  However, the x86 doesn't have subtraction in its addressing modes, only addition and optional multiplication by 2, 4, and 8.

    [Also, x86 and x64 do not have a reverse-subtraction opcode. -Raymond]
  10. Myria says:

    @Mark: =^-^=

  11. Crescens2k says:

    @Anon:

    One example is if you are comparing two lists of strings.

    If the lists are unsorted, how long would it take to compare? If both lists are unsorted, then for every item in one list you would have to start from the beginning of the other. You can stop early if you find the item you are looking for, but you can't stop early in any other case. For the string itself, again, you can compare only as far as needed, but it is hard to do more than that.

    What happens if both lists are sorted? You can start after the start of one list if you have found an item already, you know that any item in the list you are comparing against will always come after the one you found. You are also able to make other optimisations, like only comparing as many letters as needed until you find a difference, even skipping blocks of the array by first letter comparison alone.

    While there are many things that can be commented on in this example, the fact is that this kind of runtime analysis for even regular functions is in the area of complexity. Of course it is possible to make this kind of reasoning regardless, but learning the mathematics behind it not only formalises the knowledge, but as I often find, you start doing these things implicitly.

  12. alexcohn says:

    This reminds me how, a few years ago, I participated in a contest for young programmers. One of the tasks was to efficiently calculate 1+2+3+…+1978 in as many high-level programming languages as possible. This was my special hobby in those days, so I quickly wrote the simple for loop in 20+ languages, ranging from COBOL to Lisp (have I mentioned that the year was 1978?). To my shame, this solution was rated second after the one that simply printed 1957231.

  13. Billy O'Neal says:

    To my understanding, your "iterating up from 2" example is dynamic programming; you are computing the result bottom up with a cache that lets you reuse previous results. This is no different from the first example usually given for dynamic programming, computing the Nth Fibonacci number, which is O(2^N) for the recursive formulation:

    int f(int n)

    {

       if (n == 0 || n == 1) { return 1; }

       return f(n - 2) + f(n - 1);

    }

    but O(N) for the dynamic programming version:

    int f(int n)

    {

       int nMinusTwo = 1;

       int nMinusOne = 1; // This is the "cache" of dynamic programming

       for (int idx = 2; idx <= n; ++idx)

       {

           int currentN = nMinusTwo + nMinusOne;

           nMinusTwo = nMinusOne;

           nMinusOne = currentN;

       }

       return nMinusOne;

    }

    (Your recursive version indeed uses no dynamic programming and that's the point of the article; but people may be confused after seeing the second which does have it)

    [It's a cache in the sense that any variable can be viewed as a cache. ("I'm caching the 'largest value seen in elements 0 through i'.") The value is immediately consumed by the next iteration, so it's really just an accumulator. Dynamic programming usually means employing a general-purpose cache rather than an accumulator.
    int f(int n) {
        cache[0] = cache[1] = 1;
        for (int idx = 2; idx <= n; ++idx) {
            cache[idx] = cache[idx - 2] + cache[idx - 1];
        }
       return cache[n]; }
    -Raymond
    ]
  14. Joker_vD says:

    Wait. You're just numerically solving an initial-boundary value problem for the one-dimensional heat equation, aren't you? Or am I just seeing things at 2 AM?

  15. GregM says:

    "To my shame, this solution was rated second after the one that simply printed 1957231."

    I would not have rated it second to that, but I would have rated it second to 1978 * (1978 + 1) / 2, assuming that the assignment was in fact "calculate the answer to" and not simply "print the answer to".

  16. hacksoncode says:

    @Anon: This question is exactly analogous to "People tell me I should lift weights, but what relevance does this have, specifically, to my career?".

    The reason you solve problems like this is that mental exercise builds mental muscles just like physical exercise builds physical muscles.

    You're not going to find yourself needing to lift a barbell in (nearly) any career, either. But you will need to do *analogous* things if you have a career where upper-body strength is important. Additionally, it will improve your general physical fitness, which will have beneficial effects on your overall well-being.

    Programming is a career where this kind of mental strength is extremely useful, if not actually absolutely vital. And it will improve your general mental fitness, which will have beneficial effects on your overall well-being.

    This problem is probably more complicated (mathematically) than most you will encounter in day-to-day programming, this is true. But you want those problems you do encounter to be *easy* for you to solve, not a huge struggle.

  17. BOFH says:

    Either way, it's just a oneliner:

    perl -le "$i += $_ and print $i for 1 .. 1978"

  18. Drak says:

    @BOFH:

    And now in as may high-level programming languages as possible…

  19. alexcohn says:

    @GregM: you are right, now I remember that the actual prize-winner did exactly as you said, writeln(1978*(1978+1)/2).

  20. Neil says:

    I would have thought that merely printing "1957231" should be disqualified for a) the inefficiency in having the programmer calculate the result instead of the computer and/or b) not including the algorithm used to calculate the result as part of the submission. (I would accept a solution that moved the computation from the executable to the compiler or optimiser.)

  21. Andrei says:

    Finally, one that is correctly rendered using Internet Explorer, but not using Chrome

    [Looks fine to me aside from Chrome's problem with U+27E8 and U+27E9. Maybe your window isn't wide enough. Let me see if I can try to fix that. -Raymond]
  22. alegr1 says:

    This article should be re-named "The dangers of buffering up posted messages and then reposting them later".

  23. Mark says:

    alegr1: itym articles

  24. alexcohn says:

    Chrome fixed, many thanks from my Android device!

  25. Rahul Ramadas says:

    This is still dynamic programming. You are computing values in a table and updating it (as a function of pre-existing values in the table) and arriving at the answer: dynamic programming. You did it bottom-up instead of top-down with memoization, that's all.

Comments are closed.