Here’s a programming challenge:
Evaluate the following recurrence relation efficiently for a given array [x_{0}, …, x_{n−1}] and integer k.
f ([x_{0}, x_{1}], k) = (x_{0} + x_{1}) / 2 for all k (n = 2).
f ([x_{0}, …, x_{n−1}], k) = f ([x_{0}, …, x_{n−2}], k + 1) / 2 + f ([x_{1}, …, x_{n−1}], k + 1) / 2 for odd k, n ≥ 3.
f ([x_{0}, …, x_{n−1}], k) = (x_{0} + x_{n−1}) / 2 + f ([x_{0}, …, x_{n−2}], k + 1) / 2 + f ([x_{1}, …, x_{n−1}], k + 1) / 2 for even k, n ≥ 3.

Hint: Use dynamic programming.
In words:
- If the array has only two elements, then the result is the average of the two elements.
- If the array has more than two elements, then the result is the sum of the following:
- Half the value of the function evaluated on the array with the first element deleted and the second parameter incremented by one.
- Half the value of the function evaluated on the array with the last element deleted and the second parameter incremented by one.
- If the second parameter is even, then also add the average of the first and last elements of the original array.
The traditional approach with dynamic programming is to cache intermediate results in anticipation that the values will be needed again later. The naïve solution, therefore, would have a cache keyed by the vector and k.
My habit when trying to solve these sorts of recurrence relations is to solve the first few low-valued cases by hand. That gives me a better insight into the problem and may reveal some patterns or tricks.
f ([x_{0}, x_{1}, x_{2}], k_{even})
  = (x_{0} + x_{2}) / 2 + f ([x_{0}, x_{1}], k_{even} + 1) / 2 + f ([x_{1}, x_{2}], k_{even} + 1) / 2
  = (x_{0} + x_{2}) / 2 + f ([x_{0}, x_{1}], k_{odd}) / 2 + f ([x_{1}, x_{2}], k_{odd}) / 2
  = (x_{0} + x_{2}) / 2 + (x_{0} + x_{1}) / 4 + (x_{1} + x_{2}) / 4
  = ¾ x_{0} + ½ x_{1} + ¾ x_{2}
Even just doing this one computation, we learned a lot. (Each of the following observations can be proved by induction if you are new to this sort of thing, or is evident by inspection if you're handy with math.)
First observation: The function is linear in the array elements. In other words,
f (x + y, k) = f (x, k) + f (y, k),
f (ax, k) = a · f (x, k).
To save space and improve readability, I'm using vector notation, where adding two vectors adds the elements componentwise, and multiplying a vector by a constant multiplies each component. The long-form version of the first of the above equations would be
f ([x_{0} + y_{0}, …, x_{n−1} + y_{n−1}], k) = f ([x_{0}, …, x_{n−1}], k) + f ([y_{0}, …, y_{n−1}], k)
Since the result is a linear combination of the vector elements, we can work just with the coefficients and save ourselves some typing. (“Move to the dual space.”) For notational convenience, we will write
⟨a_{0}, …, a_{n−1}⟩ = a_{0} x_{0} + ⋯ + a_{n−1} x_{n−1}
Second observation: The specific value of k is not important. All that matters is whether it is even or odd, and each time we recurse to the next level, the parity flips. So the second parameter is really just a boolean. That greatly reduces the amount of stuff we need to cache, as well as increasing the likelihood of a cache hit. (The naïve version would not have realized that f (x, k) can steal the cached result from f (x, k + 2).)
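To make the cache saving concrete, here is a sketch of a memoized version of the original recurrence. (The function name fMemo and the string cache key are my own illustration, not part of the original solution.) The key is the vector plus a single parity bit:

```javascript
// Memoized version of the original recurrence. Only the parity of k goes
// into the cache key, so f(x, k) and f(x, k + 2) share one cache entry.
var cache = {};
function fMemo(x, k) {
  var key = x.join(",") + "|" + (k % 2);
  if (key in cache) return cache[key];
  var result;
  if (x.length == 2) {
    result = (x[0] + x[1]) / 2;
  } else {
    result = fMemo(x.slice(0, -1), k + 1) / 2 +
             fMemo(x.slice(1), k + 1) / 2;
    if (k % 2 == 0) result += (x[0] + x[x.length - 1]) / 2;
  }
  return cache[key] = result;
}
```

For example, fMemo([1, 2, 3], 0) and fMemo([1, 2, 3], 2) hit the same cache entry and both give 4.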
Our previous hand computation can now be written as
f ([x_{0}, x_{1}, x_{2}], even)
  = ⟨½, 0, ½⟩ + f ([x_{0}, x_{1}], odd) / 2 + f ([x_{1}, x_{2}], odd) / 2
  = ⟨½, 0, ½⟩ + ⟨½, ½, 0⟩ / 2 + ⟨0, ½, ½⟩ / 2
  = ⟨½, 0, ½⟩ + ⟨¼, ¼, 0⟩ + ⟨0, ¼, ¼⟩
  = ⟨¾, ½, ¾⟩
Now that we are working with coefficients, we don’t need to deal with x any more! All that matters is the length of the vector. This makes our recurrences much simpler:
f (2, k)    = ⟨½, ½⟩ for all k.
f (n, even) = ⟨½, 0, …, 0, ½⟩ + ⟨f (n−1, odd) / 2, 0⟩ + ⟨0, f (n−1, odd) / 2⟩ for n ≥ 3.
f (n, odd)  = ⟨f (n−1, even) / 2, 0⟩ + ⟨0, f (n−1, even) / 2⟩ for n ≥ 3.
Now we can carry out a few more hand computations.
f (3, odd)  = ⟨f (2, even) / 2, 0⟩ + ⟨0, f (2, even) / 2⟩
            = ⟨¼, ¼, 0⟩ + ⟨0, ¼, ¼⟩
            = ⟨¼, ½, ¼⟩

f (4, even) = ⟨½, 0, 0, ½⟩ + ⟨f (3, odd) / 2, 0⟩ + ⟨0, f (3, odd) / 2⟩
            = ⟨½, 0, 0, ½⟩ + ⟨⅛, ¼, ⅛, 0⟩ + ⟨0, ⅛, ¼, ⅛⟩
            = ⟨⅝, ⅜, ⅜, ⅝⟩

f (4, odd)  = ⟨f (3, even) / 2, 0⟩ + ⟨0, f (3, even) / 2⟩
            = ⟨⅜, ¼, ⅜, 0⟩ + ⟨0, ⅜, ¼, ⅜⟩
            = ⟨⅜, ⅝, ⅝, ⅜⟩
The interesting thing to observe here is that the recursion does not branch. When we reduce the size of the vector by one element, the recursive calls are basically identical. We have to shift the coefficients differently in order to build up the results, but the recursive call itself is unchanged. This means that we need to perform only n−2 recursive steps to compute f (n, k).
Okay, now that we’ve studied the problem a bit, we can write the code. I’ll write three versions of the function. The first computes according to the recurrence relation as originally written. We use this to verify our calculations.
function f1(x, k) {
    if (x.length == 2) {
        return (x[0] + x[1]) / 2;
    }
    var term = 0;
    if (k % 2 == 0) {
        term = (x[0] + x[x.length - 1]) / 2;
    }
    return term + f1(x.slice(0, -1), k + 1) / 2 +
                  f1(x.slice(1), k + 1) / 2;
}

Immediate window: f1([1,2,3], 0) = 4.0
Okay, that matches our hand calculations, ¾·1 + ½·2 + ¾·3 = 4.
Now let’s do the straight recursive version.
function dotproduct(a, x) {
    var total = 0;
    for (var i = 0; i < a.length; i++) total += a[i] * x[i];
    return total;
}

function f2a(n, k) {
    if (n == 2) return [1/2, 1/2];
    var c = new Array(n);
    for (var i = 0; i < n; i++) c[i] = 0;
    if (k % 2 == 0) {
        c[0] = 1/2;
        c[n-1] = 1/2;
    }
    var inner = f2a(n-1, k+1);
    for (var i = 0; i < n - 1; i++) {
        c[i] += inner[i] / 2;
        c[i+1] += inner[i] / 2;
    }
    return c;
}

function f2(x, k) {
    return dotproduct(f2a(x.length, k), x);
}

Immediate window: f2([1,2,3], 0) = 4.0
Notice that there is no dynamic programming involved. The hint in the problem statement was a red herring!
Finally, we can eliminate the recursion by iterating n up from 2.
function f3(x, k) {
    var c = new Array(x.length);
    for (var i = 0; i < x.length; i++) c[i] = 0;
    c[0] = 1/2;
    c[1] = 1/2;
    for (var n = 3; n <= x.length; n++) {
        var carry = 0;
        for (var i = 0; i < n; i++) {
            var nextcarry = c[i];
            c[i] = (carry + c[i]) / 2;
            carry = nextcarry;
        }
        if ((k + n + x.length) % 2 == 0) {
            c[0] += 1/2;
            c[n-1] += 1/2;
        }
    }
    return dotproduct(c, x);
}
I pulled a sneaky trick here in the place where we test whether we are in the even or odd case. Officially, the test should be

if ((k + (x.length - n)) % 2 == 0) {

but since we are interested only in whether the result is even or odd, we can just add all the components together, because subtraction and addition have the same effect on even-ness and odd-ness. (If I really felt like micro-optimizing, I could fold x.length into k.)
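The parity claim is easy to spot-check mechanically. Here is a throwaway verification of my own, not part of the solution:

```javascript
// (k + n + x.length) and (k + (x.length - n)) differ by exactly 2n,
// so they always have the same parity and the even/odd test is unaffected.
function parityTrickHolds(maxK, maxLen) {
  for (var k = 0; k <= maxK; k++)
    for (var len = 3; len <= maxLen; len++)
      for (var n = 3; n <= len; n++)
        if ((k + n + len) % 2 != (k + (len - n)) % 2) return false;
  return true;
}
```

parityTrickHolds(5, 8) returns true.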
Okay, now that we have our code, let’s interpret the original problem.
The expression ⟨f (n, k) / 2, 0⟩ + ⟨0, f (n, k) / 2⟩ takes the vector f (n, k) and averages it against a shifted copy of itself. (The word convolution could be invoked here.) If you think of the coefficients as describing the distribution of a chemical, the expression calculates the effect of diffusion after one time step according to the simple model “At each time step, each molecule has a 50% chance of moving to the left and a 50% chance of moving to the right.” (Since the length of the vector varies with n, I’ll visualize the vector drawn with center-alignment.)
The base case ⟨½, ½⟩ describes the initial conditions of the diffusion, where half of the chemicals are on the left and half are on the right. This is one time step after one unit of the chemical was placed in the center. Let’s get rid of the extra term in the recurrence and focus just on the diffusion aspect:
d(2) = ⟨½, ½⟩

              1  1              entire row divided by 2
            1  2  1             entire row divided by 4
          1  3  3  1            entire row divided by 8
        1  4  6  4  1           entire row divided by 16
      1  5 10 10  5  1          entire row divided by 32
Yup, it’s Pascal’s Triangle.
Notice that the sum across the row equals the amount we are dividing by, so that the sum of the row is always 1. (This is easy to see from the recurrence relation, since the base case establishes the property that the sum of the coordinates is 1, and the recurrence preserves it.)
This means that we can calculate the coefficients of d(n) for any value of n directly, without having to calculate any of the coefficients for smaller values of n. The coefficients are just row n of Pascal's triangle, which we know how to compute in O(n) time.
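For instance, here is a sketch of that O(n) computation (the helper names pascalRow and d are mine, not the article's):

```javascript
// Row r of Pascal's triangle, [C(r,0), ..., C(r,r)], built in O(r)
// using the identity C(r, i+1) = C(r, i) * (r - i) / (i + 1).
function pascalRow(r) {
  var row = [1];
  for (var i = 0; i < r; i++) row.push(row[i] * (r - i) / (i + 1));
  return row;
}

// d(n) has n entries: row n-1 of the triangle, scaled down by 2^(n-1)
// so that the entries sum to 1.
function d(n) {
  var row = pascalRow(n - 1), result = [];
  for (var i = 0; i < n; i++) result.push(row[i] / Math.pow(2, n - 1));
  return result;
}
```

d(3) gives [0.25, 0.5, 0.25], matching the hand computation of f (3, odd).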
Now we can also interpret the extra term we removed at the even steps. That term of the recurrence simulates adding a unit of chemical to the solution at every other time step, half a unit at the left edge and half at the right edge. And we can calculate these directly in the same way that we calculated the diffusion coefficients, since they basically are the diffusion coefficients, just with a time and location adjustment.
I pulled a fast one here. Maybe you didn't pick up on it: I'm assuming that the two parts of the recurrence unrelated to the diffusion behavior (the base condition and the extra term at the even steps) are independent and can simply be added together. You can show this by noting that if we define the generalized recurrence
f_{G}(2, k) = G(2, k)
f_{G}(n, k) = G(n, k) + ⟨f_{G}(n−1, k + 1) / 2, 0⟩ + ⟨0, f_{G}(n−1, k + 1) / 2⟩ for n ≥ 3,
then f_{G+H} = f_{G} + f_{H}. (As before, induct on the recurrence relations.) Therefore, we can solve each of the pieces separately and just add the results together.
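If you would rather check numerically than induct, here is a sketch of the generalized recurrence coded directly (the name fG and the sample source terms used for testing are my own, not the article's):

```javascript
// f_G(n, k) for an arbitrary source term G(n, k), which must return a
// vector of length n; coded straight from the generalized recurrence.
function fG(G, n, k) {
  if (n == 2) return G(2, k).slice();
  var c = G(n, k).slice();
  var inner = fG(G, n - 1, k + 1);
  for (var i = 0; i < n - 1; i++) {
    c[i] += inner[i] / 2;      // left-aligned half of the inner result
    c[i + 1] += inner[i] / 2;  // right-aligned half of the inner result
  }
  return c;
}
```

Feeding it two source terms and their pointwise sum confirms f_{G+H} = f_{G} + f_{H} componentwise.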
If I had the time and inclination, I could work out the solution for
C(n, i) + Σ_{k even, 2 < k ≤ n} C(k, i) / 2^{k}
or something similar to that. Like I said, I ran out of time and inclination. But if I could come up with an efficient way to compute that value for all values of i for fixed n (and I believe there is, I’m just too lazy to work it out), then I would have an O(n) solution to the original recurrence.
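For what it's worth, the diffusion interpretation can be turned into a direct, non-recursive computation. This is my sketch, not the article's, and it is O(n²) as written because every injection recomputes a Pascal row from scratch; the point is only that the base case and the injections really can be summed independently:

```javascript
// Row r of Pascal's triangle, built in O(r).
function pascalRow(r) {
  var row = [1];
  for (var i = 0; i < r; i++) row.push(row[i] * (r - i) / (i + 1));
  return row;
}

// Hypothetical direct evaluation of the coefficient vector f(n, k):
// the base case <1/2, 1/2> diffused n-2 steps, plus the edge injections
// at the even-parity steps, each diffused for the remaining steps.
function fDirect(n, k) {
  var c = new Array(n);
  for (var i = 0; i < n; i++) c[i] = 0;
  // Base case: <1/2, 1/2> diffused n-2 steps is row n-1 / 2^(n-1).
  var base = pascalRow(n - 1);
  for (var i = 0; i < n; i++) c[i] += base[i] / Math.pow(2, n - 1);
  // Injections: at size m the parameter is k + (n - m); when it is even,
  // 1/2 enters at each edge and then diffuses for n - m more steps.
  for (var m = 3; m <= n; m++) {
    if ((k + n - m) % 2 != 0) continue;
    var row = pascalRow(n - m);
    for (var i = 0; i <= n - m; i++) {
      var w = row[i] / Math.pow(2, n - m + 1); // 1/2 * C(n-m, i) / 2^(n-m)
      c[i] += w;          // contribution from the left edge
      c[n - 1 - i] += w;  // contribution from the right edge
    }
  }
  return c;
}
```

fDirect(4, 0) reproduces ⟨⅝, ⅜, ⅜, ⅝⟩ and fDirect(4, 1) reproduces ⟨⅜, ⅝, ⅝, ⅜⟩ from the hand computations above.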
(Okay, so the “If I had the time” bit is a cop-out, but I sort of lost interest in the problem.)
Should I consider a career change if I fall asleep after reading the first few lines?
@Mungo:
If you were like me and started feeling drowsy, but able to understand what was going on, then don't worry about it. Mathematics does it to me all the time.
If on the other hand you don't understand, then I would urge you at the very least to learn. While you can get things done as a programmer by throwing code together, this post is all about gaining optimisation opportunities through understanding mathematically what is going on. It is surprising how much mathematics is involved with computing, which is why all of the decent computer science degree courses I know involve discrete mathematics, complexity and automata as modules. Having a pretty decent foundation in other areas is also rather important.
(In the f3() solution)
What do you actually gain by the "sneaky trick" of using (k + n + x.length) instead of (k + (x.length - n))? Is subtraction somehow less efficient than addition?
Also, I would have instinctively removed the need for carry and nextcarry in this section
var carry = 0;
for (var i = 0; i < n; i++) {
var nextcarry = c[i];
c[i] = (carry + c[i]) / 2;
carry = nextcarry;
}
by going through the list in reverse order:
for (var i = n-1; i > 0; i++) {
c[i] = (c[i] + c[i-1]) / 2;
}
c[0] /= 2;
Except that obviously that should be i-- in my version of the loop. I always get that wrong the first time.
@Crescens2k
People often say these things, but don't provide concrete examples outside of gaming, data analysis, and specialised scientific/mathematics/development (e.g. writing my own compiler) applications.
The core question is "What relevance is the conceptual example given, specifically, going to have to my software development career? I can only learn so many things, so why should I spend my time on Thing X rather than Thing Y?"
@Anon
I should add that the example may have no relevance at all, and simply be trivia or a brainteaser — in which case, the question is irrelevant.
But for my comment, assume the larger issue of "Why do people ask 'Why do I need to know this?'"
Here's a cute one someone taught me years ago: Simplify the following expression:
(x-a)(x-b)(x-c)…(x-z)
Myria: am I right in thinking you want the answer 0?
@Smithers: "Is subtraction somehow less efficient than addition?"
Actually, yes, sometimes it is. The x86 "lea" instruction is often used to do three-operand addition. The x86 can't normally do three-operand anything, but lea lets you use the specialized addressing modes. However, the x86 doesn't have subtraction in its addressing modes, only addition and optional multiplication by 2, 4, or 8.
@Mark: =^-^=
@Anon:
One example is if you are comparing two lists of strings.
If the lists are unsorted, how long would the comparison take? For every item in one list you would have to search from the beginning of the other. You can stop early if you find the item you are looking for, but you can't stop early in any other case. For the strings themselves, you can stop comparing at the first differing character, but it is hard to do better than that.
What happens if both lists are sorted? Once you have found an item, you don't have to restart from the beginning of the other list: any later item you look for must come after the one you just found. You can also make other optimisations, such as comparing only as many letters as needed to find a difference, or even skipping blocks of the array by comparing first letters alone.
While there are many things that could be commented on in this example, the fact is that this kind of runtime analysis, even for everyday functions, falls in the area of complexity. Of course it is possible to do this kind of reasoning regardless, but learning the mathematics behind it not only formalises the knowledge but, as I often find, you start doing these things implicitly.
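A minimal sketch of the sorted-list walk described above (the function name is mine, not the commenter's):

```javascript
// Intersect two sorted arrays of strings with a single merge-style pass:
// O(m + n) element comparisons instead of the O(m * n) of the unsorted
// approach, because neither index ever moves backwards.
function sortedIntersection(a, b) {
  var result = [], i = 0, j = 0;
  while (i < a.length && j < b.length) {
    if (a[i] < b[j]) i++;
    else if (a[i] > b[j]) j++;
    else { result.push(a[i]); i++; j++; }
  }
  return result;
}
```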
This reminds me how few years ago I participated in a contest for young programmers. One of the tasks was to efficiently calculate 1+2+3+…+1978 in as many high-level programming languages as possible. This was my special hobby these days, so I quickly wrote the simple for loop in 20+ languages, ranging from COBOL to Lisp (have I mentioned that the year was 1978?). To my shame, this solution was rated second after the one that simply printed 1957231.
To my understanding, your "iterating up from 2" example is dynamic programming; you are computing the result bottom up with a cache that lets you reuse previous results. This is no different from the first example usually given for dynamic programming, computing Fibonacci number N, which is O(2^N) for the recursive formulation:
int f(int n)
{
    if (n == 0 || n == 1) { return 1; }
    return f(n - 2) + f(n - 1);
}
but O(N) for the dynamic programming version:
int f(int n)
{
    int nMinusTwo = 1;
    int nMinusOne = 1; // This is the "cache" of dynamic programming
    for (int idx = 2; idx <= n; ++idx)
    {
        int currentN = nMinusTwo + nMinusOne;
        nMinusTwo = nMinusOne;
        nMinusOne = currentN;
    }
    return nMinusOne;
}
(Your recursive version indeed uses no dynamic programming, and that's the point of the article; but people may be confused after seeing the second one, which does have it.)
[
int f(int n) {
    cache[0] = cache[1] = 1;
    for (int idx = 2; idx <= n; ++idx) {
        cache[idx] = cache[idx - 2] + cache[idx - 1];
    }
    return cache[n];
}
-Raymond]
Wait. You're just numerically solving an initial-boundary value problem for the one-dimensional heat equation, aren't you? Or am I just seeing things at 2 AM?
"To my shame, this solution was rated second after the one that simply printed 1957231."
I would not have rated it second to that, but I would have rated it second to 1978 * (1978 + 1) / 2, assuming that the assignment was in fact "calculate the answer to" and not simply "print the answer to".
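For the record, the closed form and the straightforward loop agree (a quick sanity check of mine, not part of the thread):

```javascript
// Gauss's closed form versus the straightforward loop for 1 + 2 + ... + n.
function sumByFormula(n) { return n * (n + 1) / 2; }
function sumByLoop(n) {
  var total = 0;
  for (var i = 1; i <= n; i++) total += i;
  return total;
}
// sumByFormula(1978) and sumByLoop(1978) both give 1957231.
```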
@Anon: This question is exactly analogous to "People tell me I should lift weights, but what relevance does this have, specifically, to my career?".
The reason you solve problems like this is that mental exercise builds mental muscles just like physical exercise builds physical muscles.
You're not going to find yourself needing to lift a barbell in (nearly) any career, either. But you will need to do *analogous* things if you have a career where upper-body strength is important. Additionally, it will improve your general physical fitness, which will have beneficial effects on your overall well-being.
Programming is a career where this kind of mental strength is extremely useful, if not actually absolutely vital. And it will improve your general mental fitness, which will have beneficial effects on your overall well-being.
This problem is probably more complicated (mathematically) than most you will encounter in day-to-day programming, this is true. But you want those problems you do encounter to be *easy* for you to solve, not a huge struggle.
Either way, it's just a one-liner:
perl -le "$i += $_ and print $i for 1 .. 1978"
@BOFH:
And now in as many high-level programming languages as possible…
@GregM: you are right, now I remember that the actual prize-winner did exactly as you said, writeln(1978*(1978+1)/2).
I would have thought that merely printing "1957231" should be disqualified for a) the inefficiency in having the programmer calculate the result instead of the computer and/or b) not including the algorithm used to calculate the result as part of the submission. (I would accept a solution that moved the computation from the executable to the compiler or optimiser.)
Finally, one that is correctly rendered using Internet Explorer, but not using Chrome
This article should be re-named "The dangers of buffering up posted messages and then reposting them later".
alegr1: itym articles
Chrome fixed, many thanks from my Android device!
This is still dynamic programming. You are computing values in a table and updating it (as a function of pre-existing values in the table) and arriving at the answer: dynamic programming. You did it bottom-up instead of top-down with memoization, that's all.