When I first started making games, most things were written in C, with critical pieces optimized in assembly language. A skilled assembly programmer could beat the C compilers of the day by a factor of two or more, so this was an important optimization (for instance the blit routines in Allegro were all written in assembly).
But assembly language could be a dangerous pitfall. Rewriting a function in assembly language changed the details of the way it worked (the how, or implementation), but did not fundamentally alter its purpose (the what, or algorithm). Doing the same basic thing in a more efficient way could give a 2x, 3x, or even 4x speed boost, but larger gains could only be achieved by switching to a smarter algorithm.
The danger of assembly language was that it made code faster, but also more complex and harder to understand. By obscuring the underlying algorithm, this 2x optimization often hid the potential for much more significant algorithmic improvements.
I was reminded of this by a recent thread on the creators.xna.com forums, where a word search algorithm was taking more than half a second. Early suggestions focused on optimizing the implementation:
- Use StringBuilder to reduce garbage
- Pack short text strings into 64 bit integers to speed comparison
- Use threads to parallelize the work across multiple CPU cores
These techniques are the C# equivalent of rewriting something in assembly language. They did indeed work, reducing the half second search time to 80 milliseconds (a 6x gain), but there is a fairly low upper bound on how much such things can help. For instance on a quad core machine, threaded code can run at most 4x faster than a single core version (in practice the gain is usually far less).
Then Jon Watte suggested a better algorithm, using a tree structure to replace the original O(N!) (factorial) algorithm with a log(N) version.
Result? The search now completes in less than 1 millisecond. That's more than a 500x improvement!