Digging way back into my pre-Microsoft days, I was recently reminded of a story that I believe was told to me by Mary Shaw back when I took her Computer Optimization class at Carnegie-Mellon…
During the class, Mary told an anecdote about a developer, “Sue”, who found a bug that another developer, “Joe”, had introduced with a performance optimization. When Sue pointed the bug out to Joe, his response was “Oops, but it’s WAY faster with the bug”. Sue exploded: “If it doesn’t have to be correct, I can calculate the result in 0 time!”
Immediately after telling this anecdote, she described a contest that the CS faculty held for the graduate students every year: the faculty posed a problem, and a prize went to the grad student who came up with the most efficient (fastest) solution. She then assigned the exact same problem to us:
“Given a copy of the ‘Declaration of Independence’, calculate the 10 most common words in the document.”
We all went off and built programs that parsed the words in the document, inserted them into a tree (tracking how often each word occurred), and read off the 10 most frequent words. The next assignment was “Now make it fast – the 5 fastest apps get an ‘A’, the next 5 get a ‘B’, etc.”
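For a concrete picture, here is a minimal Python sketch of that first, unoptimized approach: an unbalanced binary search tree keyed by word, with a count in each node, followed by a sort to read off the top 10. The tokenizer (the regular expression `[a-z]+` on lowercased text) and the names `Node`, `insert`, `walk` and `top_words_tree` are assumptions for illustration; the story doesn’t show anyone’s actual code.

```python
import re
from typing import Optional


class Node:
    """One node of an unbalanced binary search tree, keyed by word."""

    def __init__(self, word: str) -> None:
        self.word = word
        self.count = 1
        self.left: Optional["Node"] = None
        self.right: Optional["Node"] = None


def insert(root: Optional[Node], word: str) -> Node:
    """Insert word (or bump its count if already present); return the root."""
    if root is None:
        return Node(word)
    node = root
    while True:
        if word == node.word:
            node.count += 1
            return root
        if word < node.word:
            if node.left is None:
                node.left = Node(word)
                return root
            node = node.left
        else:
            if node.right is None:
                node.right = Node(word)
                return root
            node = node.right


def walk(root: Optional[Node]):
    """Yield (word, count) pairs in alphabetical order (iterative in-order walk)."""
    stack = []
    node = root
    while stack or node is not None:
        while node is not None:
            stack.append(node)
            node = node.left
        node = stack.pop()
        yield node.word, node.count
        node = node.right


def top_words_tree(text: str, k: int = 10) -> list[tuple[str, int]]:
    """Count every word in a BST, then sort to read off the k most frequent."""
    root = None
    for word in re.findall(r"[a-z]+", text.lower()):
        root = insert(root, word)
    # Highest count first; ties broken alphabetically.
    return sorted(walk(root), key=lambda wc: (-wc[1], wc[0]))[:k]
```

For example, `top_words_tree("the cat and the hat and the bat", 2)` returns `[("the", 3), ("and", 2)]`.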
So everyone in the class (except me :)) went off and rewrote their apps to use a hash table so that each insertion took (expected) constant time, and then they optimized the heck out of their hash tables.
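As a rough illustration of what “optimizing a hash table” can mean, here is a minimal Python sketch of a word counter built on open addressing with linear probing, a power-of-two capacity (so a bit mask replaces the modulo), and the FNV-1a hash. All of these design choices, and the names `WordCounter` and `top_words_hash`, are assumptions for illustration, not what any student actually submitted.

```python
import re


class WordCounter:
    """Open-addressing hash table (linear probing) mapping word -> count."""

    def __init__(self, capacity: int = 1024) -> None:
        # Capacity must be a power of two so "& mask" can replace "% capacity".
        if capacity <= 0 or (capacity & (capacity - 1)) != 0:
            raise ValueError("capacity must be a positive power of two")
        self._keys: list = [None] * capacity
        self._counts = [0] * capacity
        self._size = 0

    @staticmethod
    def _hash(word: str) -> int:
        # 32-bit FNV-1a over the word's UTF-8 bytes.
        h = 0x811C9DC5
        for b in word.encode("utf-8"):
            h = ((h ^ b) * 0x01000193) & 0xFFFFFFFF
        return h

    def add(self, word: str) -> None:
        mask = len(self._keys) - 1
        i = self._hash(word) & mask
        while self._keys[i] is not None:
            if self._keys[i] == word:
                self._counts[i] += 1
                return
            i = (i + 1) & mask  # linear probing: try the next slot
        self._keys[i] = word
        self._counts[i] = 1
        self._size += 1
        if self._size * 2 > len(self._keys):  # keep the load factor <= 1/2
            self._grow()

    def _grow(self) -> None:
        # Double the capacity and reinsert every existing (word, count) pair.
        old = [(k, c) for k, c in zip(self._keys, self._counts) if k is not None]
        capacity = len(self._keys) * 2
        self._keys = [None] * capacity
        self._counts = [0] * capacity
        mask = capacity - 1
        for k, c in old:
            i = self._hash(k) & mask
            while self._keys[i] is not None:
                i = (i + 1) & mask
            self._keys[i] = k
            self._counts[i] = c

    def items(self) -> list:
        return [(k, c) for k, c in zip(self._keys, self._counts) if k is not None]


def top_words_hash(text: str, k: int = 10) -> list[tuple[str, int]]:
    table = WordCounter()
    for word in re.findall(r"[a-z]+", text.lower()):
        table.add(word)
    # Highest count first; ties broken alphabetically.
    return sorted(table.items(), key=lambda wc: (-wc[1], wc[0]))[:k]
```

Keeping the load factor at or below one half guarantees an empty slot always exists, so the probe loops terminate; the trade-off is memory spent on empty slots in exchange for short probe sequences.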
After our class had its turn, Mary shared what happened when the CS grad students were presented with the exact same problem.
Most of them basically did what most of the students in my class did – built hash tables and tweaked them. But a couple of results stood out.
- The first one simply hard-coded the 10 most common words into the app and printed them out. It was disqualified because it was perceived as breaking the rules.
- The next one was quite clever. The grad student in question realized that the program would run much faster if it were written in assembly language, but the rules of the contest required Pascal. So he created an array on the stack, deliberately overflowed it, loaded his assembly language program into the buffer, and used the overflow as a way of getting his assembly language version of the program to run. IIRC he wasn’t disqualified, but he didn’t win because he circumvented the rules (I’m not sure; it’s been more than a quarter century since Mary told the class this story).
- The winning entry was even more clever. He realized that he didn’t actually need to track all the words in the document. Instead, he tracked only some of the words, in a fixed array. His logic was that each of the 10 most frequent words was likely to appear in the first <n> words of the document, so all he needed to do was figure out what <n> is and he’d be golden.
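One possible reading of the winning idea, sketched in Python under explicit assumptions: the candidate set is fixed to the distinct words among the first `n` words, and then a single pass over the whole document counts only those candidates, ignoring everything else. The story doesn’t say exactly how he counted or how he chose `n`; the name `top_words_sampled` and the use of a dict in place of his fixed array are illustrative assumptions.

```python
import re


def top_words_sampled(text: str, n: int, k: int = 10) -> list[tuple[str, int]]:
    """Guess-based top-k: only words seen in the first n words are tracked.

    Assumption: every true top-k word appears among the first n words.
    If that guess is wrong, the answer is wrong; that is the trade-off.
    """
    words = re.findall(r"[a-z]+", text.lower())
    # Fixed candidate set taken from the prefix of the document.
    counts = dict.fromkeys(words[:n], 0)
    # One pass over the whole text, skipping any word not already tracked.
    for word in words:
        if word in counts:
            counts[word] += 1
    # Highest count first; ties broken alphabetically.
    return sorted(counts.items(), key=lambda wc: (-wc[1], wc[0]))[:k]
```

With a good guess the answer is exact: `top_words_sampled("the cat and the hat and the bat", n=3, k=2)` returns `[("the", 3), ("and", 2)]`. With a bad one it is simply wrong: `top_words_sampled("a b c c c", n=2, k=1)` returns `[("a", 1)]` even though “c” is the most frequent word.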
So the moral of the story is “Yes, if it doesn’t have to be correct, you can calculate the response in 0 time. But sometimes it’s ok to guess and if you guess right, you can get a huge performance benefit from the result”.
This anecdote might also come from Jon L. Bentley’s “Writing Efficient Programs”; I’ll be honest and say that I don’t remember where I heard it (but it makes a great introduction to the subsequent story).
I was stubborn and decided to take my binary tree program and make it as efficient as possible while keeping the basic structure of the solution (for example, instead of comparing strings, I calculated a hash for each string and compared the hashes to determine whether strings matched). I don’t remember if I was in the top 5, but I was certainly in the top 10. I do know that my program beat out most of the hash table based solutions.
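A minimal Python sketch of the hash-compare idea, under stated assumptions: CRC-32 via `zlib.crc32` stands in for whatever hash was actually used (the story doesn’t say), and the tree is ordered by the pair (hash, word). Comparing the integer hashes first means a full string comparison happens only when two hashes are equal, which also keeps two different words that happen to collide from being counted as one; whether the original program guarded against collisions isn’t stated. The names `HNode`, `insert_hashed` and `top_words_hashed_tree` are hypothetical. In Python the speed difference is likely small; the sketch shows the structure, not the performance.

```python
import re
import zlib
from typing import Optional


class HNode:
    """BST node ordered by (hash, word); the cheap integer compare usually decides."""

    def __init__(self, h: int, word: str) -> None:
        self.h = h
        self.word = word
        self.count = 1
        self.left: Optional["HNode"] = None
        self.right: Optional["HNode"] = None


def insert_hashed(root: Optional[HNode], word: str) -> HNode:
    # Assumed hash: CRC-32 of the UTF-8 bytes, computed once per word.
    h = zlib.crc32(word.encode("utf-8"))
    if root is None:
        return HNode(h, word)
    node = root
    while True:
        # Integer hashes are compared first; strings only on a hash tie,
        # so colliding words still end up in separate nodes.
        if h == node.h and word == node.word:
            node.count += 1
            return root
        go_left = (h, word) < (node.h, node.word)
        child = node.left if go_left else node.right
        if child is None:
            if go_left:
                node.left = HNode(h, word)
            else:
                node.right = HNode(h, word)
            return root
        node = child


def top_words_hashed_tree(text: str, k: int = 10) -> list[tuple[str, int]]:
    root = None
    for word in re.findall(r"[a-z]+", text.lower()):
        root = insert_hashed(root, word)
    out, stack, node = [], [], root
    while stack or node is not None:  # iterative in-order traversal
        while node is not None:
            stack.append(node)
            node = node.left
        node = stack.pop()
        out.append((node.word, node.count))
        node = node.right
    # Highest count first; ties broken alphabetically.
    return sorted(out, key=lambda wc: (-wc[1], wc[0]))[:k]
```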