Yesterday, I wrote about a trick to reduce the number of bits in a number by one.
It turns out that I've only ever had one opportunity to use this trick (although I ran into an instance of it when code reviewing some stuff the other day), back when I was writing the NT 3.1 network filesystem.
One of the things that makes writing a network client so hard is that you need to map legacy versions of your network protocol to the semantics that your customers require. In this case, I needed to map between the "how much disk space is free on the drive" SMB request and the NT filesystem query that retrieves the amount of disk space on the drive.
It turns out that the legacy servers returned this information in terms of bytes per sector, sectors per cluster and clusters per disk (because that was how MS-DOS maintained that information).
NT maintained the information in a different form, and the conversion between the two forms required that I divide the interim results by bytes per sector.
Well, if the bytes per sector was a power of two (which it almost always was), I could do the division trivially (by shifting the interim result right by the base-2 logarithm of the sector size).
Unfortunately, if the bytes per sector was NOT a power of two, I was stuck with a division. Since the interim value was a 64-bit number, that meant that I had to do a 64-bit by 32-bit division. At the time I was writing the code, the compilers didn't have support for 64-bit arithmetic; for 64-bit math, we had a number of internal "RTL" functions that would do the work. We had functions for addition, subtraction and multiplication, but nobody had ever written a version that did division.
I'd never done numeric processing before, but I needed the function, so... I went to Dave Cutler (since he knows everything) and asked him how to do the math, he told me and I trotted off to implement his algorithm.
Since the code was only used in a special case that was likely never to occur outside of the test labs, I didn't spend a lot of time optimizing the function - I faithfully coded the algorithm in C, in two versions: one that did a 64x64 division, another that did a 64x32 division. I could have written an assembly language version of the 64x32 routine (since the processor supported 64x32 division natively), but we were REALLY concerned about portability, and since it wasn't going to be called that often, I went with the more portable solution (and I'd been told that if I ever wrote stuff in assembly language, I'd have my thumbs slowly roasted over a red hot hibachi).
So I wrote the API, wrote enough unit tests to confirm to myself that it worked, checked it in as a new RTL API and went away satisfied that I'd done my first piece of numerical processing. I was SO happy 🙂
At this point, anyone reading this post who was on the NT team at the time is starting to see where this is going.
About three years later (long after I'd left NT and moved onto Exchange), I heard a story about something that had happened after I left the team. You see, Dave spent a fair amount of time during the NT 3.5 to NT 4 period analyzing the bottlenecks in the system and looking for ways of making it faster.
At some point, his investigations brought him to (you guessed it) my little division functions. It turns out that one of the core components in the system (I think it was GDI, but I'm not sure) decided that they needed to divide a 64-bit number by a 32-bit number.
Because the compiler didn't have native support for 64-bit arithmetic, they went looking through the same RTL functions I had, and they discovered my 64x32 division routine. So they started using it.
In their innermost loop.
All of a sudden, the silly routine I'd written to solve a problem that would never be seen in real life became part of the inner loop of a performance-critical function inside the OS.
When Cutler discovered this bottleneck, he looked at the code in my little routine and went through the roof. I understand he started cursing my name repeatedly before he rewrote it in assembly language.
So the moral of the story is: When you're writing code that's going into a system library, make darned sure that it's written to be as performant as humanly possible, because you never know if someone's going to find your one-off piece of code and put it in their innermost loop.