XP and Systems Programming

In Raymond's blog post today, he mentioned that if you don't want GetQueuedCompletionStatus to return when a handle is set to the signaled state, you can set the bottom bit of the event handle - that suppresses the notification.

The very first comment (from "Aaargh!") was that this was ugly.  And he's right.  Aaargh! suggested that a new "suppressCompletionNotifications" parameter be added instead to the functions that could use this mechanism, which would achieve the same goal.

And that got me to thinking about XP and systems programming (XP as in eXtreme Programming, not as in Windows XP) in general.

One of the core tenets of XP is refactoring - whenever you discover that your design isn't working, or if you discover an opportunity for code sharing, refactor the code to achieve that.

So how does this work in practice when doing systems programming?

I'd imagine that the dialog goes something like this:

Program Manager: "Hmm.  We need to add a more scalable queue management solution because we're seeing lots of lock convoy issues in our OS."

Architect: "Let me think about it...  We can do that - we'll add a new kernel synchronization structure that maintains the queue in the kernel, and add an API that returns when there's an item put onto the queue.  We'll then let that kernel queue be associated with file handles, so that when I/O completes on the file, the 'wait for the item' API returns.  The really cool thing about this idea is that we just need to add a couple of new APIs, and we can hide all the work involved in the kernel so that no application needs to be modified unless it wants to use this new feature."

Program Manager: "Sounds great!  Go for it"

<Time Passes...  The feature gets designed and implemented>

Tester, to Developer: "Hmm.  I was testing this new completion port mechanism you guys added.  I associated a completion port with a serial device, and I noticed that once I associated my file handle, my completion port was being signaled for every one of my calls.  That's really annoying.  I only want it to be signaled when a ReadFile or WriteFile call completes.  I don't want it to be signaled when a call to DeviceIoControl completes, since I'm making the calls to DeviceIoControl out-of-band.  We need a mechanism to fix this."

At this point, we have an interesting issue that shows up. Let's consider what happens when you apply XP as a solution...

Developer, to Architect, sometime later: "Ya know, Tester's got a point.  This is clearly a case that we missed in our design, and we need to fix it.  This is clearly an opportunity for refactoring, so we'll simply add a new 'suppressCompletionNotifications' parameter to all the APIs that can cause I/O completions to be signaled."

Architect: "Yup, you're right".

Developer goes out and adds the new suppressCompletionNotifications parameter to all the APIs involved.  He changes the API signature for 15 or so different APIs, fixes the build breaks that this caused, rebuilds the system and hands it to the test team.

Tester: "Wait a second.  None of my test applications work any more - you changed the function signature of WriteFile, and now the existing compiler can't write data to the disk!"

Ok, that was a stupid resolution, and no developer in their right mind would do that, because they know that adding a new parameter to WriteFile would break applications.  But XP says that you refactor when stuff like this happens.  Ok, so maybe you don't refactor the existing APIs.  What about adding new versions of the APIs?  Let's rewind the tape a bit and try again...

Developer goes out and adds a new variant of each of the APIs involved, with a new "suppressCompletionNotifications" parameter.  In fact, he's even more clever - he adds a "flags" parameter to each API and defines "suppressCompletionNotifications" as one of the flags (thus future-proofing his change).  He adds 15 or so different APIs, and then he runs into WriteFileEx.  That's a version of WriteFile that adds a completion routine.  Crud.  Now he needs FOUR different variants of WriteFile - two that have the new flag, and two that don't.  But since refactoring is the way to go, he presses on, builds the system and hands it to the tester.

Tester: "Hey, there are FOUR DIFFERENT APIs to write data to a file.  Talk about API bloat, how on earth am I supposed to be able to know which of the four different APIs to call?  Why can't you operating system developers just have one simple way of writing a byte to the disk?"

Tester (muttering under his breath): "Idiots".

Now let's rewind back to the starting point and reconsider the original problem.

Developer, to Architect, sometime later: "Ya know, Tester's got a point.  This is clearly a case that we missed in our design, and we need to fix it.  I wonder if there's some way that we could encode this desired behavior without changing any of our API signatures."

Architect: "Hmm.  Due to the internal design of our handle manager, the low two bits of a handle are never set.  I wonder if we could somehow leverage these bits and encode the fact that you don't want the completion port to be fired in one of those bits..."

Developer: "Hmm. That could work, let's try it."

And that's how design decisions like this one get made - the alternative to exploiting the low bit of a handle is worse than exploiting the bit.
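To make the low-bit trick concrete, here's a minimal sketch in C.  This is not the actual handle manager code - the handle_t type, the HANDLE_TAG_* names and the three helper functions are all made up for illustration.  The only thing it takes from the discussion above is the assumption that the low two bits of a valid handle are never set, which leaves them free to carry a flag.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical handle type: a pointer-sized opaque value standing in for
 * a real OS handle.  Not the Win32 HANDLE type. */
typedef uintptr_t handle_t;

/* Assumption from the text: valid handles never have their low two bits set. */
#define HANDLE_TAG_MASK        ((handle_t)0x3)
#define HANDLE_TAG_NO_COMPLETE ((handle_t)0x1)  /* hypothetical flag name */

/* Caller side: mark a handle as "don't fire the completion port for this". */
handle_t handle_suppress_completion(handle_t h)
{
    assert((h & HANDLE_TAG_MASK) == 0);  /* only untagged handles come in */
    return h | HANDLE_TAG_NO_COMPLETE;
}

/* Callee side: recover the real handle value by masking off the tag bits. */
handle_t handle_strip_tags(handle_t h)
{
    return h & ~HANDLE_TAG_MASK;
}

/* Callee side: returns 1 if a completion should be queued, 0 if suppressed. */
int handle_wants_completion(handle_t h)
{
    return (h & HANDLE_TAG_NO_COMPLETE) == 0;
}
```

The point of the sketch is that no function signature changes: a caller that never touches the low bit gets exactly the old behavior, and a caller that sets it opts in to the new one.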

And it also points out another issue with XP: Refactoring isn't compatible with public interfaces - once you've shipped, your interfaces are immutable.  If you decide you need to refactor, you need a new interface, and you must continue to support your existing interfaces, otherwise you break clients.
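Here's a rough sketch of what "a new interface while continuing to support the existing one" tends to look like in C.  Everything in it is hypothetical - store_t, store_write, store_write_ex and the STORE_FLAG_* values are invented for the example and are not Win32 APIs.  The shipped function keeps its signature and simply forwards to the new one with default flags.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* All names below are hypothetical; this is not the Win32 API. */

enum {
    STORE_FLAG_NONE            = 0,
    STORE_FLAG_SUPPRESS_NOTIFY = 1 << 0
};

typedef struct {
    char   data[64];
    size_t used;
    int    notifications;  /* number of completions signaled so far */
} store_t;

/* New interface: same work as before, plus a flags parameter for growth. */
int store_write_ex(store_t *s, const void *buf, size_t len, unsigned flags)
{
    if (len > sizeof s->data - s->used)
        return -1;                          /* not enough room left */
    memcpy(s->data + s->used, buf, len);
    s->used += len;
    if (!(flags & STORE_FLAG_SUPPRESS_NOTIFY))
        s->notifications++;                 /* stand-in for signaling */
    return 0;
}

/* Original shipped interface: signature is frozen, so it just forwards
 * to the new function with the default flags. */
int store_write(store_t *s, const void *buf, size_t len)
{
    return store_write_ex(s, buf, len, STORE_FLAG_NONE);
}
```

Existing callers of store_write keep compiling and behaving as before, but the library now carries two entry points for one operation, and that's with just one function.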

And when you're the OS, you can't afford to break clients.

Refactoring can be good as an internal discipline, but once you've shipped, your interfaces are frozen.