Gosh, programming is hard!

I've been programming since I was a wee tyke of 9.  Now, 28 years later, I'm amazed that while I've grown proficient in many of the skills of software design and engineering, it's been proven to me more times than I can count that writing correct programs is unbelievably hard.

This is near and dear to my heart since I work on Windows.  I know the callous characterizations of Windows, and Microsoft in general, as “good enough” and driven by profit rather than engineering excellence, but I like to believe myself to be part of the engineering excellence cadre.  The scale and importance of Windows really drive the quality question home in terms of costs, perception and, more recently, security ramifications.  (Note that engineering is all about economics, not just theoretical modelling.  Having the $$$ to continue to pay for your ongoing engineering costs is a factor you ignore at your own peril.  I will not comment at all on the obvious ancillary topic of liability, for hopefully obvious reasons.)

The point I want to make tonight is that it's really, absolutely, mind-numbingly hard to get everything right.  It makes me question whether we're just doing something fundamentally wrong in how we build automata like our programs today.  (I'm a mathematician by training, not a computer scientist, so while I can talk a good game about complexity theory and compare and contrast imperative vs. functional languages, I can't mentally live in the everything's-a-bignum world that people who really entrench themselves in this part of programming theory/philosophy live in.)

Let's just take a look at some bugs that are all over everyone's code base.  Here's a great example:

i = i + 1;

Wow, that was rocket science!  But wait, it has a bug!  (Language lawyers who want to point out that most language designs leave the details of overflow situations to the implementation need not apply.  They'll want to say something like “there is no bug there”.  Auto-bignum advocates also need not apply, because I don't work on the kinds of components and applications that can afford to make statements like “oh, we'll just do a heap allocation for any number over 255”.  Languages which turn on checked arithmetic get one silver star here, but (a) the side effects of the checking probably hurt overall system correctness worse than the overflow does, and (b) even in C# you have to ask for it to be turned on.)
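To make the failure concrete, here's a minimal sketch in C (the names and the choice of unsigned short are mine, picked for illustration) of the increment wrapping silently, plus the defensive pre-check you'd have to remember to write at every single increment site:

#include <limits.h>
#include <stdio.h>

int main(void)
{
    unsigned short i = USHRT_MAX;   /* a "cardinal" type already at its limit */

    i = i + 1;                      /* wraps silently to 0; no trap, no error */
    printf("after increment: %u\n", (unsigned)i);

    /* The defensive version: check before you add.  Easy to write once,
       nearly impossible to remember everywhere. */
    unsigned short j = USHRT_MAX;
    if (j < USHRT_MAX)
        j = j + 1;
    else
        printf("increment would overflow; refusing\n");

    return 0;
}

(For a signed int the situation is worse still: in C, signed overflow isn't even a well-defined wrap, it's undefined behavior.)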

Don't dismiss this example until you consider that such an overflow can easily lead to invalid global invariants.  It's easy to pick on buffer overflows here since they get the attention in the press, but at some point global invariant failure will lead to a large number of interesting exploits.  (For example, imagine that the programmer made such a horrible error as using a cardinal type whose precision is less than the number of object instances that can be simultaneously constructed, as in the sketch below.  Even just getting people to switch to size_t/SIZE_T isn't trivial, and it only takes one such mistake.)
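Here's a hypothetical sketch of how that kind of invariant dies quietly (the Widget names and the 16-bit counter are invented for illustration):

#include <stdio.h>

/* Intended global invariant: g_cInstances == number of live Widget objects.
   The bug: unsigned short tops out at 65535, but nothing stops the process
   from constructing 65536 objects at once.  Should have been size_t/SIZE_T. */
static unsigned short g_cInstances;

void Widget_Construct(void) { g_cInstances = g_cInstances + 1; }
void Widget_Destruct(void)  { g_cInstances = g_cInstances - 1; }

int main(void)
{
    unsigned long n;

    /* Construct 65536 widgets: the counter wraps back to zero, so any code
       asking "are there zero widgets alive?" now gets the wrong answer and
       may tear down state that 65536 live objects still depend on. */
    for (n = 0; n < 65536UL; n++)
        Widget_Construct();

    printf("live widgets according to the counter: %u\n", (unsigned)g_cInstances);
    return 0;
}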

The most distressing thing here is that it took 20+ years for us to recognize that this is a serious coding bug, one that can lead to viruses and worms propagating around the world.  Maybe I'm just dumber than your average person, but I worry that everyone thinks programming is getting easier with all the whiz-bang tools and techniques coming together.  From my perspective, I'm only now understanding how truly and deeply hard it is.  If I could send a message back to myself 5 years ago with what I've learned from working on Windows, I would have thought the message was obviously a fake because of how absurd the issues sound on their face.

I don't know what I'm really going to do with this blog; I hope that somehow I can make it as useful as Raymond Chen's blog, but he's Useful; I'm just Relatively Effective.  :-)  I hope this is a good start; I've been looking for a forum for “programming deconstructionism”, as I call it.  Maybe we'll all learn something; I know I still learn a lot that truly expands my mental models every day.  Once enough deconstruction has occurred, I'll enter the re-constructionism period and try to navigate some safe passage between Scylla and Charybdis.

Oh, and I work on Fusion, the component / application composition and versioning model.  A lot of why “dll hell” got to be the problem that it is comes down to implementation errors that forced so much backwards compatibility and forking of source code.  Our quality bar is extremely high because even now, as we evolve from XP to CLR 1.0/1.1 to Everett to Whidbey to Longhorn, every bug in our base implementations causes us untold grief.  This isn't anything unusual, but when the infrastructure breaks, customers end up facing terrible decisions, like having to write programs that work on one platform but possibly not on all of them.

Intellectual exercise for people who know Windows: if we know that oleaut32 is the code that registers type libraries, what are we supposed to do when oleaut32 versions?  One answer is nothing: anyone could have written those registry keys; oleaut32 was just helping the caller out, and it's their responsibility to make sure the keys are right.  One answer that is theoretically satisfying is to unregister everything with the old oleaut32 and then reregister it with the new one.  But that's not necessarily good either: maybe the bug in the old oleaut32 was that it didn't unregister correctly; maybe you need the new oleaut32 to unregister correctly.  And if the fix is just a missing status code check in some unrelated code, why did we take the time to cycle the type library registrations just to apply a trivial security fix?
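For readers who haven't touched this corner of Windows, here's roughly what the registration dance looks like through oleaut32's public API (a sketch only: error handling trimmed, and the path, LIBID and version numbers are made up for illustration).  Notice that the registry state gets written as a side effect of these calls, which is exactly why ownership is so murky when oleaut32 itself is serviced:

#include <windows.h>
#include <oleauto.h>

/* Register: load the type library and let oleaut32 write its registry
   entries as a side effect.  Anyone (installer, application, script host)
   can make this call. */
HRESULT RegisterMyTypeLib(void)
{
    ITypeLib *ptlib = NULL;
    HRESULT hr = LoadTypeLibEx(L"C:\\example\\mylib.tlb", /* hypothetical path */
                               REGKIND_REGISTER, &ptlib);
    if (SUCCEEDED(hr))
        ptlib->lpVtbl->Release(ptlib);
    return hr;
}

/* Unregister: you must hand back the LIBID, version, locale and syskind,
   details that an old, buggy oleaut32 may not have recorded correctly. */
HRESULT UnregisterMyTypeLib(REFGUID libid)
{
    return UnRegisterTypeLib(libid, 1, 0, LOCALE_NEUTRAL, SYS_WIN32);
}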

Next topic: exceptions - safe to throw as long as nobody ever catches them.