My manager said to me last week, “The most important thing the C# compiler does is not compiling the code; it is generating good error messages.” Helping you write bug-free code is what we do. I was thinking about this yesterday as I sat in a design meeting for the next version of the language. We were proposing new constructs that would simplify some scenarios, but some of these improvements could also make it harder for you to catch a bug in some cases, so there was a hesitation, a lingering uncertainty. It made me start thinking about some of the biggest improvements I had experienced over the years that led to less buggy code in the products I worked on.
Years ago, the most common bug I recall was a memory overwrite. These were nasty buggers, most often caused by bad pointer arithmetic or off-by-one, array-out-of-bounds type stuff. The most common of these were in code manipulating character arrays, or ‘strings’. Then came C++, and the world did not change much. Then came the standard framework library, and with it a string class, and the world was never the same again. String-based bugs went away. Half of all bugs generated dropped off the face of the planet. Life was good.
Then there was this thing called COM and OLE that came out. Every app wanted to integrate into every other app, and there were these great new interfaces that let you do it. The only drawback was that the price of admission included working with a subtle programming pattern called reference counting, and believe me, in programming, subtle things are way bad. Whatever we had just saved by going to a standard string class, like MFC’s CString, we lost in droves to bugs generated by bad reference counts.
Let me explain how bad it was. On the surface, reference counting looks real simple. If you want to guard an object from being reclaimed, increment its reference count. It will only be deleted when the count goes back to zero. So, objects in your system of data structures can be ‘holding’ on to other objects. The most notorious problem with this is the reference loop. If object A holds onto object B and B holds onto A, then even if nothing else in the system holds onto either of these objects, neither will ever go away. Oops. Instant memory leak, or worse, a system resource represented by the object will never get reclaimed. Now, there are a bunch of advanced tricks to help you out of this bind, but all of a sudden the simple concept of reference counting becomes very complex to understand, and that, my friends, is too subtle. When you start to realize this, you begin to feel the impending doom.
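To make the reference loop concrete, here is a minimal sketch in C++. The `Node` type and the two helper functions are hypothetical names invented for illustration; this is hand-rolled counting, not COM's actual `IUnknown`. It shows that a one-way chain cleans itself up, while a loop leaves both objects alive after every outside reference is gone.

```cpp
#include <cassert>

// Hypothetical hand-rolled ref-counted object (illustrative only, not COM).
struct Node {
    int refs = 1;            // the creator holds the first reference
    Node* other = nullptr;   // a counted reference to another node
    static int live;         // how many nodes currently exist

    Node() { ++live; }
    ~Node() {
        --live;
        if (other) other->Release();  // let go of whatever we were holding
    }

    void AddRef() { ++refs; }
    void Release() { if (--refs == 0) delete this; }

    // Take a counted reference to n.
    void Hold(Node* n) { n->AddRef(); other = n; }
};
int Node::live = 0;

// A holds B, nobody holds A: releasing our references frees both.
int LeakedWithoutLoop() {
    int before = Node::live;
    Node* a = new Node;
    Node* b = new Node;
    a->Hold(b);      // b.refs == 2
    b->Release();    // b.refs == 1 (held by a)
    a->Release();    // a.refs == 0: a is deleted, which releases and deletes b
    return Node::live - before;
}

// A holds B and B holds A: releasing our references frees neither.
int LeakedWithLoop() {
    int before = Node::live;
    Node* a = new Node;
    Node* b = new Node;
    a->Hold(b);      // b.refs == 2
    b->Hold(a);      // a.refs == 2
    a->Release();    // a.refs == 1, kept alive by b
    b->Release();    // b.refs == 1, kept alive by a
    int leaked = Node::live - before;

    // Break the loop by hand so the demo itself does not leak.
    Node* held = a->other;
    a->other = nullptr;
    held->Release(); // b is deleted, which releases and deletes a
    return leaked;
}
```

Calling `LeakedWithoutLoop()` reports zero surviving objects, while `LeakedWithLoop()` reports two. The manual cleanup at the end is exactly the kind of "advanced trick" that someone has to remember to write.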
But that’s the easy stuff. It was actually much worse than that. My example is ActiveX Data Objects (ADO), which I helped build back in the mid-90s. ADO was forever on the bleeding edge of programming techniques. It was one of the first components built on COM interfaces plus IDispatch, and so one of the first Microsoft products to run into the reference counting nightmare. To solve the looping problem, ADO had two kinds of reference counting. Only one was observable, but internally there were two kinds, and both had to drop to zero before the object would fire its destructor and shut down. The problem started getting really difficult when you realized that there was more than just deallocation of memory going on at this point; more than just the destructor needed to fire. Shutdown code had to fire. Database connections had to close. Threads had to abort. So there were a lot of complicated messages being sent around to a lot of underlying components, which were all implemented using OLE DB interfaces, which were all based on COM and all used reference counting.
When writing code against COM objects, there were tricks that you could employ. If you got one of these objects as a parameter, you would want to protect it from getting released too soon, so you would call AddRef() up front to increment the count. Now your bit of code had a hold on the object, so it would not go away for the duration, no matter what a sub-function might do, and especially no matter what another thread might do. But of course, if code like this got executed during shutdown, the reference count would tick back up at the start of your code and then tick back down at the end, causing the reference count to drop back down to zero again, causing the shutdown code to re-execute and the destructor to re-fire, causing your codebase to either go into an infinite tailspin or hit a low-level exception that would abort the process: exit(0)!
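The re-entrancy is easier to see in code. The following C++ sketch uses a hypothetical `Connection` type with made-up `Flush` and `Shutdown` methods; it is not ADO source. Nothing is actually deleted here, and the shutdown runs are only counted, with an artificial cap so the demo terminates instead of recursing forever.

```cpp
#include <cassert>

// Hypothetical stand-in for a COM-style object (not real ADO code).
struct Connection {
    int refs = 1;          // the caller holds the first reference
    int shutdownRuns = 0;  // how many times shutdown code has executed

    void AddRef() { ++refs; }
    void Release() {
        if (--refs == 0) Shutdown();  // last reference gone: shut down
    }

    // Ordinary helper that defensively holds the object while it works.
    void Flush() {
        AddRef();   // guard against an early release
        // ... work that may call into other components ...
        Release();  // during shutdown this takes the count from 1 back to 0
    }

    void Shutdown() {
        ++shutdownRuns;
        if (shutdownRuns >= 3) return;  // cap added only so the demo ends
        Flush();  // shutdown reuses the ordinary, guarded code path
    }
};

// Drops the only reference and reports how often shutdown ran.
int CountShutdownRuns() {
    Connection c;
    c.Release();
    return c.shutdownRuns;
}
```

With the cap in place, `CountShutdownRuns()` returns 3: one legitimate shutdown plus two re-entries. Without the cap, the recursion would continue until the stack ran out.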
Of course I fixed it. I could have re-written tens of thousands of lines of code, making special versions that were ‘just for shutdown’. But this wouldn’t be able to cross well-defined public interface boundaries between our own objects. So I just did a hack. I added a flag. To this day, every component in the ADO family has a flag that controls reference counting. It’s named “ImpendingDoom”. If the object is in the state of impending doom, it knows the destructor has already been fired, and so it is protected from being called again.
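As a rough sketch of what such a flag buys you, here is a hypothetical C++ version. Only the name `impendingDoom` comes from the story above; the `GuardedConnection` type, its methods, and where exactly the flag is checked are assumptions made for illustration, and the real ADO implementation may differ.

```cpp
#include <cassert>

// Hypothetical object with a shutdown guard flag (not real ADO code).
struct GuardedConnection {
    int refs = 1;
    int shutdownRuns = 0;
    bool impendingDoom = false;  // set once shutdown has started

    void AddRef() { ++refs; }
    void Release() {
        // Only the first drop to zero is allowed to trigger shutdown.
        if (--refs == 0 && !impendingDoom) {
            impendingDoom = true;
            Shutdown();
        }
    }

    // Same defensive helper as before: hold the object while working.
    void Flush() {
        AddRef();
        // ... work that may call into other components ...
        Release();  // count hits zero again, but the flag blocks re-entry
    }

    void Shutdown() {
        ++shutdownRuns;
        Flush();  // safe to reuse ordinary code during shutdown
    }
};

// Drops the only reference and reports how often shutdown ran.
int CountGuardedShutdownRuns() {
    GuardedConnection c;
    c.Release();
    return c.shutdownRuns;
}
```

Here `CountGuardedShutdownRuns()` returns 1. The ordinary code path still does its AddRef/Release dance during shutdown, but the second drop to zero is ignored, so none of the existing code has to be rewritten in a 'just for shutdown' flavor.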
The introduction of the managed runtime fixed all this. Garbage collection trumps reference counting any day. All the bugs introduced by reference counting just went away, and after that I no longer felt the impending doom.
But I digress.