The importance of fortifying your subsystems.

In Writing Solid Code, Steve Maguire advises you to "fortify your subsystems". This is especially important if your subsystem depends on other systems behaving correctly.

One of the most common bugs our team (CLR debugging) saw in Whidbey Beta 2 was a case where we hadn't done this. (For the record, the bug is fixed in the final version of Whidbey.) In particular, the CLR debugging services (ICorDebug) expect the CLR loader to notify them of every module that gets loaded. If the loader, due to some bug, fails to notify the debugging services of a module, then ICorDebug gets horribly confused when it later sees code from that module on the stack.
What makes fortifying this difficult is that you don't realize you've hit the problem until it's too late, and at that point there's no good fallback behavior. In other words, there's no good boundary to fortify at. Furthermore, the various naive suggestions for what to do once we detect the problem (fabricating a module when we first see it, ignoring frames from unknown modules, etc.) all have problems of their own and really just push the bug around.
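To make the idea concrete, here is a minimal sketch of checking the "loader told us about every module" invariant at the lookup boundary. It is not ICorDebug code: ModuleRegistry, UnknownModuleError, and describe_stack are hypothetical names invented for illustration. It also doesn't solve the fallback problem described above. All it does is turn a confusing crash deep inside a hash table into an explicit failure that names the missing module.

```python
class UnknownModuleError(RuntimeError):
    """Raised when a stack frame refers to a module the loader never reported."""


class ModuleRegistry:
    """Hypothetical stand-in for the debugger's table of loaded modules."""

    def __init__(self):
        self._modules = {}  # module id -> module name

    def on_module_load(self, module_id, name):
        # Called by the loader; the debugger relies on this being complete.
        self._modules[module_id] = name

    def lookup(self, module_id):
        # Fortified boundary: verify the invariant instead of assuming it.
        try:
            return self._modules[module_id]
        except KeyError:
            raise UnknownModuleError(
                f"frame refers to module {module_id:#x}, "
                "but the loader never reported it"
            ) from None


def describe_stack(registry, frames):
    """Resolve each (module_id, function_name) frame to 'module!function'."""
    return [f"{registry.lookup(mid)}!{fn}" for mid, fn in frames]


registry = ModuleRegistry()
registry.on_module_load(0x1000, "app.exe")

# A stack containing only reported modules resolves normally.
resolved = describe_stack(registry, [(0x1000, "Main")])

# A frame from a module that was never reported fails loudly and early.
try:
    describe_stack(registry, [(0x1000, "Main"), (0x2000, "Helper")])
    failure = None
except UnknownModuleError as exc:
    failure = str(exc)
```

The check still fires "too late" in the sense described above, since the notification was already missed. What it buys you is a diagnosis that points at the loader rather than at whatever code happened to trip over the missing entry.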

The effect of this bug is that you'll get a crash in the debugger process that looks something like this (the frames past the red frames may look different):

00 mscordbi!CordbHashTable::UnsafeGetBase
01 mscordbi!CordbModule::LookupOrCreateFunction
02 mscordbi!CordbFunction::LookupOrCreateFromFuncData
03 mscordbi!CordbThread::RefreshStack
04 mscordbi!CordbThread::FindFrame
05 mscordbi!CordbProcess::DispatchRCEvent
06 mscordbi!CordbRCEventThread::FlushQueuedEvents
07 mscordbi!CordbRCEventThread::HandleRCEvent
08 mscordbi!CordbRCEventThread::ThreadProc
09 mscordbi!CordbRCEventThread::ThreadProc
0a kernel32!BaseThreadStart

We got the relevant parties together to analyze how we got into this trouble. It basically happened when you attached a debugger while a module was in the middle of loading. We wanted to make sure this was a one-off bug and not some architectural flaw.
The good news is that we've since fixed all of these issues in the final Whidbey release. We also came up with an aggressive testing mechanism for this.
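As a rough illustration of why attach-during-load is a dangerous window, and of what exhaustively testing it can look like, here is a toy model. Everything in it is an assumption made for the example: the two load steps, the "in progress" list, and the fix are invented, and none of it reflects how the CLR loader or our real tests work. The model loads one module in two steps and tries attaching a debugger before each step and after the last one, recording the attach points at which the debugger ends up not knowing about the module.

```python
def run_scenario(attach_before_step, fixed):
    """Load module "m" in two steps, attaching a debugger at the given point.

    attach_before_step is 0 or 1 (attach before that step) or 2 (attach after
    the load finishes). Returns the set of modules the debugger knows about.
    """
    published = []    # modules an attaching debugger can enumerate
    in_progress = []  # modules that are in the middle of loading
    known = set()     # the debugger's view of loaded modules
    state = {"attached": False}

    def attach():
        state["attached"] = True
        known.update(published)
        if fixed:
            # Assumed fix for this toy model: also report half-loaded modules.
            known.update(in_progress)

    def announce_load():
        # Step 0: the load notification fires; only an attached debugger hears it.
        in_progress.append("m")
        if state["attached"]:
            known.add("m")

    def publish_module():
        # Step 1: the module becomes visible to later enumeration.
        published.append("m")
        in_progress.remove("m")

    steps = [announce_load, publish_module]
    for index, step in enumerate(steps):
        if index == attach_before_step:
            attach()
        step()
    if attach_before_step == len(steps):
        attach()
    return known


def missed_attach_points(fixed):
    """Return every attach point at which the debugger never learns about "m"."""
    return [point for point in range(3) if "m" not in run_scenario(point, fixed)]


missed_without_fix = missed_attach_points(fixed=False)
missed_with_fix = missed_attach_points(fixed=True)
```

In this model the only bad window is attaching between the two steps: the notification has already gone out to nobody, and the module isn't yet visible to enumeration. Sweeping every attach point, rather than hoping a random attach lands in the window, is what makes this kind of test aggressive.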
