"Rocket Science" API?

Let me just say it and make sure we're all on the same page here: We made a lot of mistakes in v1.0 of ICorDebug.

One such mistake was that we frequently called ICorDebug a "Rocket Science" API. IMO, we figured that anybody writing a debugger must be a super genius, and then we used that to justify cutting all sorts of corners when designing and implementing the API. For example:

  • The implementation included very few runtime checks, so it was extremely easy to "misuse" our API without realizing it. This was especially bad because ICorDebug is large (~250 methods in v1.0), complicated, and very poorly documented. Realistically, nobody was going to use it without misusing it somewhere.
  • It's a COM interface, but not a very COM-conforming one. We barely got AddRef/Release right, let alone QueryInterface and the other COM rules (like correlating return values with out-parameters, apartment state, enumerators, etc.). We actually started finding a ton of these bugs when we first switched our testing over to MDbg, because MDbg uses COM interop on top of ICorDebug.
  • There was less pressure to make sure that the API was really cohesive and well thought out. We figured that debuggers could just add extra logic to compensate for shortcomings in our API. For example, properly implementing detach with ICorDebug is actually very complicated: it takes much more than just a call to ICorDebugProcess::Detach (see Cordebug.idl).
  • There was very little enforcement to prevent the debugger from breaking CLR execution engine (EE) invariants. For example, the EE requires AppDomain isolation (an object graph must live in a single AppDomain). A debugger could use SetValue to break this, and then the EE would eventually AV (crash with an access violation).
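
To make that last point concrete, here's a minimal C++ sketch of what enforcement at the API boundary can look like. Everything in it is hypothetical: Result, ObjectRef, Field, and SetValueChecked are made-up names for illustration, not ICorDebug's real types, and this is not how the CLR actually implements its checks. The idea is just to validate the AppDomain invariant before mutating anything, so a bad request fails with an error code instead of corrupting the object graph and letting the EE AV later.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical result codes for this sketch (not the real CORDBG_E_* HRESULTs).
enum class Result { Ok, InvalidArg, CrossAppDomainReference };

// Hypothetical stand-in for a managed object as a debugger API might see it.
struct ObjectRef {
    std::uint32_t appDomainId;  // AppDomain the object lives in
};

// Hypothetical reference-typed field on some object.
struct Field {
    std::uint32_t ownerAppDomainId;     // AppDomain of the object owning this field
    const ObjectRef* value = nullptr;   // current value (null reference by default)
};

// A "fortified" setter: check arguments and the AppDomain-isolation invariant
// up front, and leave the field untouched if either check fails.
Result SetValueChecked(Field* field, const ObjectRef* newValue) {
    if (field == nullptr) {
        return Result::InvalidArg;  // caller bug: fail gracefully, don't crash
    }
    // A null reference belongs to no AppDomain, so it is always allowed.
    if (newValue != nullptr && newValue->appDomainId != field->ownerAppDomainId) {
        return Result::CrossAppDomainReference;  // would link two AppDomains' object graphs
    }
    field->value = newValue;
    return Result::Ok;
}
```

A debugger that passes an object from another AppDomain gets back CrossAppDomainReference and the field keeps its old value, so a debugger bug shows up as a graceful failure at the call site rather than as a much harder-to-diagnose AV somewhere in the EE.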

Now, in our defense, we had good excuses at the time for all our decisions: "The CLR gets better performance this way", "we're just a C++ header trapped in a COM idl file", "debugging is innately complicated", "nobody would ever do that", "only a handful of people in the world will actually use this API", etc. But hindsight is 20/20. I think we first started realizing this when bugs came in and the support cost of determining whether each one was a real CLR bug or a debugger "misuse" bug became ridiculously high. It became cheaper to just fortify our API, and I'm thrilled with some of the stuff we do in v2.0. Another perk of fortifying the API is that it adds an extra layer of protection in case the debugger itself has a bug: it can downgrade a crash into a graceful failure, and that's good for end users.

For the record, I still think debugging is an innately complicated problem, and our API can't possibly shield a debugger writer from those complexities. But I'm happy to say that I think we've made great progress on this in v2.0. The moral of the story: if you're making a "rocket science" API, be prepared for the consequences, both good and bad.