Why interop debugging is difficult

The Visual Studio debugger supports debugging both .NET code and native code at the same time. We call this ‘interop’. At first glance, this might not seem like much of an accomplishment. After all, we support debugging .NET code well enough, and we support debugging native code. What’s so hard about doing both at the same time?

The problem is this: to debug .NET code, the CLR splits its debugger into two pieces. One piece sits inside the debuggee process. This piece understands all the nitty-gritty details of the CLR. The other piece runs inside the debugger process. This code knows how to talk to the code inside the debuggee, and exposes the interface (ICorDebug) that Visual Studio consumes to debug .NET code.

Now the way the Win32 native debugging API works is that the OS tells the debugger when an event happens. After the OS tells the debugger about the event, every thread in the debuggee is frozen. Normally, all of these threads stay frozen until the debugger is done processing the event, or until the user decides to continue the process (depending on the kind of event).

This causes a big problem. In order to do .NET debugging, we need to run code in the process. But the native debugger can’t let code run in the process. This has been a very hard problem to solve. We partially work around it by letting one thread inside the debuggee continue. The difficulty of this problem is why we don’t simply enable interop debugging all the time, and why interop debugging is generally slower and less reliable.
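To make the conflict concrete, here is a toy model in Python. It is only a sketch: ToyDebuggee, native_stop, resume_only, managed_query and the thread name "helper" are all made up for illustration, and none of this is a real debugging API. It just shows that once every thread is frozen, the in-process half of the .NET debugger can't answer anything, and that letting one thread continue is enough to get an answer while the other threads stay stopped.

```python
from dataclasses import dataclass, field


@dataclass
class ToyThread:
    name: str
    frozen: bool = False


@dataclass
class ToyDebuggee:
    """Hypothetical stand-in for a debuggee process; not a real debugging API."""
    threads: dict = field(default_factory=dict)

    def add_thread(self, name):
        self.threads[name] = ToyThread(name)

    def native_stop(self):
        # Models a native debug event: every thread in the process is frozen.
        for thread in self.threads.values():
            thread.frozen = True

    def resume_only(self, name):
        # Models the partial workaround: let exactly one thread continue.
        self.threads[name].frozen = False

    def managed_query(self, question):
        # Models the in-process half of the managed debugger. It can only
        # answer if the thread it runs on is allowed to execute.
        if self.threads["helper"].frozen:
            raise RuntimeError("helper thread is frozen; no code can run")
        return f"answer to {question!r}"


def demo():
    proc = ToyDebuggee()
    for name in ("main", "worker", "helper"):
        proc.add_thread(name)

    proc.native_stop()
    try:
        proc.managed_query("call stack")
        blocked = False
    except RuntimeError:
        blocked = True  # everything is frozen, so the query cannot run

    proc.resume_only("helper")
    answer = proc.managed_query("call stack")
    still_frozen = sorted(n for n, t in proc.threads.items() if t.frozen)
    return blocked, answer, still_frozen
```

In this model the first query fails and the second succeeds while "main" and "worker" remain frozen. The real situation is much messier, because the one running thread can still need something (a lock, for example) that a frozen thread owns.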

Someday, we would like to make this easier by having the CLR examine data structures from the debugger process. We call this a ‘hard mode’ debugger because the debuggee is hard stopped. However, there will still be tricky issues to deal with. One problem is that the current .NET debugger can’t stop the process on any arbitrary instruction. Instead, it tries to get the debuggee into a ‘safe place’. This has the downside that sometimes we can’t attach to a process. But on the upside:

  • Evaluating properties (and other functions) is safer because it is less likely that another thread will be holding an important lock (the heap lock is a good example).
  • We don’t need to worry about debugging while the garbage collector is moving references around.
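The ‘safe place’ idea can also be sketched with a toy model. Again, this is an assumption-heavy illustration and not how the CLR actually does it: ToyRuntime and its methods are hypothetical names, and a plain Python lock stands in for something like the heap lock. Each worker thread only checks for a stop request after it has released the lock, so once every thread is parked the debugger knows that nobody is holding it.

```python
import threading
import time


class ToyRuntime:
    """Hypothetical model of threads that only stop at safe places."""

    def __init__(self, thread_count):
        self.heap_lock = threading.Lock()  # stands in for the heap lock
        self.cond = threading.Condition()
        self.stop_requested = False
        self.exiting = False
        self.parked = 0
        self.threads = [
            threading.Thread(target=self._worker, daemon=True)
            for _ in range(thread_count)
        ]

    def start(self):
        for thread in self.threads:
            thread.start()

    def _worker(self):
        while True:
            with self.heap_lock:
                time.sleep(0.001)  # work that must not be interrupted
            # Safe place: the lock has been released before we check.
            with self.cond:
                if self.exiting:
                    return
                if self.stop_requested:
                    self.parked += 1
                    self.cond.notify_all()
                    while self.stop_requested:
                        self.cond.wait()
                    self.parked -= 1

    def stop_at_safe_place(self):
        # Ask every thread to stop, then wait until all of them have parked.
        with self.cond:
            self.stop_requested = True
            while self.parked < len(self.threads):
                self.cond.wait()

    def shutdown(self):
        with self.cond:
            self.exiting = True
            self.stop_requested = False
            self.cond.notify_all()
        for thread in self.threads:
            thread.join()


def demo():
    runtime = ToyRuntime(3)
    runtime.start()
    runtime.stop_at_safe_place()
    # With every thread parked at a safe place, the lock must be free.
    lock_was_free = runtime.heap_lock.acquire(blocking=False)
    if lock_was_free:
        runtime.heap_lock.release()
    parked = runtime.parked
    runtime.shutdown()
    return lock_was_free, parked
```

The cost shows up in stop_at_safe_place: it has to wait for every thread to reach a safe place, and a thread that never gets there would make it wait forever. A hard-stopped debuggee gives no such guarantee about the lock, which is why the tricky issues don't go away.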