Roger Wolff on Implementing ICorDebug for TinyCLR

I talked here about implementing ICorDebug to reuse existing debuggers (such as VS). Roger Wolff did that and wrote about his experience. Here's what he has to say about it:
-----

Hi, my name is Roger Wolff.  I work on the SPOT team.  For the short summary of what we do that's relevant to this blog: we have built an implementation of the CLI (we call it the TinyCLR) for embedded systems.  Our first product put the TinyCLR to use in wristwatches.  A couple of small tidbits, for those interested: the TinyCLR runs .NET IL bytecode via an interpreter, and we have also built a Visual Studio plugin to enable debugging either on a Win32 emulator or on the device.  As a substantial part of this VS plugin involves implementing the ICorDebug API, Mike has asked me to talk a bit about that experience.

So how does our plugin actually fit into VS?  I am no expert on VS extensibility, so I can only hope that I am getting this right.  For those of you out there who know more about this than I do, don't hesitate to let me know where I'm wrong.  VS's main extensibility point is called a package (take a look at HKLM\Software\Microsoft\VisualStudio\8.0\Packages if you are interested).  One such package is the debugger package, implemented by vsdebug.dll.  It's kind of interesting to me that the debugger, arguably the IDE's most important feature, is seen by devenv as just another package.  Vsdebug.dll abstracts debugging functionality into what I believe they call the AD7 API.  I have no idea what this means, or where it comes from, but I call it the IDebug API, as it defines IDebugThread, IDebugStackFrame, etc.  In the VS SDK, msdbg.idl defines these interfaces.  Vsdebug provides the UI and other functionality common to all debuggers in VS, such as the call stack window and the attach dialog, to give a couple of examples.  However, vsdebug doesn't actually communicate with any debuggee process.  Instead, it hosts other components, called debug engines (DEs), which implement the IDebug API set.  For the interested, the list of DEs can be found in the registry at HKLM\Software\Microsoft\VisualStudio\8.0\AD7Metrics\Engine.  There is a native DE for debugging native applications, a managed DE for debugging managed applications, and so on.  The one of interest to us is the managed debugger, cpde.  Cpde, in turn, hosts components that implement the ICorDebug API, and those components finally talk to the debuggee.  For the desktop CLR, this is mscordbi.dll.  In VS 7.0/7.1, cpde was not extensible in this manner, but in Whidbey it will host any component that implements the ICorDebug API.  The ICorDebug API is much friendlier for dealing with managed code, as it makes the AD7 abstractions more concrete, defining Classes, Functions, Values, and so on.  But of course, if you're reading Mike's wonderful blog, you know all this already.

 

Enough background, and on to my thoughts and feedback on my experience, especially in regard to the ICorDebug API.  Overall, the API set is pretty straightforward to understand and to implement.  Of course, some more documentation wouldn't have hurt.  I ran into a few problems here and there due to assumptions that were made but not documented, either by the API itself or by its consumer, in our case VS.  I can't think of any great examples off the top of my head, so I'll settle for the following one instead: ICorDebugChain::GetStackRange.  In our environment, the managed stack is not a stack at all, but is virtualized away into a list of frames on the heap (I still get a kick every time I think about the fact that our stack is actually allocated on the heap).  So what should this function return?  Well, I made up some random addresses and hoped that things would just work.  Unfortunately for me, I wasn't really thinking and had my imaginary stack grow upwards in memory instead of down.  Without much documentation, and without any idea of why I couldn't see the call stack within VS, I didn't have much choice other than to start debugging VS.  As has often been the case, I am really glad to have access to the VS source code, as well as to some very nice and very helpful developers within Microsoft.
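To make that workaround concrete, here is a minimal sketch of the kind of GetStackRange implementation that keeps VS happy, assuming hand-written C# definitions for the interfaces (more on those below).  The class and field names are hypothetical and the addresses are pure fiction; the only property that matters is that deeper frames report numerically lower addresses, so the fabricated stack appears to grow downward.

    // Minimal sketch only, not the real TinyCLR code.  The fields are hypothetical
    // bookkeeping; the constants are made up.
    class FakeChain
    {
        const ulong FakeStackBase = 0x40000000;  // arbitrary base of the imaginary stack
        const ulong BytesPerFrame = 0x100;       // arbitrary size charged to each frame

        int framesOlderThanThisChain;  // total frames in chains closer to main (older calls)
        int frameCount;                // number of virtualized frames in this chain

        // Mirrors ICorDebugChain::GetStackRange.  Convention assumed here:
        // pStart is the lowest address of the chain's range, pEnd the highest.
        public void GetStackRange(out ulong pStart, out ulong pEnd)
        {
            ulong end = FakeStackBase - (ulong)framesOlderThanThisChain * BytesPerFrame;
            pEnd = end;
            pStart = end - (ulong)frameCount * BytesPerFrame;  // deeper frames get lower addresses
        }
    }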

 

For the most part, ICorDebug really is platform independent, which was very helpful in our solution.  However, I did run into one exception to this rule.  ICorDebug::CreateProcess pretty much assumes that the debuggee process is running on Windows.  Basically, this API seems to be just a wrapper over the Win32 CreateProcess.  Certain parameters, like lpCurrentDirectory, aren't applicable to us and can be ignored.  Worse, however, a PROCESS_INFORMATION structure is returned as an out parameter.  This contains Win32 handles to the newly created process and to its primary thread.  For our scenario, these handles just don't exist.  And then when cpde needs to use those handles, well... let's just say the solution isn't pretty.  I understand (ok, I don't know the details, but I at least grasp the concept) that this is a tricky bootstrapping issue: when launching a process for debugging, you first create the process in a suspended state (CREATE_SUSPENDED) so that you can set up entrypoint breakpoints and other such handshaking information.  I also understand that this is the only interface that exists without having some matching state in the debuggee process, so maybe that's a good enough reason to break the illusion of platform independence here.  However, it just doesn't work very nicely for us.  On one hand, I acknowledge that as a consumer of ICorDebug, you can launch the process separately and then attach.  On that same hand, I also acknowledge that bootstrapping problems often cause the biggest headaches and can be the hardest to solve cleanly.  On the other hand, as a producer of ICorDebug, this just isn't abstract enough and depends too much on Windows.  This seems to be handled more nicely in the IDebug API, with IDebugPortEx2::LaunchSuspended and IDebugPortEx2::ResumeProcess.
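As a rough illustration (and definitely not the real TinyCLR plumbing), here is about the best an implementation can do with that out parameter.  The PROCESS_INFORMATION layout mirrors the Win32 structure; devicePid is a hypothetical identifier from whatever transport talks to the device.

    using System;
    using System.Runtime.InteropServices;

    [StructLayout(LayoutKind.Sequential)]
    struct PROCESS_INFORMATION
    {
        public IntPtr hProcess;
        public IntPtr hThread;
        public uint dwProcessId;
        public uint dwThreadId;
    }

    static class CreateProcessSketch
    {
        // Called from a hypothetical ICorDebug::CreateProcess implementation after the
        // code has been started on the device.
        public static void FillProcessInformation(IntPtr lpProcessInformation, uint devicePid)
        {
            PROCESS_INFORMATION pi;
            pi.hProcess = IntPtr.Zero;    // no Win32 process handle exists for the device
            pi.hThread = IntPtr.Zero;     // likewise, no primary-thread handle
            pi.dwProcessId = devicePid;
            pi.dwThreadId = 0;

            // Every place cpde later uses hProcess or hThread has to be special-cased;
            // this is where "the solution isn't pretty."
            if (lpProcessInformation != IntPtr.Zero)
                Marshal.StructureToPtr(pi, lpProcessInformation, false);
        }
    }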

 

ICorDebug is not .Net-friendly (it is not OLE automation compatible).  This was a problem for me, as a major portion of our system (a PC component that handles the communication with the TinyCLR) was already written in managed code, and for about the last four years I have been a huge .Net convert: given a choice between managed and unmanaged code, I am going to choose managed every time.  First things first: in order to cross the managed/unmanaged boundary, you need an interop assembly (or at least managed interface definitions).  In my naiveté, I figured that that is what tlbimp was for (this was way before Mdbg was available; more on that later).  I didn't actually have a type library, though (although perhaps one does exist).  So I ran midl over cordebug.idl (with whatever modifications I needed to get it to actually compile) to create a tlb, then ran tlbimp to generate the interop assembly.  For those of you who know a bit about COM/.Net interop, I'm sure you are laughing at me by now.  For those of you who don't, well, IDL often doesn't have enough information to handle this interop correctly.  Take the following, for example:

 

    HRESULT GetName([in] ULONG32 cchName,
                    [out] ULONG32 *pcchName,
                    [out, size_is(cchName), length_is(*pcchName)] WCHAR szName[]);

 

This gets turned by tlbimp into something like this:

 

void GetName(uint cchName, ref uint pcchName, char[] szName);

 

There are two problems here.  Some clients (VS, for one) will pass in NULL for pcchName if they don't care about the returned count; the interop marshaler will then crash trying to dereference the NULL pointer.  Also, since szName is not a safearray and length_is doesn't get imported, the marshaler will pass in an array of size 1.

 

This can be fixed with something like this:

 

     void GetName([In]uint cchName, [Out]IntPtr pcchName, [Out]IntPtr szName);

 

It's not very managed-code friendly.  It can be made a little nicer with something like this:

 

     void GetName([In]uint cchName, [Out]IntPtr pcchName, [In, MarshalAs(UnmanagedType.LPArray, SizeParamIndex = 0)]char[] szName);
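For completeness, here is a sketch of what the implementing side of that IntPtr-style signature ends up doing by hand; the name field is a hypothetical stand-in for however the real object knows its own name.  The null check on pcchName is exactly the case that sinks the tlbimp-generated version.

    using System;
    using System.Runtime.InteropServices;

    class NamedThingSketch
    {
        private string name = "example";  // hypothetical: wherever the name really comes from

        public void GetName(uint cchName, IntPtr pcchName, IntPtr szName)
        {
            // Report the required buffer size in characters, including the null terminator.
            // VS sometimes passes NULL here; the 'ref uint' version that tlbimp generates
            // would crash at this point, so check before writing through the pointer.
            if (pcchName != IntPtr.Zero)
                Marshal.WriteInt32(pcchName, name.Length + 1);

            // Copy as much of the name as fits into the caller-supplied WCHAR buffer.
            if (szName != IntPtr.Zero && cchName > 0)
            {
                int copyChars = Math.Min(name.Length, (int)cchName - 1);
                Marshal.Copy(name.ToCharArray(), 0, szName, copyChars);
                Marshal.WriteInt16(szName, copyChars * sizeof(char), (short)0);  // null-terminate
            }
        }
    }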

 

But the point is this: tlbimp just doesn't cut it.  In what I have to believe was a momentary lapse in judgment by the .Net team, they recommend this as the way to customize your interop assembly: https://msdn.microsoft.com/library/default.asp?url=/library/en-us/dndotnet/html/whypriinterop.asp.  Yes, that's right: ildasm + notepad + ilasm.  It's true that writing IL is much less painful than writing x86 code, but still... I don't know why tlbimp doesn't just acknowledge its shortcomings and at least offer an option to emit C# (or other higher-level language) code.  I even tried this approach for a little bit, and fixed a couple of the breaking issues I ran into.  But I quickly realized (though perhaps not as quickly as I should have) that this just wasn't going to let me both implement the debugger and keep my sanity.  I honestly don't remember how I got rid of the interop assembly, but I think it involved either decompiling the interop assembly or else processing the .idl file directly.  In the end, I ended up with C# definitions for the interfaces.  They needed just as much modification as before, but at least they were in a more readable form.  Sanity saved.
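To give a flavor of what those hand-written definitions look like, here is a heavily abbreviated sketch.  The interface name carries a "Sketch" suffix on purpose: a real definition has to use the actual IID from cordebug.idl and declare every method of the interface, in its original vtable order, even the ones you never intend to implement or call.

    using System;
    using System.Runtime.InteropServices;

    [ComImport]
    [Guid("00000000-0000-0000-0000-000000000000")]  // placeholder; use the IID from cordebug.idl
    [InterfaceType(ComInterfaceType.InterfaceIsIUnknown)]
    interface ICorDebugSomethingSketch
    {
        // ...any methods that precede GetName in the real interface go here, in order...

        void GetName([In] uint cchName,
                     [Out] IntPtr pcchName,  // may legitimately be NULL; check before writing
                     [Out] IntPtr szName);   // caller-allocated WCHAR buffer of cchName characters
    }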

 

I looked very briefly at the interop definitions in MDbg.  While I am sure they are fine if you want to consume ICorDebug, they don't seem adequate if you want to implement it.  Though I suppose that depends on what debugger you are trying to work with.  To clarify: I don't think you can use those interface definitions if you want to be consumed by VS.

 

Regarding Mike's blog at https://blogs.msdn.com/jmstall/archive/2005/03/14/395272.aspx, I agree that objects should have clearly defined lifespans.  I am going to take this from a different angle, however.  For the most part, the TinyCLR debugger component is stateless.  There were two exceptions to this rule that we ran into: breakpoints and values created by func-eval both require allocating resources in the TinyCLR on the debugger's behalf.  For breakpoints, the resource can be cleaned up when the breakpoint is deactivated.  However, an ICorDebugValue has nothing but Release to tell us (the managed, in-process debugger component) when the TinyCLR can free the resource.  We actually clean this resource up when execution resumes, but if you stop execution for a long time and inspect lots of data (and use func-eval explicitly or have property evaluation turned on), we may need to keep allocating memory on behalf of the debugger.  But can't we just key off of Release, since VS is surely releasing the ICorDebugValue when it's finished with it?  Well, yes, but remember, this component is written in managed code.  The CLR hides the call to Release inside the COM callable wrapper, so basically we are out of luck.  We can put a destructor on our ICorDebugValue, but even after the wrapper's reference count drops to zero, the GC won't collect the object right away.  I suppose it's just one of the limitations of .Net interop.
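For what it's worth, here is the shape of the "free it when execution resumes" bookkeeping described above, as a sketch with hypothetical names (the real component's plumbing differs): pending device-side allocations are tracked centrally and reclaimed when execution continues, since the Release calls themselves never reach our managed code.

    using System.Collections.Generic;

    class FuncEvalResourceTracker
    {
        private readonly List<uint> pendingDeviceHandles = new List<uint>();

        // Called whenever a func-eval (or property evaluation) forces the TinyCLR to
        // allocate something on the debugger's behalf.
        public void Register(uint deviceHandle)
        {
            pendingDeviceHandles.Add(deviceHandle);
        }

        // Called from our ICorDebugController::Continue implementation, the one moment
        // we reliably know the debugger is done inspecting the stopped state.
        public void OnContinue()
        {
            foreach (uint handle in pendingDeviceHandles)
                FreeOnDevice(handle);
            pendingDeviceHandles.Clear();
        }

        // Hypothetical: sends the free request to the TinyCLR over the transport.
        private void FreeOnDevice(uint handle)
        {
        }
    }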

 

In summary, implementing ICorDebug was the easy part.  The hard part was dealing with VS.