Object Identity in Managed Debugging


The problem:

Perhaps you’ve started from a global, local, or parameter while debugging, and then followed some ugly series of object references (such as through a hash table) to find an object. You want some identity for that object so that you can easily find it again. Retracing your steps later may not be practical, especially if the program state has changed so that the original steps no longer lead to the same object. This can easily happen if a data structure gets changed, a variable gets reassigned, or a function returns and invalidates all its locals. In such cases, retracing the original steps may even yield a totally different object.

 

The solution is to provide “object identity” that exists independently of how the object was found. The object can then always be retrieved later based on its identity, regardless of changes in program state. Object identity for managed code is new in v2.0.

 

Object Identity in Native Code

In native code, every object has an intrinsic identity via the ‘this’ pointer.  Since there’s no GC moving objects around in native code, the object’s address is constant and can be used to refer to the object as long as it is alive.

For example, if I know there’s an object of type Foo at address 0x0012eeff, I can view it at any time by inspecting “((Foo*) 0x0012eeff)”. The address provides a very convenient innate object identity.

 

The problem in managed code.

Since managed code has a GC that moves objects around, casting a raw address is not necessarily safe. An object may live at 0x0012eeff, and then a GC may move it to 0x44556677, leaving the old address stale. Any object identity solution for managed code needs to cooperate with the garbage collector.

 

Sample of Object identity in managed code.

Here’s a simple example that shows object identity in VS 2005:

class Program
{
    static void Main()
    {
        string s = "A test";
        Foo();
    }

    static void Foo()
    {
        object x = null;
        System.Console.WriteLine("In Foo:" + x); // <-- set breakpoint here
    }
}

 

Try this out in Visual Studio 2005 (works in beta 1):

1)      Run to the breakpoint.

2)      At the breakpoint, switch the current stack frame to Main().

3)      Open the Locals window. You should see the local ‘s’. Its value is “A test”.

4)      Right-click ‘s’ in the Locals window and choose “Make Object ID” from the context menu. The value now shows “A test” {1#}. VS has created a pseudo variable ‘1#’ and aliased it to ‘s’.

5)      Now switch back to Foo().

6)      In the locals window, you’ll see ‘x’, with a value of null.

7)      Set the value of ‘x’ to ‘1#’.

8)      The value of x is now “A test” {1#}!

9)      Step over the WriteLine call and see that it actually uses the new value of x.

 

 

How it works from ICorDebug:

We’ve added a new API to ICorDebug:

interface ICorDebugHeapValue2 : IUnknown
{
    /*
     * Creates a handle of the given type for this heap value.
     */
    HRESULT CreateHandle([in] CorDebugHandleType type, [out] ICorDebugHandleValue ** ppHandle);
};

 

This returns an ICorDebugHandleValue, which derives from ICorDebugReferenceValue. The handle value tracks a GC handle for the object being inspected. (In retrospect, I think we should have called this ICorDebugStrongReferenceValue instead of HandleValue. HandleValue implies that it corresponds to a System.Runtime.InteropServices.GCHandle structure, which it does not.)

 

The debugger can keep a map of pseudo variables (like ‘1#’) to ICorDebugHandleValue objects.

 

‘type’ specifies whether the handle is ‘strong’ or ‘weak’. ‘Strong’ (aka ‘normal’) handles keep an object alive, whereas ‘weak’ handles don’t. One side effect is that a debugger can now create strong handles that extend the lifespan of objects, which lets it alter program behavior (thus aggravating the issue of bugs that only repro under a debugger). Since GCs and object lifespans are already non-deterministic, this perturbation is similar to a debugger altering timing and exposing a race condition.

 

V1.1 attempted to solve this at the ICorDebug level (via ICorDebugReferenceValue::DereferenceStrong), but it was a broken design. We’ve deprecated DereferenceStrong and prefer ICorDebugHandleValue instead.

 

 

Comments (13)

  1. You were careful to point out that ICorDebugHandleValue does not correspond to a GCHandle. What exactly is one of these handle values, then? And how is this scheme different from keeping a map from pseudo variables to GCHandles created by GCHandle.Alloc?

    Also, does a ‘weak’ handle value work like a Weak GCHandle, or a WeakTrackResurrection?

  2. Matthias says:

    Cool. I had been wishing for something like that. Actually, I would prefer giving a nice name myself to it, instead of "#1".

  3. Mike Stall says:

    Nicholas –

    From the debugger’s perspective, System.Runtime.InteropServices.GCHandle is "just" data. The debugger doesn’t try to assign any meaning to that data.

    Now both the managed GCHandle class and the ICorDebugHandleValue object have an underlying gc reference. The GCHandle class exposes it for managed code; the ICorDebugHandleValue exposes it for the debugger.

    It’s true that the debugger could get similar functionality to ICorDebugHandleValue via func-evals + GCHandles. However, func-eval is evil, so we want to provide an alternative, more-correct, solution.

    weak = "track resurrection".

  4. Mike Stall says:

    Matthias – Note that the ‘1#’ naming is purely an application-specific restriction. The debugger could just as easily have supported mapping arbitrary names to values.

    For example, MDbg (at least in Beta2) lets you map arbitrary names to values like:

    set $my_value=array[5].x
