Figure out variable lifetime using SOS

There is a bug in this program below, try to see if you could catch it.

Test.cs (compiled to DelegateExample.exe):

 

using System;
using System.Threading;
using System.Runtime.InteropServices;

class Test
{
delegate uint ThreadProc (IntPtr arg);

private uint m;

public Test (uint n)
{
m = n;
}

uint Reflect (IntPtr arg)
{
Console.WriteLine (m);
return m;
}

static void Main ()
{
Test t = new Test (1);
ThreadProc tp = new ThreadProc (t.Reflect);
NewThread (tp);
Thread.Sleep (1000);
}

[DllImport("UsingCallback")]
static extern void NewThread (ThreadProc proc);
}

UsingCallback.cpp (compiled to UsingCallback.dll):

_stdcall void NewThread (LPTHREAD_START_ROUTINE cb)
{
DWORD id = 0;
CreateThread (NULL, 0, cb, NULL, 0,&id);
}

Yes, here is the problem: in the cs file, the managed code passes a Delegate object to unmanaged code which will create a new thread to call the delegate. Since unmanaged code has no way to tell CLR how it plans to use the object, from CLR's point of view, there are no live roots for the Delegate object after the line “NewThread (tp); ”. Thus the object is eligible to be garbage collected (GC) after the call, even if the new thread might not start yet. So it's possible for the Delegate to become trash before unmanaged code invokes it and cause unspecified failure. A fix is to add GC.KeepAlive before Main returns:

 

         Test t = new Test (1);
ThreadProc tp = new ThreadProc (t.Reflect);
NewThread (tp);
Thread.Sleep (1000);
GC.KeepAlive (tp);

One thing annoying about this kind of bug is that GC is nondeterministic, the program could work just fine 99% of time, but only crashes under stress situation. Another thing make the problem hard to find is that it's not very intuitive by looking at the source. You might know the theory that when a variable is not used anymore, it will not be reported as a live root; but it would take quite some time to figure out at which point which variable is dead, plus there's no way to verify if CLR agrees with your analysis.

I'll show you how to use SOS.dll to check the internal data structure used by CLR to determine variable lifetime. When JIT compiles code, it generates such variable aliveness information (GC info) for each method and saves it along with the machine code of the method. When GC happens, it will check GC info for every method in the stack to find out which variable is alive and will use the live variables as object roots. GC info is highly compacted, but SOS.dll has “!GCInfo” command to crack it and show it in a human-readable way. This approach only works in assembly level, so stop reading if you are not interested in assembly language. :)

I compiled test.cs above using Visual studio to a "Debug" build, and launched the program under Windbg. After the Main method is JITted, I could disassemly the generated native code of the method using "!SOS.u" command:

 

0:000> !u 02f00058
Will print '>>> ' at address: 02f00058
Normal JIT generated code
[DEFAULT] Void Test.Main()
Begin 02f00058, size 60
>>> 02f00058 55               push    ebp
02f00059 8bec             mov     ebp,esp
02f0005b 83ec08           sub     esp,0x8
02f0005e 57               push    edi
02f0005f 56               push    esi
02f00060 53               push    ebx
02f00061 33ff             xor     edi,edi
02f00063 33db             xor     ebx,ebx
02f00065 b9e850ad00       mov     ecx,0xad50e8 (MT: Test)
02f0006a e8a91fbcfd       call    00ac2018 (JitHelp: nc)
02f0006f 8bf0             mov     esi,eax
02f00071 8bce             mov     ecx,esi
02f00073 ba01000000       mov     edx,0x1
02f00078 ff152051ad00     call    dword ptr [00ad5120] (Test..ctor)
02f0007e 8bfe             mov     edi,esi
02f00080 b9c451ad00       mov     ecx,0xad51c4 (MT: Test/ThreadProc)
02f00085 e88e1fbcfd       call    00ac2018 (JitHelp: nc)
02f0008a 8bf0             mov     esi,eax
02f0008c 689350ad00       push    0xad5093
02f00091 8bd7             mov     edx,edi
02f00093 8bce             mov     ecx,esi
02f00095 ff152452ad00     call    dword ptr [00ad5224] (Test/ThreadProc..ctor)
02f0009b 8bde             mov     ebx,esi
02f0009d 8bcb             mov     ecx,ebx
02f0009f ff152c51ad00     call    dword ptr [00ad512c] (Test.NewThread)
02f000a5 b9e8030000       mov     ecx,0x3e8
02f000aa ff155084bb79  call dword ptr [mscorlib_79990000+0x228450 (79bb8450)] (System.Threading.Thread.Sleep)
02f000b0 90               nop
02f000b1 5b               pop     ebx
02f000b2 5e               pop     esi
02f000b3 5f               pop     edi
02f000b4 8be5             mov     esp,ebp
02f000b6 5d               pop     ebp
02f000b7 c3               ret

 

SOS's "!u" is similar to Windbg's "u", but it shows more data because managed code is self-describable. For example, for those indirect calls, if the target is a managed function, "!u" could tell us the function name.

 

To check its GC info, we need to get the method's MethodDesc first (Method descriptor, CLR's data structure to keep all information about one method). We could use "!ip2md" to find out the method desc from any instruction pointer in the method:

 

0:000> !ip2md 02f00058
MethodDesc: 0x00ad50a8
Jitted by normal JIT
Method Name : [DEFAULT] Void Test.Main()
MethodTable ad50e8
Module: 151ad0
mdToken: 06000003 (D:\projects\DelegateExample\bin\Debug\DelegateExample.exe)
Flags : 10
Method VA : 02f00058

This command shows some important information about the method. To check GC info, we only need to pass the MethodDesc pointer itself to “!GCInfo”:

 

0:000> !gcinfo 0x00ad50a8
Normal JIT generated code
Method info block:
    method      size   = 0060
    prolog      size   =   9
    epilog      size   =   7
    epilog     count   =   1
    epilog      end    = yes
    saved reg.  mask   = 000F
    ebp frame          = yes
    fully interruptible=yes
    double align       = no
    security check     = no
    exception handlers = no
    local alloc        = no
    edit & continue    = yes
    varargs            = no
    argument   count   =   0
    stack frame size   =   2
    untracked count    =   0
    var ptr tab count  =   0
    epilog        at   0059
60 E5 C0 45    |
 

Pointer table:
F0 7B          | 000B        reg EDI becoming live
5A             | 000D        reg EBX becoming live
F0 42          | 0017        reg EAX becoming live
72             | 0019        reg ESI becoming live
4A             | 001B        reg ECX becoming live
F0 03          | 0026        reg EAX becoming dead
08             | 0026        reg ECX becoming dead
F0 44          | 0032        reg EAX becoming live
30             | 0032        reg ESI becoming dead
72             | 0034        reg ESI becoming live
57             | 003B        reg EDX becoming live
4A             | 003D        reg ECX becoming live
06             | 0043        reg EAX becoming dead
08             | 0043        reg ECX becoming dead
10             | 0043        reg EDX becoming dead
4C             | 0047        reg ECX becoming live
0E             | 004D        reg ECX becoming dead
30             | 004D        reg ESI becoming dead
F1 1B          | 0060        reg EBX becoming dead
38             | 0060        reg EDI becoming dead
FF             |

 

Output of the command has 2 sections, the first part is method info block, which contains some basic information about the JITted code, like size of the method, size of the prolog, and etc. The 2nd part is pointer table, on which I'll spend most of time. Pointer table describes lifetime of every GC reference inside the method. It has 3 columns, the first one is byte encodings, which we don't need to care about; the second column is the offset in the JITted code. E.g, this method's code starts from 02f00058, so 000B means the instruction at 02f00058+B = 2F00063, “xor     ebx,ebx”;  the third column tells us change of lifetime for a GC pointer at that instruction. E,g, “000B        reg EDI becoming live” means starts from 2F00063, register EDI is a live root; similiarly, we can see EDI becomes dead at offset 60, end of the method (0060        reg EDI becoming dead). So whatever variable EDI is used to store, its lifetime is from beginning to end of the method. To understand the pointer table, it's better to interweave the table with the JITted code. In Whidbey, SOS has "!u -gcinfo" to do the job; but for Everett, I have to manually put them together and show it along with the source code:

02f00058 55               push    ebp
02f00059 8bec             mov     ebp,esp
02f0005b 83ec08           sub     esp,0x8
02f0005e 57               push    edi
02f0005f 56               push    esi
02f00060 53               push    ebx
02f00061 33ff             xor     edi,edi
    GCInfo:      000B        reg EDI becoming live
02f00063 33db             xor     ebx,ebx
    GCInfo:      000D        reg EBX becoming live

Test t = new Test (1);  

02f00065 b9e850ad00       mov     ecx,0xad50e8 (MT: Test)
02f0006a e8a91fbcfd       call    00ac2018 (JitHelp: nc)
    GCInfo:      0017        reg EAX becoming live
02f0006f 8bf0             mov     esi,eax
    GCInfo:      0019        reg ESI becoming live
02f00071 8bce             mov     ecx,esi
    GCInfo:      001B        reg ECX becoming live
02f00073 ba01000000       mov     edx,0x1
02f00078 ff152051ad00     call    dword ptr [00ad5120] (Test..ctor)
    GCInfo:      0026        reg EAX becoming dead
    GCInfo:      0026        reg ECX becoming dead
02f0007e 8bfe             mov     edi,esi

ThreadProc tp = new ThreadProc (t.Reflect);

02f00080 b9c451ad00       mov     ecx,0xad51c4 (MT: Test/ThreadProc)
02f00085 e88e1fbcfd       call    00ac2018 (JitHelp: nc)
    GCInfo:      0032        reg EAX becoming live
    GCInfo:      0032        reg ESI becoming dead
02f0008a 8bf0             mov     esi,eax
    GCInfo:      0034        reg ESI becoming live
02f0008c 689350ad00       push    0xad5093
02f00091 8bd7             mov     edx,edi
    GCInfo:      003B        reg EDX becoming live
02f00093 8bce             mov     ecx,esi
    GCInfo:      003D        reg ECX becoming live
02f00095 ff152452ad00     call    dword ptr [00ad5224] (Test/ThreadProc..ctor)
    GCInfo:      0043        reg EAX becoming dead
    GCInfo:      0043        reg ECX becoming dead
    GCInfo:      0043        reg EDX becoming dead

NewThread (tp);

02f0009b 8bde             mov     ebx,esi
02f0009d 8bcb             mov     ecx,ebx
    GCInfo:      0047        reg ECX becoming live
02f0009f ff152c51ad00     call    dword ptr [00ad512c] (Test.NewThread)
    GCInfo:      004D        reg ECX becoming dead
    GCInfo:      004D        reg ESI becoming dead

Thread.Sleep (1000);

02f000a5 b9e8030000       mov     ecx,0x3e8
02f000aa ff155084bb79  call dword ptr [mscorlib_79990000+0x228450 (79bb8450)] (System.Threading.Thread.Sleep)
02f000b0 90               nop
02f000b1 5b               pop     ebx
02f000b2 5e               pop     esi
02f000b3 5f               pop     edi
02f000b4 8be5             mov     esp,ebp
02f000b6 5d               pop     ebp
02f000b7 c3               ret

    GCInfo:      0060        reg EBX becoming dead
    GCInfo:      0060        reg EDI becoming dead

Let's assume a thread triggers GC when another thread is excuting code 02f0009d. From the table, we know at this time, register ESI, EBX, and EDI will be reported as live roots to GC. With some disassembly, we could see at this moment both ESI and EBX contain reference to Delegate tp, and EDI is variable t. It might surprise you that both EBX and EDI keep alive until the function ends. That means variable t and tp are actually alive for the whole function, so the code has no bug?!

The tricky point is that a variable is eligible to be dead once it's not used anymore. However, it's up to JIT to determine whether it really wants to report the variable to be dead. In fact, for debuggable code, JIT extends lifetime for every variable to end of the function.

To prove the premature GC bug really exists in the sample, I compiled it to "Release" build and repeated all the steps above:

 

0:000> !u 02df0058
Loaded Son of Strike data table version 5 from "C:\WINDOWS\Microsoft.NET\Framework\v1.1.4322\mscorwks.dll"
Will print '>>> ' at address: 02df0058
Normal JIT generated code
[DEFAULT] Void Test.Main()
Begin 02df0058, size 46
>>> 02df0058 57 push edi
02df0059 56 push esi
02df005a b9e850ad00 mov ecx,0xad50e8 (MT: Test)
02df005f e8b41fcdfd call 00ac2018 (JitHelp: nc)
02df0064 8bf0 mov esi,eax
02df0066 c7460401000000 mov dword ptr [esi+0x4],0x1
02df006d b9bc51ad00 mov ecx,0xad51bc (MT: Test/ThreadProc)
02df0072 e8a11fcdfd call 00ac2018 (JitHelp: nc)
02df0077 8bf8 mov edi,eax
02df0079 689350ad00 push 0xad5093
02df007e 8bd6 mov edx,esi
02df0080 8bcf mov ecx,edi
02df0082 ff151c52ad00 call dword ptr [00ad521c] (Test/ThreadProc..ctor)
02df0088 8bcf mov ecx,edi
02df008a ff152c51ad00 call dword ptr [00ad512c] (Test.NewThread)
02df0090 b9e8030000 mov ecx,0x3e8
02df0095 ff155084bb79 call dword ptr [mscorlib_79990000+0x228450 (79bb8450)] (System.Threading.Thread.Sleep)
02df009b 5e pop esi
02df009c 5f pop edi
02df009d c3 ret

0:000> !ip2md 02df0058
MethodDesc: 0x00ad50a8
Jitted by normal JIT
Method Name : [DEFAULT] Void Test.Main()
MethodTable ad50e8
Module: 151ad0
mdToken: 06000003 (D:\projects\DelegateExample\bin\Release\DelegateExample.exe)
Flags : 10
Method VA : 02df0058

0:000> !gcinfo 0x00ad50a8
Normal JIT generated code
Method info block:
method size = 0046
prolog size = 2
epilog size = 3
epilog count = 1
epilog end = yes
saved reg. mask = 0003
ebp frame = no
fully interruptible=no
double align = no
security check = no
exception handlers = no
local alloc = no
edit & continue = no
varargs = no
argument count = 0
stack frame size = 0
untracked count = 0
var ptr tab count = 0
epilog at 0043
46 21 | 

Pointer table:
A9 | 001F call 0 [ ESI ]
07 | 0026 push
CB | 0030 call 1 [ EDI ]
FF |

Here is mixed source code, native code and pointer table:

 

02df0058 57               push    edi
02df0059 56               push    esi

Test t = new Test (1);

02df005a b9e850ad00       mov     ecx,0xad50e8 (MT: Test)
02df005f e8b41fcdfd       call    00ac2018 (JitHelp: nc)
02df0064 8bf0             mov     esi,eax
02df0066 c7460401000000   mov     dword ptr [esi+0x4],0x1

ThreadProc tp = new ThreadProc (t.Reflect);

02df006d b9bc51ad00       mov     ecx,0xad51bc (MT: Test/ThreadProc)
02df0072 e8a11fcdfd       call    00ac2018 (JitHelp: nc)
    GCInfo:      001F        call 0 [ ESI ]
02df0077 8bf8             mov     edi,eax
02df0079 689350ad00       push    0xad5093
    GCInfo:      0026        push
02df007e 8bd6             mov     edx,esi
02df0080 8bcf             mov     ecx,edi
02df0082 ff151c52ad00     call    dword ptr [00ad521c] (Test/ThreadProc..ctor)
    GCInfo:      0030        call 1 [ EDI ]

EM>NewThread (tp);

02df0088 8bcf             mov     ecx,edi
02df008a ff152c51ad00     call    dword ptr [00ad512c] (Test.NewThread)

Thread.Sleep (1000);

02df0090 b9e8030000       mov     ecx,0x3e8
02df0095 ff155084bb79  call dword ptr [mscorlib_79990000+0x228450 (79bb8450)] (System.Threading.Thread.Sleep)
02df009b 5e               pop     esi
02df009c 5f               pop     edi
02df009d c3               ret

 

In release build, the JIIted code is smaller, and the pointer table in GC info is much smaller. I need to explain some new syntax in this pointer table: “call 0 [ESI]” means the method calls a function with 0 argument, and ESI is a live variable at this point; "push" just indicates change of the stack, which is important information for GC to unwind this frame, but doesn't affect pointer aliveness. 

 

One thing interesting about this new pointer table is that it only reports GC references when the method calls into other methods. What if a GC happens somewhere else? For example, at line 02df0079, EDI contains object tp and ESI contains object t, if a GC happens while a thread is excuting 02df0079 but GCInfo doesn't report those two variables, will they be collected? The answer is GC can't happen at that place. There is an important field in method info block called "fully interruptable". In debug version, this field for the method is true but in release version the field is false. A method is fully interruptable means GC could stop a thread (and perform collection) at any point if the thread is executing this method. If a method is not fully interruptable, GC can't start at arbitrary point if a thread is executing this method. It has to wait until a point when the thread returns from a call (e.g, return from ThreadProc's constructor back to Main) or when the method calls into another method which allows GC to happen (calls into unmanaged code via PInvoke always allow GC to happen). That's why we only need to report varaibles at calls. For performance reason, such trivial methods like Test.Main is usually non-fully interruptable in release build.

With all the knowledge, now we know that if a GC happens when a thread is just returning from the call "new ThreadProc" at instruction 02df0072, GC knows ESI (variable t) is a live root; if a GC happens when a thread is just returning from the call to ThreadProc's constructor at 02df0082, GC knows EDI (variable tp) is a live root. Variable t is not reported this time(but it's kept alive by object tp). However if a GC happens when a thread returns from the call to NewThread at 02df008a, neither t nor tp will be reported so they could be collected by GC. If the new thread hasn't start then, it will have trouble when using the objects.

So next time if you suspect there might be some premature GC bug, you could try "!SOS.GCInfo" to see how exactly CLR thinks about the GC object's lifetime.

This posting is provided "AS IS" with no warranties, and confers no rights.