Finding where unmanaged exceptions came from


Sometimes the callstack you’re looking at is inside an exception handler, after the exception has already been thrown. This is very common if you attached at an unhandled exception that popped up a Watson dialog.

 

It might look like this:

kernel32!WaitForSingleObject+0xf

devenv!DwCreateProcess+0xbb

devenv!fExceptionHandling+0x1cb

devenv!DwExceptionFilter+0x8b

0x535ef48

 

That’s not exactly useful. What you really want is to see the callstack at the time the exception was thrown.

There’s a trick for doing this on x86. (This could be adjusted to work on 64-bit platforms.)

It works in both live debugging and minidumps, and it even works if you don’t have any symbols. I’ll first give the quick steps for how to do this, and then I’ll explain why it works.

 

How do I find it?

These instructions use Windbg, but any native debugger with memory-search (‘s’) and set-context (‘.cxr’) commands can do this too.

1)      Go to the thread of interest.

2)      Search for the first address on the stack containing the dword 0x1003f.  In Windbg, type “s -d esp L1000 1003f”. Exceptions effectively push 0x1003f onto the stack, so this looks for the context of the exception on the thread’s current stack.

3)      That should show at least one result that looks like:

0535ef48  0001003f 00000000 00000000 00000000  ?……………

      The first number (0535ef48) in each row is the address, and the rest of the row is the contents at that address. It turns out that 0x1003f is the first dword of the CONTEXT structure of the exception, and so 0535ef48 will be the address of the context. If you get multiple entries, use the first row, since that would correspond to the most recent exception.

4)      Set the current context to point at the first number in the result from step #3 (that’s 0535ef48 in this case). In windbg, type “.cxr 0535ef48”.

 

Your callstack and registers should now look more sane. In my example, it looks like:

mscordbi!CordbHashTable::GetBase

mscordbi!CordbThread::RefreshStack+0x349

mscordbi!CordbProcess::DispatchRCEvent+0x1291

mscordbi!CordbRCEventThread::ThreadProc+0x9

 

eax=00000000 ebx=04700168 ecx=00000048 edx=0535f230 esi=00000000 edi=0535f254

eip=636ac786 esp=0535f214 ebp=0535f228 iopl=0         nv up ei pl zr na po nc

cs=001b  ss=0023  ds=0023  es=0023  fs=003b  gs=0000             efl=00210246

mscordbi!CordbHashTable::GetBase:

636ac786 80791400         cmp     byte ptr [ecx+0x14],0x0 ds:0023:0000005c=??

 

My program was dereferencing 0x5c (=0x48+0x14). No wonder it crashed.

Note that we didn’t need any symbols for any of the modules on the original stack to find this. Even if we didn’t have any symbols at all, we’d still have the disassembly of the code that crashed.

 

Why does it work?

There are a few key facts here.

1)      When the OS throws an exception, it pushes the CONTEXT of the original throw site on the stack. (This is part of the EXCEPTION_POINTERS).

2)      On x86, the first field in the CONTEXT structure is a flags field which is always set to 0x1003f. It is also very unlikely for this value to randomly appear for some other reason at the top of your stack.

3)      On x86, stacks grow down. Thus the current stack pointer (esp) is the low end of the range in which the 0x1003f will appear, and anything pushed earlier sits at a higher address. So if we’re searching for something that’s close to the top of the stack (within 0x1000 bytes), we can search in the range (esp, esp+0x1000).

4)      Windbg’s “s -d” command searches memory for dwords. The format is “s -d <range> <value>”. <range> can be of the form “<address> L<length>”, which will search for ‘value’ in the range (address, address+length). So “s -d esp L1000 1003f” means “search for the dword 0x1003f in the range (esp, esp+0x1000)”.  0x1000 is an arbitrary number here that seems sufficient.

5)      A debugger can do a stackwalk from any context. Most debuggers just automatically use a thread’s current context (via kernel32!GetThreadContext), but there’s no reason that a debugger couldn’t take an arbitrary context. Windbg provides a great command, “.cxr”, which lets you do just that: you set the context that you want to inspect. (VS is adding this command too.)
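
As a rough illustration of items 2 to 4, here is a minimal Python sketch of the search, run over a byte buffer standing in for stack memory read from the target. The helper name find_context_candidates, the made-up esp value, and the synthetic buffer are assumptions for this example only; the flag bit values are the x86 ones from winnt.h. The value is a parameter because, per one of the comments on this post, some x86 CPUs report 0x1001f instead.

```python
import struct

# x86 CONTEXT flag bits as defined in winnt.h.
CONTEXT_i386 = 0x00010000
CONTEXT_CONTROL = CONTEXT_i386 | 0x01
CONTEXT_INTEGER = CONTEXT_i386 | 0x02
CONTEXT_SEGMENTS = CONTEXT_i386 | 0x04
CONTEXT_FLOATING_POINT = CONTEXT_i386 | 0x08
CONTEXT_DEBUG_REGISTERS = CONTEXT_i386 | 0x10
CONTEXT_EXTENDED_REGISTERS = CONTEXT_i386 | 0x20

CONTEXT_ALL = (CONTEXT_CONTROL | CONTEXT_INTEGER | CONTEXT_SEGMENTS |
               CONTEXT_FLOATING_POINT | CONTEXT_DEBUG_REGISTERS |
               CONTEXT_EXTENDED_REGISTERS)  # 0x1003f


def find_context_candidates(stack_bytes, esp, value=CONTEXT_ALL):
    """Return addresses in [esp, esp + len(stack_bytes)) holding `value`.

    Hypothetical helper: stack_bytes is memory read from the target
    starting at esp. The lowest address comes first, which on a
    downward-growing stack is the most recently pushed data.
    """
    hits = []
    # This sketch only checks dword-aligned offsets from esp.
    for offset in range(0, len(stack_bytes) - 3, 4):
        (dword,) = struct.unpack_from("<I", stack_bytes, offset)
        if dword == value:
            hits.append(esp + offset)
    return hits


# Synthetic stack: 0x1000 bytes starting at a made-up esp, with the flags
# dword planted at the address the sample session found.
esp = 0x0535EE00
stack = bytearray(0x1000)
struct.pack_into("<I", stack, 0x0535EF48 - esp, CONTEXT_ALL)

for address in find_context_candidates(bytes(stack), esp):
    print(f"{address:08x}")  # prints 0535ef48
```

The first hit is the one you would hand to “.cxr”.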

 

So if you look back over the original steps, you can see that step #2 searches the thread’s stack for a context pushed by the exception, and then step #4 tells the debugger to use that context.

If step #3 gives you multiple rows, that may indicate a case of nested exceptions. Or it may be the rare case where you have a local variable “int i= 0x1003f”. In either case, you can try .cxr on all the values (starting with the most recent) to find the callstack that makes sense.
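
If you would rather weed out stray hits before trying .cxr on each one, a heuristic along these lines could help. This is only a sketch under my own assumptions: the helper name looks_like_context is made up, the Eip and Esp offsets are the ones shown by dt _CONTEXT further down, and the plausibility rule (nonzero Eip, and a saved Esp that lies at or above the end of the record but still inside the captured stack) is a guess based on the sample session, where the record at 0535ef48 ends exactly at the saved esp of 0535f214.

```python
import struct

EIP_OFFSET = 0x0B8    # from the dt _CONTEXT output
ESP_OFFSET = 0x0C4    # from the dt _CONTEXT output
CONTEXT_SIZE = 0x2CC  # +0x0cc plus the 512-byte ExtendedRegisters array


def looks_like_context(memory, base, candidate, stack_high):
    """Heuristic (an assumption, not a guarantee): does `candidate` look
    like the start of an x86 CONTEXT record?

    memory     -- bytes read from the target, starting at address `base`
    stack_high -- first address past the captured stack region
    """
    offset = candidate - base
    if offset < 0 or offset + CONTEXT_SIZE > len(memory):
        return False
    (eip,) = struct.unpack_from("<I", memory, offset + EIP_OFFSET)
    (saved_esp,) = struct.unpack_from("<I", memory, offset + ESP_OFFSET)
    # The faulting frame was on the stack before the record was pushed,
    # so its esp should sit at or above the end of the record.
    return eip != 0 and candidate + CONTEXT_SIZE <= saved_esp < stack_high


# Synthetic stack with one plausible record and one stray local.
base = 0x0535EE00
memory = bytearray(0x1000)
real, stray = 0x0535EF48, 0x0535EE40
struct.pack_into("<I", memory, real - base, 0x1003F)
struct.pack_into("<I", memory, real - base + EIP_OFFSET, 0x636AC786)
struct.pack_into("<I", memory, real - base + ESP_OFFSET, 0x0535F214)
struct.pack_into("<I", memory, stray - base, 0x1003F)  # "int i = 0x1003f"

snapshot = bytes(memory)
stack_high = base + len(snapshot)
for candidate in (stray, real):
    verdict = looks_like_context(snapshot, base, candidate, stack_high)
    print(f"{candidate:08x} plausible={verdict}")
```

A hit that fails this check is not necessarily bogus, so treat it as a way to order the candidates rather than to discard them.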

 

Just for kicks, inspect the CONTEXT pointer we supplied to .cxr, and you can see for yourself that it matches the output of the register command (r):

0:015> dt _CONTEXT 0535ef48

   +0x000 ContextFlags     : 0x1003f

   +0x004 Dr0              : 0

   +0x008 Dr1              : 0

   +0x00c Dr2              : 0

   +0x010 Dr3              : 0

   +0x014 Dr6              : 0

   +0x018 Dr7              : 0

   +0x01c FloatSave        : _FLOATING_SAVE_AREA

   +0x08c SegGs            : 0

   +0x090 SegFs            : 0x3b

   +0x094 SegEs            : 0x23

   +0x098 SegDs            : 0x23

   +0x09c Edi              : 0x535f254

   +0x0a0 Esi              : 0

   +0x0a4 Ebx              : 0x4700168

   +0x0a8 Edx              : 0x535f230

   +0x0ac Ecx              : 0x48

   +0x0b0 Eax              : 0

   +0x0b4 Ebp              : 0x535f228

   +0x0b8 Eip              : 0x636ac786

   +0x0bc SegCs            : 0x1b

   +0x0c0 EFlags           : 0x210246

   +0x0c4 Esp              : 0x535f214

   +0x0c8 SegSs            : 0x23

   +0x0cc ExtendedRegisters : [512]  “???”

 

0:015> r

Last set context:

eax=00000000 ebx=04700168 ecx=00000048 edx=0535f230 esi=00000000 edi=0535f254

eip=636ac786 esp=0535f214 ebp=0535f228 iopl=0         nv up ei pl zr na po nc

cs=001b  ss=0023  ds=0023  es=0023  fs=003b  gs=0000             efl=00210246
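
The same comparison can be done by hand from the raw bytes. The following Python sketch decodes the sixteen dwords between +0x08c and +0x0c8 and prints them in roughly the layout of the r output. The function names and the sample buffer are assumptions made for illustration; the offsets and values are copied from the dt output above, and the flag mnemonics (iopl, nv up ei ...) are not reproduced.

```python
import struct

# Field order and starting offset are read off the dt _CONTEXT output:
# sixteen consecutive dwords from +0x08c (SegGs) through +0x0c8 (SegSs).
FIELDS = ("SegGs", "SegFs", "SegEs", "SegDs", "Edi", "Esi", "Ebx", "Edx",
          "Ecx", "Eax", "Ebp", "Eip", "SegCs", "EFlags", "Esp", "SegSs")
FIRST_OFFSET = 0x08C
CONTEXT_SIZE = 0x2CC  # +0x0cc plus the 512-byte ExtendedRegisters array


def decode_registers(context_bytes):
    """Map register names to values from a raw x86 CONTEXT buffer."""
    values = struct.unpack_from("<16I", context_bytes, FIRST_OFFSET)
    return dict(zip(FIELDS, values))


def format_like_r(regs):
    """Return three lines resembling the debugger's register dump."""
    return [
        "eax={Eax:08x} ebx={Ebx:08x} ecx={Ecx:08x} "
        "edx={Edx:08x} esi={Esi:08x} edi={Edi:08x}".format(**regs),
        "eip={Eip:08x} esp={Esp:08x} ebp={Ebp:08x}".format(**regs),
        "cs={SegCs:04x}  ss={SegSs:04x}  ds={SegDs:04x}  "
        "es={SegEs:04x}  fs={SegFs:04x}  gs={SegGs:04x}  "
        "efl={EFlags:08x}".format(**regs),
    ]


# Sample buffer filled with the values from the session above.
sample = bytearray(CONTEXT_SIZE)
struct.pack_into("<I", sample, 0, 0x1003F)
struct.pack_into("<16I", sample, FIRST_OFFSET,
                 0x0, 0x3B, 0x23, 0x23,               # gs, fs, es, ds
                 0x535F254, 0x0, 0x4700168, 0x535F230,  # edi, esi, ebx, edx
                 0x48, 0x0, 0x535F228, 0x636AC786,      # ecx, eax, ebp, eip
                 0x1B, 0x210246, 0x535F214, 0x23)       # cs, efl, esp, ss

regs = decode_registers(bytes(sample))
for line in format_like_r(regs):
    print(line)

# The faulting instruction read byte ptr [ecx+0x14].
print(hex(regs["Ecx"] + 0x14))  # prints 0x5c
```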

 

 

This is mostly for native exceptions; but would work with managed exceptions + SOS too. 

Empirically, I’ve been using this technique for a long time now and it’s worked perfectly every single time. And I’ve never found a stray local with value=0x1003f.

 

 

 


Comments (26)

  1. Mike Stall says:

    Raj Rangarajan, a developer on the Visual Studio Debugger, pointed out that you can do this in VS too. To quote Raj:

    VS has both the commands you listed, with minor mod to the .s cmd

    .s -d <actual value of esp> L1000 0x1003f

  2. > On x86, The first field in the CONTEXT

    > structure is a flags field which is always

    > set to 0x1003f.

    Actually for some x86 CPUs it can also be 0x1001f.

    0x1003f is CONTEXT_ALL on x86 (the flag definitions can be found in winnt.h). 0x1001f is the same value minus the CONTEXT_EXTENDED_REGISTERS flag. So apparently if your processor type doesn’t support any extended registers you need to use 0x1001f.

    > This is mostly for native exceptions; but

    > would work with managed exceptions + SOS too.

    SOS doesn’t seem to use the context record set with .cxr (at least the !clrstack command doesn’t use it). You can work around this by manually setting EIP, ESP and EBP to their corresponding values from the context record (this also works for old debuggers like VC6 that don’t have a .cxr command).

  3. Dumb says:

    Perhaps this is in error, but does not the same address appear on the first message? Quote:



    It might look this like:

    kernel32!WaitForSingleObject+0xf

    devenv!DwCreateProcess+0xbb

    devenv!fExceptionHandling+0x1cb

    devenv!DwExceptionFilter+0x8b

    0x535ef48



    """"" there

    But you still decided to go search for it? Or perhaps this came to the blog by accident, which is more likely 🙂

  4. Mike Stall says:

    Good catch – that the address of the context appeared in the callstack here is definitely a very unusual coincidence.

  5. Mata Kosmev says:

    >To quote Raj:

    >VS has both the commands you listed, with minor mod to the .s cmd

    I can’t seem to make them work? Get

    ‘CXX0013: Error: missing operator’

    no matter what bullshit I type in the command window.

    Does this need to be enabled in some way?

  6. Mike Stall says:

    Mata, I just tried this out in the latest VS (Whidbey Beta 2), and it does work.

    a few things:

    – I don’t think previous versions of VS supported both the .cxr and .s commands. (I know .cxr is new in Whidbey).

    – it looks like the .s command doesn’t support embedding expressions in the address range. So you can’t say "esp L1000", and have to say "0x12ff00 L1000" instead.

    – Be careful about hex vs. decimal. I manually prefix everything with ‘0x’ to specify hex.

    Here’s my sample output from VS’s immediate window (this is from a different session than my windbg sample above, so the numbers are different):

    ? esp

    0x0012f990

    .s -D 0x0012f990 L10000 0x1003f

    Found match at

    0x12fb88

    .cxr 0x12fb88

    And now I’m at the right context.

  7. Mata Kosmev says:

    Ah thanks, I didn’t realize this is about Whidbey. I’ve been trying it with VS2003.

  8. In-depth Articles - Matt Pietrek on the internals of SEH. Matt Pietrek on Vectored Exception Handling

  9. Nick Parker says:

    Finding Unmanaged Exceptions

  10. suiyingjie says:

    A Shortcut to WinDbg (Part 2)

  11. This script automates a technique I’ve been using for a long time whenever I need to see the exceptions
