Restoring symbols to a stack trace originally generated without symbols


Has this ever happened to you?

litware!Ordinal3+0x6042
litware!DllInstall+0x4c90
litware!DllInstall+0x4b9e
contoso!DllGetClassObject+0x93c3
contoso!DllGetClassObject+0x97a9
contoso!DllGetClassObject+0x967c
contoso!DllGetClassObject+0x94d7
contoso!DllGetClassObject+0x25ce
contoso!DllGetClassObject+0x2f7b
contoso!DllGetClassObject+0xad55
contoso!DllGetClassObject+0xaec7
contoso!DllGetClassObject+0xadf7
contoso!DllGetClassObject+0x3c00
contoso!DllGetClassObject+0x3b2a
contoso!DllGetClassObject+0x462b
USER32!UserCallWinProcCheckWow+0x13a
USER32!DispatchMessageWorker+0x1a7
contoso!DllCanUnloadNow+0x19b6
contoso!DllGetClassObject+0xeaf2
contoso+0x1d6c
litware!LitImportReportProfile+0x11c4
litware!LitImportReportProfile+0x1897
litware!LitImportReportProfile+0x1a3b
KERNEL32!BaseThreadInitThunk+0x18
ntdll!RtlUserThreadStart+0x1d

Ugh. A stack trace taken without working symbols. (There's no way that Dll­Get­Class­Object is a deeply recursive 60KB function. Just by casual inspection, you know that the symbols are wrong.)

To see how to fix this, you just have to understand what the debugger does when it has no symbols to work from: It uses the symbols from the exported function table. For every address it wants to resolve, it looks for the nearest exported function whose address is less than or equal to the target value.

For example, suppose CONTOSO.DLL has the following exported symbols:

Symbol Offset
Dll­Get­Class­Object 0x5132
Dll­Can­Unload­Now 0xFB0B

Look at it this way: The debugger is given the following information about your module: (Diagram not to scale.)

 
  Dll­Get­Class­Object Dll­Can­Unload­Now

It needs to assign a function to every byte in the module. In the absence of any better information, it does it like this:

??? Dll­Get­Class­Object Dll­Can­Unload­Now

In words, it assumes that every function begins at the location specified by the export table, and it ends one byte before the start of the next function. The debugger is trying to make the best of a bad situation.

Suppose your DLL was loaded at 0x10000000, and the debugger needs to generate a symbolic name for the address 0x1000E4F5.

First, it converts the address into a relative virtual address by subtracting the DLL base address, leaving 0xE4F5.

Next, it looks to see what function "contains" that address. From the algorithm described above, the debugger concludes that the address 0xE4F5 is "part of" the Dll­Get­Class­Object function, which began at begins at 0x5132. The offset into the function is therefore 0xE4F5 - 0x5132 = 0x93C3, and it is reported in the debugger as contoso!Dll­Get­Class­Object+0x93c3.

Repeat this exercise for each address that the debugger needs to resolve, and you get the stack trace above.

Fine, now that you know how the bad symbols were generated, how do you fix it?

You fix it by undoing what the debugger did, and then redoing it with better symbols.

You need to find the better symbols. This is not too difficult if you still have a matching binary and symbol file, because you can just load up the binary into the debugger in the style of a dump file. Like Doron, you can then let the debugger do the hard work.

C:> ntsd -z contoso.dll

ModLoad: 10000000 10030000   contoso.dll

Now you just ask the debugger, "Could you disassemble this function for me?" You give it the broken symbol+offset above. The debugger looks up the symbol, applies the offset, and then looks up the correct symbol when disassembling.

0:000> u contoso!DllGetClassObject+0x93c3
contoso!CReportViewer::ActivateReport+0xe9:
10000e4f5 eb05            jmp     contoso!CReportViewer::ActivateReport+0xf0

Repeat for each broken symbol in the stack trace, and you have yourself a repaired stack trace.

litware!Ordinal3+0x6042 ← oops
litware!CViewFrame::SetInitialKeyboardFocus+0x58
litware!CViewFrame::ActivateViewInFrame+0xf2
contoso!CReportViewer::ActivateReport+0xe9
contoso!CReportViewer::LoadReport+0x12c
contoso!CReportViewer::OnConnectionCreated+0x13f
contoso!CViewer::OnConnectionEvent+0x7f
contoso!CConnectionManager::OnConnectionCreated+0x85
contoso!CReportFactory::BeginCreateConnection+0x87
contoso!CReportViewer::CreateConnectionForReport+0x20d
contoso!CViewer::CreateNewConnection+0x87
contoso!CReportViewer::CreateNewReport+0x213
contoso!CViewer::OnChangeView+0xec
contoso!CReportViewer::WndProc+0x9a7
contoso!CView::s_WndProc+0xf1
USER32!UserCallWinProcCheckWow+0x13a
USER32!DispatchMessageWorker+0x1a7
contoso!CViewer::MessageLoop+0x24e
contoso!CViewReportTask::RunViewer+0x12
contoso+0x1d6c ← oops
litware!CThreadTask::Run+0x40
litware!CThread::ThreadProc+0xe5
litware!CThread::s_ThreadProc+0x42
KERNEL32!BaseThreadInitThunk+0x18
ntdll!RtlUserThreadStart+0x1d

Oops, our trick doesn't work for that first entry in the stack trace, the one with Ordinal3. What's up with that? There is no function called Ordinal3!

If your module exports functions by ordinal without a name, then the debugger doesn't know what name to print for the function (since the name was stripped from the module), so it just prints the ordinal number. You will have to go back to your DLL's DEF file to convert the ordinal back to a function name. Or you can dump the exports from the DLL to see what functions match up with what ordinals. (Of course, for that trick to work, you need to have a matching PDB file in the symbol search path.)

In our example, suppose litware.dll ordinal 3 corresponds to the function Lit­Debug­Report­Profile. We would then ask the debugger

0:001> u litware!LitDebugReportProfile+0x6042
litware!CViewFrame::FindInitialFocusControl+0x66:
1000084f5 33db            xor     ebx,ebx

Okay, that takes care of our first oops. What about the second one?

In the second case, the address the debugger was asked to generate a symbol for came before the first symbol in the module. In our diagram above, it was in the area marked with question marks. The debugger has absolutely nothing to work with, so it just disassembles as relative to the start of the module.

To resolve this symbol, you take the offset and add it to the base of the module as it was loaded into the debugger, which was reported in the ModLoad output:

ModLoad: 10000000 10030000   contoso.dll

If that output scrolled off the screen, you can ask the debugger to show it again with the help of the lmm command.

0:001>lmm contoso*
start    end        module name
10000000 10030000   contoso    (export symbols)       contoso.dll

Once you have the base address, you add the offset back and ask the debugger what's there:

0:001> u 0x10000000+0x1d6c
contoso!CViewReportTask::Run+0x102:
100001d6c 50              push    eax

Okay, now that we patched up all our oopses, we have the full stack trace with symbols:

litware!CViewFrame::FindInitialFocusControl+0x66
litware!CViewFrame::SetInitialKeyboardFocus+0x58
litware!CViewFrame::ActivateViewInFrame+0xf2
contoso!CReportViewer::ActivateReport+0xe9
contoso!CReportViewer::LoadReport+0x12c
contoso!CReportViewer::OnConnectionCreated+0x13f
contoso!CViewer::OnConnectionEvent+0x7f
contoso!CConnectionManager::OnConnectionCreated+0x85
contoso!CReportFactory::BeginCreateConnection+0x87
contoso!CReportViewer::CreateConnectionForReport+0x20d
contoso!CViewer::CreateNewConnection+0x87
contoso!CReportViewer::CreateNewReport+0x213
contoso!CViewer::OnChangeView+0xec
contoso!CReportViewer::WndProc+0x9a7
contoso!CView::s_WndProc+0xf1
USER32!UserCallWinProcCheckWow+0x13a
USER32!DispatchMessageWorker+0x1a7
contoso!CViewer::MessageLoop+0x24e
contoso!CViewReportTask::RunViewer+0x12
contoso!CViewReportTask::Run+0x102
litware!CThreadTask::Run+0x40
litware!CThread::ThreadProc+0xe5
litware!CThread::s_ThreadProc+0x42
KERNEL32!BaseThreadInitThunk+0x18
ntdll!RtlUserThreadStart+0x1d

Now the fun actually starts: Figuring out why there was a break in CView­Frame::Find­Initial­Focus­Control. Happy debugging!

Bonus tip: By default, ntsd does not include line numbers when resolving symbols. Type .lines to toggle line number support.

Comments (10)
  1. Jon says:

    Instead of 'u'

      0:000> u contoso!DllGetClassObject+0x93c3

    I'd use 'ln'

      0:000> ln contoso!DllGetClassObject+0x93c3

    There's less output to deal with.

    [I usually append "L1" to limit to one line of disassembly, so the output is not huge. Also, "LN" puts the information after the address, which makes it harder to spot at a glance, and it increases the likelihood that there will be a line break in the middle of the symbol, making it harder to read and copy/paste. But that's just my preference. Your preference is also valid. -Raymond]
  2. "began at begins at"

    "lm m contoso*" also works; there isn't an "lmm" command per se, rather an "m <pattern>" option to the "lm" command.

  3. Neil says:

    I wondered what was going on at first but then it dawned on me that you were in fact trying to help someone who had only provided a stack trace formatted without symbols.

  4. Hi. Won't ASLR make this whole thing impossible?

    [The debugger knows where Contoso.DLL was loaded. That's how it got the half-hearted symbols in the first place. (If you didn't let debuggers know where DLLs got loaded, it's going to be hard for the debugger to accomplish much of anything.) There is no requirement that Contoso.DLL be loaded at its preferred base address. (Indeed, it wasn't.) -Raymond]
  5. anyfoo says:

    @Fleet Command: when debugging you usually know what exists at which virtual address, even if that virtual address is random. With that knowledge you can translate the addresses into offsets into the DLL/executable/whatever you are interested in.

    Not a security problem, because, as Raymond would put it, to get that information you have to already be on the other side of the airtight hatch. Unless there are bugs or oversights which permit you to get the mapping without proper privileges.

  6. Gabe says:

    Why would you even care where a DLL got loaded? If the addresses were all absolute, you'd definitely have to know. But the addresses are all offsets relative to the DLL base address (contoso+0x1d6c), so the base address isn't used anywhere in the computation.

  7. 640k says:

    Why can't debuggers do this work year 2013?

    [Debuggers have been fully capable of doing this for decades if they have access to symbols. The issue is "What if the debugger doesn't have access to symbols?" Or are you asking, "Why can't debuggers be smart enough to find symbols anywhere in the universe?" -Raymond]
  8. anyfoo says:

    @Gabe: well, the debugger has to know.

  9. Escaper says:

    "It uses the symbols from the exported function table. For every address it wants to resolve, it looks for the nearest exported function whose address is less than or equal to the target value."

    This sounds like a good example of a situation where it is better not to do anything than do something. Because using the above described algorithm is plainly misleading as it provides user with wrong function names. I see no value in that compared to just seeing offsets for unrecognized functions.

  10. Brian_EE says:

    @Escaper: How would you suggest the debugger know that the exported symbols + offset are wrong? What if the exception occurred in an exported function and the entire call chain was only exported functions? Wouldn't you then complain that the debugger didn't give you a correct stack trace?

    Or would you prefer that the debugger try to reverse engineer the DLL looking for jmp and ret instructions and somehow piece together function boundaries? What if that process doesn't work all the time?

Comments are closed.

Skip to main content