Bad Native Callstacks

On Friday, a coworker emailed me a callstack that looked wrong. I sat down and looked at it. It turned out that the callstack was actually correct. Afterwards, I decided that it was time for a blog about bad native callstacks.

Times when the debugger can’t walk the stack:

  1. You don’t have symbols. If symbols are missing for any of the optimized code on the stack, the debugger doesn’t have a chance. Use the symbol server (;en-us;319037). Always generate PDBs for your own code.
  2. Crazy OS symbols. The basic problem is that for Windows XP, the OS team wanted to use the VS7 C++ compiler. However, they still wanted symbol files that VC6 could read. The result was that they saved their symbol files in the VC6 format. Sounds like a great idea, except for the fact that the VS7 C++ compiler had some new optimizations that couldn’t be represented in the older PDB format. The debugger tries to compensate for this. Usually it is successful, but I am sure there are still cases where we aren’t going to make it. This is only a problem when the debugger needs to walk the stack through lots of OS goop.
  3. Functions written in assembly. If you write code in assembly, you can do all sorts of crazy things that will confuse the debugger. The C Runtime does this in places.
  4. Blowing the stack. If you go and overflow some stack buffer, among the many possible problems, you may corrupt the callstack. Once this has happened, it’s very difficult to figure out what went wrong. Rather than poking about in memory, a much more time-effective strategy is to enable C Runtime checks (debug), or use the ‘/GS’ compiler flag (retail).
  5. Bad ESP/EIP value. This usually happens when your code starts executing junk. Try turning on page heap (;en-us;Q286470), and break on first-chance access violations. You are probably calling through the v-table of a deleted object. If you are really desperate, the second DWORD from '@tib' is StackBase. If you want to look at the stack, this will get you started.
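
To make item 4 concrete, here is a toy model of how an overrun local buffer breaks a stack walk. It is only a sketch: "memory" is a vector of 32-bit slots, each frame is assumed to be laid out as a saved frame pointer followed by a return address (the classic x86 EBP chain), and every name and address in it is made up for illustration. A real debugger's walker is far more involved.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Toy model: addresses are indices into a vector of 32-bit slots.
// Each frame is [saved frame pointer][return address]; locals sit at
// lower indices, as on x86 where the stack grows down.
constexpr std::uint32_t kEndOfChain = 0xFFFFFFFFu;  // marks the outermost frame
constexpr std::size_t kLeafFrame = 4;               // frame pointer of the leaf function
constexpr std::size_t kLeafBuffer = 2;              // leaf's local buffer starts here
constexpr std::size_t kLeafBufferSlots = 2;         // ...and is two slots long
constexpr std::uint32_t kJunk = 0x41414141u;        // data written by the bad copy

// Build three nested frames with made-up return addresses.
std::vector<std::uint32_t> MakeStack() {
    std::vector<std::uint32_t> mem(16, 0);
    mem[12] = kEndOfChain; mem[13] = 0x1000;  // outermost frame
    mem[8] = 12;           mem[9] = 0x2000;   // middle frame
    mem[4] = 8;            mem[5] = 0x3000;   // leaf frame
    return mem;
}

// Write `count` slots into the leaf's buffer with no check against
// kLeafBufferSlots, which is exactly the bug being modeled.
void CopyIntoLeafBuffer(std::vector<std::uint32_t>& mem, std::size_t count) {
    assert(kLeafBuffer + count <= mem.size());  // keep the toy itself in bounds
    for (std::size_t i = 0; i < count; ++i) {
        mem[kLeafBuffer + i] = kJunk;
    }
}

// Follow the saved-frame-pointer chain and collect return addresses.
// Stops at the end marker or as soon as the chain points outside memory.
std::vector<std::uint32_t> WalkStack(const std::vector<std::uint32_t>& mem,
                                     std::size_t framePointer) {
    std::vector<std::uint32_t> returnAddresses;
    while (framePointer != kEndOfChain &&
           framePointer + 1 < mem.size() &&
           returnAddresses.size() < mem.size()) {  // guard against cycles
        returnAddresses.push_back(mem[framePointer + 1]);
        framePointer = mem[framePointer];
    }
    return returnAddresses;
}
```

Walking an untouched stack from kLeafFrame yields all three return addresses. After CopyIntoLeafBuffer(mem, 4), which writes two slots past the end of the buffer, the walk reports one junk return address and then stops, because the saved frame pointer no longer points at a frame.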

Callstacks that look wrong, but are correct:

  1. Bad interface pointers / bad virtual function call: This happens when code either has a bad typecast, or calls into a deleted object. The result is a callstack that looks bizarre. Suppose the following code:
    interface I1 : IUnknown { HRESULT Alpha(...); };
    interface I2 : IUnknown { HRESULT Beta(...); };
    class C1 : public I1 { ... };
    class C2 : public I2 { ... };
    void CallBeta(I2 *p) { p->Beta(...); }

    And a callstack of:


    What’s going on? The hint here is that if you look at the v-tables for I1 and I2, you will see that Alpha and Beta occupy the same v-table slot. For instance, if Alpha is the 5th method in I1, then Beta is the 5th method in I2. Because the parameter passed to CallBeta(...) is wrong (it actually points at a C1), the call to Beta lands in Alpha, and you get a strange callstack that doesn't seem to make sense.

  2. Bad function pointer. This is the more straightforward version of the last problem. Calling through bad function pointers leads to confusing stacks.
  3. Linker COMDAT folding. If the linker notices that two functions compiled to the same machine code, it may keep only one copy. This can be confusing in the debugger, because the function name it shows can be wrong.
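
The v-table mix-up from item 1 can be sketched with hand-rolled v-tables. This is a simplified model, not real COM: the Object struct, the stub functions, and the slot constant are all hypothetical, every "method" shares one signature so the bad call stays well-defined C++, and Alpha and Beta sit in the 4th slot here (right after the three IUnknown methods) rather than the 5th.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Every "method" has the same signature so that calling through the wrong
// slot is still well-defined; real COM methods differ per slot.
using Method = const char* (*)(void* self);

// Hypothetical stand-in for a COM object: the first field points at a v-table.
struct Object {
    const Method* vtable;
};

// Slots 0..2 play the role of the three IUnknown methods.
const char* StubQueryInterface(void*) { return "QueryInterface"; }
const char* StubAddRef(void*) { return "AddRef"; }
const char* StubRelease(void*) { return "Release"; }
const char* C1Alpha(void*) { return "C1::Alpha"; }
const char* C2Beta(void*) { return "C2::Beta"; }

// I1 and I2 each add one method, so Alpha and Beta share slot 3.
const Method kC1Vtable[] = {StubQueryInterface, StubAddRef, StubRelease, C1Alpha};
const Method kC2Vtable[] = {StubQueryInterface, StubAddRef, StubRelease, C2Beta};
constexpr std::size_t kBetaSlot = 3;

// What "p->Beta()" boils down to: load the v-table, index a fixed slot, call.
// Nothing here checks that p really points at something implementing I2.
std::string CallBeta(Object* p) {
    assert(p != nullptr);
    return p->vtable[kBetaSlot](p);
}

// Correct use: the object really is a C2.
std::string DemoGoodCall() {
    Object c2{kC2Vtable};
    return CallBeta(&c2);
}

// The bug: a C1 is passed where an I2 was expected (think of a bad cast).
std::string DemoBadCall() {
    Object c1{kC1Vtable};
    return CallBeta(&c1);
}
```

DemoGoodCall() ends up in C2::Beta as expected, while DemoBadCall() ends up in C1::Alpha, which is why the callstack shows CallBeta calling a function it never mentions in source.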

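For COMDAT folding (item 3), the kind of code that is a candidate looks roughly like the sketch below. The two functions are hypothetical; whether they actually end up sharing one address depends on the toolchain and its linker options, so no particular result is claimed here.

```cpp
#include <cassert>

// Two distinct functions in source whose bodies compile to the same code.
int ClampToByte(int v) { return v < 0 ? 0 : (v > 255 ? 255 : v); }
int ClampToColor(int v) { return v < 0 ? 0 : (v > 255 ? 255 : v); }

using IntFn = int (*)(int);

// True when both pointers refer to the same code address. For two different
// functions this can only be true if the linker folded them together.
bool SharesAddress(IntFn a, IntFn b) { return a == b; }

// Folding is only possible because the two bodies behave identically.
void CheckSameBehavior() {
    const int samples[] = {-5, 0, 128, 255, 300};
    for (int v : samples) {
        assert(ClampToByte(v) == ClampToColor(v));
    }
}
```

If SharesAddress(ClampToByte, ClampToColor) returns true in your build, the debugger has only one name to show for that address, so a call to ClampToColor can appear on the stack as ClampToByte.
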
Comments (1)
  1. Pavel Lebedinsky says:

    >turning on page heap (;en-us;Q286470)

    This KB article sucks. What the hell is pageheap.exe and where do I get it? The description of what pageheap actually does is also very confusing (the pageheap functionality is in the OS, all pageheap.exe does is turn on some flags in the registry to enable it).

    If you already have debuggers (cdb/windbg) installed, the right way to enable pageheap is to use gflags.exe that comes with the debugger package:

    c:\debuggers> gflags -p -?

    By the way, it’s a good idea to use cdb/windbg (instead of VC) to debug crashes caused by pageheap, because you can do things like !heap -p -a to dump info about crash address etc.
