Toolbar Compatibility Debugging Walkthrough


In the past I’ve found debugging walkthroughs useful for picking up new techniques. In that spirit, here’s a quick rundown of a bug I investigated today that may have some useful tidbits.


 


This was a crash in IE that involved a toolbar that I didn’t have the source code for. The issue was that if you clicked in the toolbar’s edit box and later closed the browser, IE would crash.


 


It was crashing while trying to call Release() on a pointer to the toolbar and initially looked like a reference counting issue, either in IE or in the toolbar code itself. This type of bug can be tricky to track down in your own code, so given that this one straddled legacy IE code and external toolbar code, I closed my door and prepared for the worst. 🙂


 


I started out by turning on full pageheap using gflags.exe, which is part of the Debugging Tools for Windows package, and repro’d the bug. This was to ensure that the crash wasn’t a side effect of heap corruption and that I was debugging the right thing.


 


Next I put a breakpoint on the toolbar’s Release(). Since I didn’t have the source, I had to track the function down manually:


 


0:005> kP 1
ChildEBP RetAddr
01eaeab0 0074f7fb xxxxx!xxxx::_xxxxxxx(
                        struct IUnknown * ptb = 0x020f3940)
0:005> dds 0x020f3940
020f3940  10031b44 toolbar!DllMain+0x27d24
020f3944  10031b2c toolbar!DllMain+0x27d0c
020f3948  10031b18 toolbar!DllMain+0x27cf8
020f394c  10031af8 toolbar!DllMain+0x27cd8
020f3950  10031ad8 toolbar!DllMain+0x27cb8
020f3954  10031f50 toolbar!DllMain+0x28130
020f3958  00000003
  […]
0:005> dds 10031b44
10031b44  1000cc90 toolbar!DllMain+0x2e70
10031b48  1000cdd0 toolbar!DllMain+0x2fb0
10031b4c  1000cdf0 toolbar!DllMain+0x2fd0
10031b50  1000ce20 toolbar!DllMain+0x3000
  […]
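To make the two dumps above easier to read: a COM object starts with a pointer to its vtable, and the first three vtable slots are always the IUnknown methods. The sketch below is only an illustration in portable C++. The array copies the four addresses printed by the second dds, and the names (IUnknownSlot, SlotAddress, SlotEntryAddress) are made up for this example, not taken from any Windows header.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// IUnknown's three methods always occupy the first vtable slots, in this order.
enum IUnknownSlot : std::size_t { kQueryInterface = 0, kAddRef = 1, kRelease = 2 };

// Pointer size in the 32-bit process being debugged.
constexpr std::uint32_t kPointerSize = 4;

// The four entries printed by "dds 10031b44" in the transcript.
const std::array<std::uint32_t, 4> kDumpedVtable = {
    0x1000cc90u, 0x1000cdd0u, 0x1000cdf0u, 0x1000ce20u};

// Code address stored in the given slot of a dumped vtable.
std::uint32_t SlotAddress(const std::array<std::uint32_t, 4>& vtable,
                          IUnknownSlot slot) {
    assert(slot < vtable.size());
    return vtable[slot];
}

// Address of the slot itself, i.e. the left-hand column of the dds output.
std::uint32_t SlotEntryAddress(std::uint32_t vtableBase, IUnknownSlot slot) {
    return vtableBase + kPointerSize * static_cast<std::uint32_t>(slot);
}
```

With these values, SlotAddress(kDumpedVtable, kRelease) is 0x1000cdf0 (toolbar!DllMain+0x2fd0) and SlotEntryAddress(0x10031b44, kRelease) is 0x10031b4c, which matches the third row of the vtable dump.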


 


I could also have unassembled the code and traced the logic, but I’ve found that it’s often faster to just use “dds” to dump interesting-looking addresses. “dds” is especially useful for dumping the stack when symbols are incomplete (or the stack is corrupt) and for tracking down objects on optimized builds where the debugger gets confused. (When you have symbols and dump an address, it will be immediately obvious from the vtable whether you’re looking at the right object.)


 


The IUnknown interface has three methods: QueryInterface(), AddRef(), and Release(), in that order. Given the dump of the vtable, I assumed that the third entry, toolbar!DllMain+0x2fd0, was the Release() function and confirmed it by unassembling the code. It looked right, so I put a breakpoint on the ret instruction at the end of the function:


 


0:005> u toolbar!DllMain+0x2fd0
  […]
1000ce15 8b06             mov     eax,[esi]
1000ce17 5e               pop     esi
1000ce18 c20400           ret     0x4
1000ce1b cc               int     3
0:005> bp 1000ce18


 


and then re-ran the repro. For brevity I’ve left out many of the calls and removed redundant output. ‘eax’ holds the return value of Release() so you can see that it’s winding down to the point of doing the final Release() (at which point the object will delete itself).
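For reference, the countdown in the trace that follows is what the usual COM reference-counting pattern looks like from the outside. This is a minimal sketch, not the toolbar's actual code (which I don't have). RefCounted and DestroyedOnlyOnFinalRelease are hypothetical names, and a real implementation would use ULONG, the stdcall calling convention, and interlocked operations.

```cpp
#include <cassert>

// Hypothetical COM-style object: starts with one reference and deletes
// itself when the count reaches zero.
class RefCounted {
public:
    explicit RefCounted(bool* destroyedFlag)
        : refs_(1), destroyed_(destroyedFlag) {}

    unsigned long AddRef() { return ++refs_; }

    // Returns the remaining count, which is what shows up in eax at the ret.
    unsigned long Release() {
        assert(refs_ > 0);  // guards against over-release in this sketch
        const unsigned long remaining = --refs_;
        if (remaining == 0) {
            delete this;  // the final Release() destroys the object
        }
        return remaining;
    }

private:
    ~RefCounted() { *destroyed_ = true; }  // only reachable via Release()

    unsigned long refs_;
    bool* destroyed_;
};

// Takes extraRefs additional references, releases them all, and reports
// whether the object was destroyed on the final Release() and not earlier.
bool DestroyedOnlyOnFinalRelease(unsigned long extraRefs) {
    bool destroyed = false;
    RefCounted* obj = new RefCounted(&destroyed);
    for (unsigned long i = 0; i < extraRefs; ++i) {
        obj->AddRef();
    }
    unsigned long remaining = extraRefs + 1;
    while (remaining > 1) {
        remaining = obj->Release();
        if (destroyed) {
            return false;  // destroyed while references were outstanding
        }
    }
    remaining = obj->Release();  // final Release()
    return remaining == 0 && destroyed;
}
```

The important detail for this bug is that the final Release() executes code inside the DLL that implements the object, so that DLL has to stay loaded until the count reaches zero.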


 


0:005> g
Breakpoint 1 hit
eax=00000004 ebx=020f39ec ecx=020f395c edx=00000850 esi=020f3d84 edi=00000000
eip=1000ce18 esp=01eaf4cc ebp=00000000 iopl=0         nv up ei pl nz na pe cy
cs=001b  ss=0023  ds=0023  es=0023  fs=003b  gs=0000             efl=00000203
toolbar!DllMain+0x2ff8:
1000ce18 c20400           ret     0x4
0:005> g
Breakpoint 1 hit
eax=00000003 ebx=014ab010 ecx=020f395c edx=10031b18 esi=0224000a edi=00000008
0:005> g
Breakpoint 1 hit
eax=00000002 ebx=014ab010 ecx=020f395c edx=10031b44 esi=0224000a edi=00000008
0:005> g
wn IEFRAME  CDocObjectView::DestroyViewWindow(): Destroying Host Window
Breakpoint 1 hit
eax=00000001 ebx=00000000 ecx=020f395c edx=00803e30 esi=020f3d04 edi=020f9328
0:005> g
Unable to remove breakpoint 1 at 1000ce18, Win32 error 487
    "Attempt to access invalid address."
The breakpoint was set with BP.  If you want breakpoints
to track module load/unload state you must use BU.
(564.db0): Access violation - code c0000005 (first chance)
First chance exceptions are reported before any exception handling.
This exception may be expected and handled.
Unable to remove breakpoint 1 at 1000ce18, Win32 error 487
    "Attempt to access invalid address."
The breakpoint was set with BP.  If you want breakpoints
to track module load/unload state you must use BU.


 


Ah ha! This wasn’t what I was looking for, but you can see that before we do the final release (or crash), the debugger complains that a breakpoint is set in a module that has been unloaded. The crash happens shortly after this and is simply caused by trying to call into the module after it’s been unloaded.


 


So why was it unloaded? Let’s put a breakpoint on the module unload, re-run the repro, and find out:


 


0:005> sxe ud:toolbar
0:005> g
  […]
Breakpoint 1 hit
eax=00000001 ebx=00000000 ecx=020f395c edx=00803e30 esi=020f3d04 edi=020f9328
eip=1000ce18 esp=01eafa98 ebp=00000000 iopl=0         nv up ei pl nz na pe cy
cs=001b  ss=0023  ds=0023  es=0023  fs=003b  gs=0000             efl=00000203
toolbar!DllMain+0x2ff8:
1000ce18 c20400           ret     0x4
0:005> k
*** ERROR: Symbol file could not be found.  Defaulted to export symbols for C:\WINDOWS\system32\kernel32.dll -
ChildEBP RetAddr
WARNING: Stack unwind information not available. Following frames may be wrong.
01eafcf4 7c80aa7f ntdll!KiFastSystemCallRet
*** ERROR: Symbol file could not be found.  Defaulted to export symbols for C:\WINDOWS\system32\ole32.dll -
01eafd08 77513442 kernel32!FreeLibrary+0x19
01eafd14 77513456 ole32!CoFreeUnusedLibraries+0xa9
01eafeb8 77513578 ole32!CoFreeUnusedLibraries+0xbd
01eafec8 775133a2 ole32!CoFreeUnusedLibrariesEx+0x2e
01eafeec 007ab40f ole32!CoFreeUnusedLibraries+0x9
01eaffb4 7c80b50b xxxxx!xxx::_xxxxxxxx+0x3af


 


It’s being unloaded when IE’s code calls CoFreeUnusedLibrariesEx() as the window is closed. This is code I’m not super-familiar with, but I presume we’re doing it to trigger the unloading of DLLs for BHOs, toolbars, ActiveX controls, and so on, to free up memory. However, we still hold properly reference-counted pointers to the toolbar, so it shouldn’t be unloading quite yet.


 


According to MSDN, CoFreeUnusedLibrariesEx() calls the DLL’s DllCanUnloadNow(), which is supposed to return S_FALSE if the DLL is not yet ready to be unloaded. Let’s set a breakpoint, step through the function, and see what it’s returning in this scenario:


 


0:005> bp toolbar!DllCanUnloadNow
0:005> g
  […]
Breakpoint 0 hit
eax=00000000 ebx=00000001 ecx=77606074 edx=00000000 esi=014a6dd0 edi=77606068
eip=10008fd0 esp=01eafd20 ebp=01eafd30 iopl=0         nv up ei pl zr na po nc
cs=001b  ss=0023  ds=0023  es=0023  fs=003b  gs=0000             efl=00000246
toolbar!DllCanUnloadNow:
10008fd0 8b0d3c1c0410 mov ecx,[toolbar!DllMain+0x37e1c (10041c3c)] ds:0023:10041c3c=00000000
0:005> p
[…]
0:005> p
eax=00000000 ebx=00000001 ecx=00000000 edx=00000000 esi=014a6dd0 edi=77606068
eip=10008fdd esp=01eafd20 ebp=01eafd30 iopl=0         nv up ei pl zr na po nc
cs=001b  ss=0023  ds=0023  es=0023  fs=003b  gs=0000             efl=00000246
toolbar!DllCanUnloadNow+0xd:
10008fdd c3               ret


 


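As you can see by looking at ‘eax’, this function is returning 0, which is S_OK. That tells COM the DLL is safe to unload even though IE still holds references to the toolbar object. I believe this is the cause of the bug.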
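For comparison, here is roughly what a well-behaved DllCanUnloadNow() tends to look like. This is a sketch under stated assumptions, not the toolbar's code: HResult, kS_OK, and kS_FALSE are local stand-ins for the Windows HRESULT, S_OK (0), and S_FALSE (1) so that the example builds anywhere, and the counters and helper names are hypothetical.

```cpp
#include <atomic>
#include <cassert>

// Local stand-ins for the Windows definitions (assumptions for portability).
using HResult = long;
constexpr HResult kS_OK = 0;     // same value as S_OK
constexpr HResult kS_FALSE = 1;  // same value as S_FALSE

// Hypothetical module-wide bookkeeping: live COM objects plus server locks.
std::atomic<long> g_liveObjects{0};
std::atomic<long> g_serverLocks{0};

// Called from every object's constructor in this sketch.
void OnObjectCreated() { ++g_liveObjects; }

// Called from every object's destructor, i.e. after its final Release().
void OnObjectDestroyed() {
    assert(g_liveObjects.load() > 0);
    --g_liveObjects;
}

// Reports "safe to unload" only when nothing in the DLL is still in use.
HResult CanUnloadNow() {
    const bool idle = g_liveObjects.load() == 0 && g_serverLocks.load() == 0;
    return idle ? kS_OK : kS_FALSE;
}
```

With bookkeeping like this, the function would return S_FALSE for as long as IE's outstanding reference kept the toolbar object alive, and the DLL would stay loaded.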


 


After reviewing recent changes that I had made in IE7, I found that one of them caused us to hold onto the toolbar object longer than we used to. In previous versions of IE we happened to always do the final Release() before calling CoFreeUnusedLibrariesEx(), which masked the bug in the toolbar. The fix in this case, for better or worse, was to update the code so that we release earlier, like we used to.
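The ordering dependency can be captured in a small model. Everything here is hypothetical (FakeModule, ModuleCanUnloadNow, ShutdownIsSafe, and so on); it only mimics the sequence of events described above and does not call any real COM API.

```cpp
#include <cassert>

// Hypothetical model of a COM DLL loaded into the host process.
struct FakeModule {
    bool loaded = true;
    long liveObjects = 0;
    bool honorsContract = true;  // false models the bug: always says "S_OK"
};

// Stand-in for DllCanUnloadNow(): true means "safe to unload".
bool ModuleCanUnloadNow(const FakeModule& m) {
    return m.honorsContract ? (m.liveObjects == 0) : true;
}

// Stand-in for the host asking COM to free unused libraries.
void FreeUnusedLibraries(FakeModule& m) {
    if (m.loaded && ModuleCanUnloadNow(m)) {
        m.loaded = false;
    }
}

// The host's final Release(). Returns false if the call would land in
// unloaded code, which is the crash in the walkthrough.
bool FinalRelease(FakeModule& m) {
    assert(m.liveObjects > 0);
    if (!m.loaded) {
        return false;
    }
    --m.liveObjects;
    return true;
}

// Runs the shutdown sequence in either order and reports whether it is safe.
bool ShutdownIsSafe(bool honorsContract, bool releaseBeforeFree) {
    FakeModule m;
    m.honorsContract = honorsContract;
    m.liveObjects = 1;  // the reference the host is still holding
    if (releaseBeforeFree) {
        const bool ok = FinalRelease(m);
        FreeUnusedLibraries(m);
        return ok;
    }
    FreeUnusedLibraries(m);
    return FinalRelease(m);
}
```

In this model the only unsafe combination is a module that ignores the contract together with a host that frees libraries before its final Release(), which is why restoring the old release order hides the toolbar's bug again.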


 


Thoughts? Are these types of walkthroughs interesting or useful? If so I’ll do more of them.

Comments (7)

  1. Kris says:

    Just wondering whether you guys use tools like IDA Pro and SoftICE in your day-to-day debugging of problems like this, or do you just stick with the Windows tools? The reason I ask is that I am very interested in debugging/reverse engineering as a hobby, but most books I have seen talk about IDA Pro or SoftICE. Any thoughts?

  2. Pavel Lebedinsky says:

    dpp is another command useful for identifying objects with vtables. It’s kind of like dps but with an extra dereference, so if you do dpp esp you’ll see if there are any COM/C++ object pointers on the stack (provided you have private symbols of course).

    Kris – in the Windows group, most people use kd/ntsd/cdb. These debuggers evolve with the OS so you always have support for the latest features. For example, if the heap implementation changes, !heap command will be updated and so on.

  3. tonyschr says:

    Ah, I hadn’t tried ‘dpp’. That is pretty useful.

    Kris, as Pavel said, for the most part we use the standard Windows debuggers. However, I’m sure that for hard-core application compatibility debugging, and perhaps other uses, some people here use IDA Pro and other tools.

    A long time ago a team I was on helped load balance a few application compatibility bugs for XP, and for a couple of them I found OllyDbg (http://www.ollydbg.de) to be helpful.

  4. PatriotB says:

    So, in the end, did you keep your change in place and notify the toolbar developer? Or for the sake of compatibility are you reverting to the previous behavior? (I’m hoping for the former.)

  5. tonyschr says:

    PatriotB – newer versions of this toolbar do not have this problem, so it has already been fixed.

    However, the fix will help users of the older toolbar as well as unknown toolbars that might have the same type of bug.

    (In this case the fix also cleaned up the code slightly by removing a redundant pointer. 🙂

  6. PatriotB says:

    Oops, looks like I had missed the last sentence "The fix in this case, for better or worse, was to update the code so that we release earlier like we used to."

  7. cortelli says:

    "Thoughts? Are these types of walkthroughs interesting or useful? If so I’ll do more of them."

    Keep them coming please. Your friendly neighborhood tester appreciates it :).