My Toolbar or BHO is Causing IE7 on Vista to Crash on Close. Help!

During the development of IE7, one problem we discovered was that a small number of extensions have unbalanced CoInitialize() or CoUninitialize() calls. On IE6 they sometimes lucked out, but due to architectural changes in IE7 these would cause crashes, hangs, or erratic behavior.

Unbalanced CoInitialize() calls seem harmless enough at first glance. During shutdown COM will detect this and do the missing uninits. However, one problem is that on XP it does this while holding the loader lock! If the destruction of any of the objects triggers a cross-thread COM call (or a SendMessage), or any similar scenario where another piece of code needs to acquire the loader lock at this time, the application will deadlock.

Unbalanced CoUninitialize() calls lead to a different sort of problem: if code does too many of them it will cause COM to clean up the apartment on that thread early, and after that point all cross-thread COM calls involving that thread will fail. This usually leads to erratic behavior, as objects that need to talk to each other can no longer do so.
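To make the counting concrete, here is a minimal toy model of the behavior described above. It is not the real ole32 logic: ToyApartment, HostSurvivesExtension, and ExampleToyApartment are hypothetical names invented for this sketch, and the only assumption carried over from COM is that the last uninit on a thread tears the apartment down.

```cpp
#include <cassert>

// Toy stand-in for COM's per-thread bookkeeping (hypothetical, NOT ole32).
struct ToyApartment {
    int initCount = 0;   // outstanding init calls on this thread
    bool alive = false;  // true while the apartment exists

    void Initialize() { ++initCount; alive = true; }

    void Uninitialize() {
        // The uninit that drops the count to zero tears the apartment down.
        if (initCount > 0 && --initCount == 0) alive = false;
    }

    // Cross-thread calls can only be delivered while the apartment exists.
    bool CrossThreadCallSucceeds() const { return alive; }
};

// The host initializes once at thread start, the extension does one balanced
// init/uninit pair, and then makes `extraUninits` unmatched uninit calls.
// Returns whether a later cross-thread call would still work.
bool HostSurvivesExtension(int extraUninits) {
    ToyApartment apt;
    apt.Initialize();    // host: thread start
    apt.Initialize();    // extension: balanced pair begins
    apt.Uninitialize();  // extension: balanced pair ends
    for (int i = 0; i < extraUninits; ++i) apt.Uninitialize();  // the bug
    return apt.CrossThreadCallSucceeds();
}

void ExampleToyApartment() {
    assert(HostSurvivesExtension(0));   // balanced: apartment survives
    assert(!HostSurvivesExtension(1));  // one spurious uninit: apartment is gone
}
```

In the model, a single extra uninit from the extension consumes the host's own init, which is exactly why the host's later cross-thread calls start failing.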

As a workaround, IE7 initially leveraged IInitializeSpy to detect both types of out-of-balance initializations, and fix them on the fly. After all, since we're the application and we know that COM needs to stay initialized for the entire duration of the thread, none of the inits or uninits in between really matter.

Unfortunately, life isn't that simple. It turned out that using IInitializeSpy in this way was slightly hacky, and we hit a couple of issues where it conflicted with other behavior. In addition, it would perpetuate the problem in future versions, because developers (both internal and external) could be sloppy with their inits and uninits without obvious consequences.

On XP, both kinds of balancing are still done; for various reasons both are needed. On Vista, however, both are turned off by default. For unbalanced CoInitialize() calls, COM no longer does its cleanup while holding the loader lock. For unbalanced CoUninitialize() calls, we decided that crashing early to help developers detect the problem was preferable to subtle bugs and erratic behavior.

So, how do you debug this if it's happening to you? In this example I took the Hello World BHO sample and added a spurious CoUninitialize() call to the SetSite(NULL) implementation.

 

First, I recommend installing Microsoft's Debugging Tools for Windows. You can probably accomplish all of this through Visual Studio or some other debugger, but it's easier to talk through this way.

The next step is to launch IE under the debugger and set the symbols to Microsoft's public symbol server so you can see what's going on and, later, set breakpoints.

C:\Program Files\Internet Explorer>cdb iexplore.exe

. . .
0:000> .sympath SRV*c:\publicsymbols*https://msdl.microsoft.com/download/symbols
Symbol search path is: SRV*c:\publicsymbols*https://msdl.microsoft.com/download/symbols
0:000> .reload

. . .

Now run the scenario that triggers the crash on close, then close IE. If this is the issue, you will see something like the following:

(1c18.1140): Unknown exception - code 800401fd (first chance)
(1c18.1140): Unknown exception - code 800401fd (first chance)
(1c18.1140): Unknown exception - code 800401fd (!!! second chance !!!)
eax=02d5f968 ebx=002bef74 ecx=00342208 edx=6efba36b esi=00342208 edi=00000000
eip=7709b09e esp=02d5f968 ebp=02d5f9b8 iopl=0 nv up ei pl zr na pe nc
cs=001b ss=0023 ds=0023 es=0023 fs=003b gs=0000 efl=00000246
kernel32!RaiseException+0x58:
7709b09e c9 leave
0:004> .lastevent
Last event: 1c18.1140: Unknown exception - code 800401fd (!!! second chance !!!)
debugger time: Fri Dec 8 16:48:47.924 2006 (GMT-8)
0:004> !error 800401fd
Error code: (HRESULT) 0x800401fd (2147746301) - Object is not connected to server
0:004> k
ChildEBP RetAddr
02d5f9b8 6f0acad0 kernel32!RaiseException+0x58
02d5f9d0 6eff15c3 IEFRAME!CShellBrowser2::_CrashOnTabToFrameCommunicationSevered+0x14
02d5f9f0 6efb98d4 IEFRAME!CShellBrowser2::_DoFinalCleanup+0x15e
02d5fa10 6efb9ad9 IEFRAME!CShellBrowser2::_OnConfirmedClose+0xad
02d5fa24 6efb9a12 IEFRAME!CShellBrowser2::OnClose+0x109
02d5fa8c 770c3833 IEFRAME!CTabWindow::_TabWindowThreadProc+0x1ec
02d5fa98 7787a9bd kernel32!BaseThreadInitThunk+0xe
02d5fad8 00000000 ntdll!_RtlUserThreadStart+0x23

IE forces a crash in this scenario by raising a non-continuable exception whose code is the HRESULT returned by the cross-thread COM call that unexpectedly failed. The stack, conveniently readable thanks to public symbols, makes it even more obvious what's going on.

Now that we have confirmed this is indeed the cause of the crash, the next step is to restart IE and put a breakpoint in COM to see what call is causing the apartment to get torn down early. We do this by setting a breakpoint on ole32!ApartmentUninitialize and re-running the scenario:

0:000> .sympath SRV*c:\publicsymbols*https://msdl.microsoft.com/download/symbols
Symbol search path is: SRV*c:\publicsymbols*https://msdl.microsoft.com/download/symbols
0:000> .reload

. . .

0:000> bp ole32!ApartmentUninitialize
0:000> g

. . .
Breakpoint 0 hit
eax=004e4620 ebx=7706064c ecx=00000000 edx=00000000 esi=02b1f994 edi=00000000
eip=76f8a5bc esp=02b1f968 ebp=02b1f97c iopl=0 nv up ei ng nz na po nc
cs=001b ss=0023 ds=0023 es=0023 fs=003b gs=0000 efl=00000282
ole32!ApartmentUninitialize:
76f8a5bc 8bff mov edi,edi
0:004> k
ChildEBP RetAddr
02b1f964 76f8ad7e ole32!ApartmentUninitialize
02b1f97c 76f89d98 ole32!wCoUninitialize+0x88
02b1f998 77963951 ole32!CoUninitialize+0x71
02b1f9a4 779648d9 IMM32!CtfImmCoUninitialize+0x34
02b1f9ac 76f77f25 IMM32!ISPY_PostUninitialize+0x51
02b1f9c8 76f89c3b ole32!NotifyInitializeSpies+0x6a
*** WARNING: Unable to verify checksum for c:\Proj\TestBho1\TestBho1\Debug\TestBho1.dll
02b1f9ec 02c32b59 ole32!CoUninitialize+0x98
02b1facc 6efce448 TestBho1!CHelloWorldBHO::SetSite+0xb9
02b1faec 6efdebe2 IEFRAME!IUnknown_SetSite+0x33
02b1fb00 6efa790c IEFRAME!CIEFrameAutoProp::_VariantClear+0x26
02b1fb08 6efa78e9 IEFRAME!CIEFrameAutoProp::~CIEFrameAutoProp+0xa
02b1fb14 6efa78cb IEFRAME!CIEFrameAutoProp::`scalar deleting destructor'+0xd
02b1fb20 6efce5ac IEFRAME!CIEFrameAuto::_ClearPropertyList+0x19
02b1fb48 6efcfbf2 IEFRAME!CIEFrameAuto::SetOwner+0x184
02b1fb64 6efbdc22 IEFRAME!CBaseBrowser2::OnDestroy+0x88
02b1fb70 6efbdc41 IEFRAME!CCommonBrowser::OnDestroy+0x21
02b1fb80 6efcfb61 IEFRAME!CShellBrowser2::OnDestroy+0xf
02b1fb98 6efac10c IEFRAME!CBaseBrowser2::WndProcBS+0xb8
02b1fbb4 6efaba1d IEFRAME!CCommonBrowser::WndProcBS+0x2a
02b1fc1c 6efbac5c IEFRAME!CShellBrowser2::WndProcBS+0x18f

Ah ha! From the stack we can see that our test BHO with the bogus call to CoUninitialize() in the SetSite() implementation triggered the apartment to get torn down.

Now, it isn't always this easy. If this doesn't lead you to the problem, I recommend doing some code inspection around the places where your extension (knowingly) calls CoInitialize() and CoUninitialize(), to see if there are any edge cases where they could get out of balance.
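One common edge case is calling CoUninitialize() after an init that actually failed. A minimal sketch of one way to rule that out is a small guard object that only uninitializes if its own init succeeded. ScopedCoInit and ExampleScopedCoInit are hypothetical names, not anything IE or COM provides. The non-Windows branch contains stand-ins (including the g_stubInitCount and g_stubMode variables) that exist only so the sketch compiles elsewhere; they are not the real COM API.

```cpp
#include <cassert>

#ifdef _WIN32
#include <objbase.h>  // real CoInitializeEx / CoUninitialize
#else
// Stand-ins so the sketch also compiles off Windows. They only mimic the
// counting behavior discussed in this post and are NOT the real COM API.
#include <cstdint>
typedef std::int32_t HRESULT;
typedef unsigned long DWORD;
const HRESULT S_OK = 0;
const HRESULT S_FALSE = 1;
const HRESULT RPC_E_CHANGED_MODE = static_cast<HRESULT>(0x80010106u);
const DWORD COINIT_MULTITHREADED = 0x0;
const DWORD COINIT_APARTMENTTHREADED = 0x2;
#define SUCCEEDED(hr) ((hr) >= 0)
int g_stubInitCount = 0;  // outstanding successful inits (stub only)
DWORD g_stubMode = 0;     // mode chosen by the first init (stub only)
HRESULT CoInitializeEx(void*, DWORD mode) {
    if (g_stubInitCount > 0 && mode != g_stubMode) return RPC_E_CHANGED_MODE;
    g_stubMode = mode;
    return ++g_stubInitCount == 1 ? S_OK : S_FALSE;
}
void CoUninitialize() { --g_stubInitCount; }
#endif

// Hypothetical guard: pairs each *successful* init with exactly one uninit.
class ScopedCoInit {
public:
    explicit ScopedCoInit(DWORD mode) : hr_(CoInitializeEx(nullptr, mode)) {}
    ~ScopedCoInit() {
        // S_OK and S_FALSE both need a matching uninit; a failure needs none.
        if (SUCCEEDED(hr_)) CoUninitialize();
    }
    ScopedCoInit(const ScopedCoInit&) = delete;
    ScopedCoInit& operator=(const ScopedCoInit&) = delete;
    HRESULT result() const { return hr_; }

private:
    HRESULT hr_;
};

// Assumes it runs on a thread with no prior COM initialization.
void ExampleScopedCoInit() {
    ScopedCoInit outer(COINIT_APARTMENTTHREADED);
    {
        ScopedCoInit inner(COINIT_MULTITHREADED);  // different mode: init fails
        assert(inner.result() == RPC_E_CHANGED_MODE);
    }  // inner's init failed, so no CoUninitialize() is issued here
}
```

Because the destructor checks the stored HRESULT, a failed init such as RPC_E_CHANGED_MODE never turns into a spurious uninit that eats the host's initialization.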

If that's not sufficient, or you have to do explicit matching to understand the root cause, you can put breakpoints on both the init and uninit in the debugger, run through the simplest scenario that causes the crash, and then analyze the debug output. I usually set the following breakpoints:

bp ole32!CoInitializeEx "~.;k;g"
bp ole32!CoUninitialize "~.;k;g"

at the initial breakpoint, and let it go until the exception is hit. These will trace all of the calls. Then, in a text editor, you can chop out all the calls made on threads other than the one that crashed and start matching them up. (Most of the calls will be obvious noise.) Make sure you either log to a file (for example, with .logopen) or have a large buffer.
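If the trace is too long to match up by eye, the matching step can be sketched in a few lines of code. This is only an illustration under stated assumptions: you have already reduced the log to an ordered list of (thread, init or uninit) events, and the trace starts early enough to include the thread's first init. ComCall, ComEvent, FindTeardownCall, and ExampleTrace are hypothetical names, and the sample trace is made up.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

enum class ComCall { Init, Uninit };

struct ComEvent {
    unsigned threadId;  // debugger thread number the call was made on
    ComCall call;       // which breakpoint fired
};

// Returns the index (into `events`) of the first uninit on `threadId` that
// brings that thread's running init count down to zero, or -1 if the count
// never reaches zero. Events from other threads are skipped.
int FindTeardownCall(const std::vector<ComEvent>& events, unsigned threadId) {
    int count = 0;
    for (std::size_t i = 0; i < events.size(); ++i) {
        if (events[i].threadId != threadId) continue;  // chop out other threads
        if (events[i].call == ComCall::Init) {
            ++count;
            continue;
        }
        if (--count <= 0) return static_cast<int>(i);  // apartment torn down here
    }
    return -1;
}

// Made-up trace: thread 4 has two inits, then three uninits.
void ExampleTrace() {
    std::vector<ComEvent> trace = {
        {4, ComCall::Init},   {0, ComCall::Init},   {4, ComCall::Init},
        {4, ComCall::Uninit}, {0, ComCall::Uninit}, {4, ComCall::Uninit},
        {4, ComCall::Uninit},
    };
    assert(FindTeardownCall(trace, 4) == 5);  // second uninit on thread 4
}
```

The index it returns points you at the stack to read in the log; whether that call is the culprit or just the victim of an earlier extra uninit still takes the manual matching described above.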

To close, I want to call out that your extension should never, ever attempt to implement IInitializeSpy to compensate for an unbalanced init or uninit. First, you don't own the threadproc, and second, it leads to nothing but trouble. In a future post I'll talk about an example. :-)