Kernel Debugging MFCMAPI


I noticed recently that one on of my machines, whenever I ran MFCMAPI and displayed one of my CEditor dialogs, the whole machine froze for 1 to several minutes. I figured this would be a fun opportunity to practice my kernel debugging skills and share what I learned.

A bit about my setup: I keep two physical machines in my office. On the larger machine, I run Hyper-V Server 2008. This headless box sits in the corner and hosts all my VMs, including the VM where I run Outlook and other day-to-day case and labor tracking tools on Vista. The smaller machine, running Windows Server 2008, is my file server and my primary development and debugging desktop. Both machines are x64 and many of my VMs are as well.

The machine where I saw the hang was my tools VM, so I figured I’d try a remote VM kernel debug from my development machine. I haven’t seen a good set of instructions for doing this with these operating systems, so I had to crib bits and pieces from various sources. The first step is to enable kernel debugging on the target machine. On Vista, this means using BcdEdit:

  1. Open an administrative command prompt.
  2. Run bcdedit /debug on. This enables kernel debugging using the default setting of COM1, which is what we need.

Next, we have to map a named pipe on the Hyper-V server to COM 1 on the VM. This named pipe is what we’ll connect to. The process is described here:

  1. Open Hyper-V Manager, select the VM to be debugged, and open Settings
  2. Select COM 1, choose Named pipe, and enter a name for the pipe. The name doesn’t really matter. I chose debug.

Now’s a good time to reboot the VM so both settings take effect.

If we try to skip ahead to the remote debug at this point, we won’t be able to connect. The reason is the firewall on the Hyper-V Server is going to block our attempts to hook up to the named pipe. Named pipes use the file server ports 139 and 445, so we just need to open those ports:

  1. This is tricky since Hyper-V server is command line only. We can open a Remote Desktop to the server, or hookup a keyboard and monitor and log on, but either way all we’ll have is the command line.
  2. Run the following two commands to open ports 139 and 445:
    netsh advfirewall firewall add rule name="File Server (TCP 139)" localport=139 dir=in action=allow protocol=tcp
    netsh advfirewall firewall add rule name="File Server (TCP 445)" localport=445 dir=in action=allow protocol=tcp

Now we’re ready to remote debug.

The following command, run from our remote machine, attaches the debugger (hyperboo is the name of my Hyper-V server, and Debug is the named pipe we attached to COM 1 on the target VM):

windbg -k com:pipe,port=\\hyperboo\pipe\Debug,resets=0,reconnect

Note that the debugger doesn’t break in right away – it just sits there saying "Waiting to reconnect…". We run through the steps to hang the machine (start MFCMAPI, open Other/Options will do), then switch over to the debugger and hit Ctrl+Break to attach. Now the fun part begins. We point at the public NT symbols, then append the path where MFCMAPI’s symbols are located:

  1. .sympath srv*c:\symbols*http://msdl.microsoft.com/download/symbols
  2. .sympath+ c:\mfcmapi\Release
  3. .reload

We run !locks to see what’s holding up the system. This command runs for a very long time as it scans memory for every lock:

1: kd> !locks
**** DUMP OF ALL RESOURCE OBJECTS ****
KD: Scanning for held locks...
Resource @ 0xfffffa80024ac0d0    Shared 1 owning threads
     Threads: fffffa80018cb290-01<*> 
KD: Scanning for held locks....................... 

Resource @ 0xfffffa8002423ab0    Exclusively owned

Contention Count = 3017959
NumberOfExclusiveWaiters = 8
Threads: fffffa800255a3c0-01<*>
Threads Waiting On Exclusive Access:
fffffa8003e57bb0 fffffa8001f326b0 fffffa800399fbb0 fffffa800356a8c0
fffffa8003ab5520 fffffa8003f1cbb0 fffffa800348abb0 fffffa80033a9060

KD: Scanning for held locks.

Resource @ 0xfffffa800324d8c0 Exclusively owned
Contention Count = 538
Threads: fffffa8003dd69f0-01<*>
KD: Scanning for held locks.

Resource @ 0xfffffa800250f3b0 Exclusively owned
Contention Count = 266255
NumberOfExclusiveWaiters = 2
Threads: fffffa8003dd69f0-01<*>
Threads Waiting On Exclusive Access:
fffffa800255a3c0 fffffa80032c0390

KD: Scanning for held

locks...
13422 total locks, 4 locks currently held

We find that lock 0xfffffa800324d8c0 has a contention count of 538, meaning it is blocking 538 threads! We then use the -v option to get more information on the lock:

1: kd> !locks -v 0xfffffa800324d8c0 

Resource @ 0xfffffa800324d8c0    Exclusively owned
    Contention Count = 538
     Threads: fffffa8003dd69f0-01<*> 

     THREAD fffffa8003dd69f0  Cid 134c.11ac  Teb: 000000007efdb000

Win32Thread: fffff900c2dfa010 RUNNING on processor 0
Not impersonating
DeviceMap fffff8800ef7d720
Owning Process fffffa8003afaab0 Image: MFCMapi.exe
Attached Process N/A Image: N/A
Wait Start TickCount 6762745 Ticks: 28 (0:00:00:00.437)
Context Switch Count 758 LargeStack
UserTime 00:00:00.828
KernelTime 00:00:04.281
Win32 Start Address 0x0000000000eb318b
Stack Init fffffa60035ecdb0 Current fffffa60035ebd30
Base fffffa60035ed000 Limit fffffa60035e2000 Call 0
Priority 8 BasePriority 8 PriorityDecrement 0 IoPriority 2 PagePriority 5
Child-SP RetAddr Call Site
fffffa60`035ec0c8 fffff960`000c5ba8 win32k!vAlphaPerPixelOnly+0x36
fffffa60`035ec0d0 fffff960`000c57a3 win32k!AlphaScanLineBlend+0x304
fffffa60`035ec190 fffff960`00259509 win32k!EngAlphaBlend+0x4f3
fffffa60`035ec440 fffff960`00259a2a win32k!NtGdiUpdateTransform+0x1075
fffffa60`035ec4f0 fffff960`00259b47 win32k!NtGdiUpdateTransform+0x1596
fffffa60`035ec610 fffff960`00259d05 win32k!EngNineGrid+0xb3
fffffa60`035ec6b0 fffff960`0025a17c win32k!EngDrawStream+0x1a1
fffffa60`035ec770 fffff960`0025ab35 win32k!NtGdiDrawStreamInternal+0x440
fffffa60`035ec820 fffff960`002737fc win32k!GreDrawStream+0x98d
fffffa60`035ecac0 fffff800`0189cdf3 win32k!NtGdiDrawStream+0x9c
fffffa60`035ecc20 00000000`75b8a59a nt!KiSystemServiceCopyEnd+0x13

1 total locks, 1 locks currently held

Note that MFCMAPI.exe holds this lock, so we’re on the right track. We use .thread to switch our context to thread fffffa8003dd69f0, using the /P and /r switches to force the debugger to translate page table entries and reload symbols for the implicit process:

1: kd> .thread /P /r fffffa8003dd69f0
Implicit thread is now fffffa80`03dd69f0
Implicit process is now fffffa80`03afaab0
.cache forcedecodeptes done
Loading User Symbols
.....
Loading Wow64 Symbols
......................................

Now – we’re stuck. Normally, we’d expect to see the user mode stack below the switch to kernel mode. But we don’t see it here. Instead, we see the wow64 translation layer, since we’re running a 32 bit process on a 64 bit processor:

1: kd> kL
  *** Stack trace for last set context - .thread/.cxr resets it
Child-SP          RetAddr           Call Site
fffffa60`035ec0c8 fffff960`000c5ba8 win32k!vAlphaPerPixelOnly+0x36
fffffa60`035ec0d0 fffff960`000c57a3 win32k!AlphaScanLineBlend+0x304
fffffa60`035ec190 fffff960`00259509 win32k!EngAlphaBlend+0x4f3
fffffa60`035ec440 fffff960`00259a2a win32k!NtGdiUpdateTransform+0x1075
fffffa60`035ec4f0 fffff960`00259b47 win32k!NtGdiUpdateTransform+0x1596
fffffa60`035ec610 fffff960`00259d05 win32k!EngNineGrid+0xb3
fffffa60`035ec6b0 fffff960`0025a17c win32k!EngDrawStream+0x1a1
fffffa60`035ec770 fffff960`0025ab35 win32k!NtGdiDrawStreamInternal+0x440
fffffa60`035ec820 fffff960`002737fc win32k!GreDrawStream+0x98d
fffffa60`035ecac0 fffff800`0189cdf3 win32k!NtGdiDrawStream+0x9c
fffffa60`035ecac0 fffff800`0189cdf3 win32k!NtGdiDrawStream+0x9c
fffffa60`035ecc20 00000000`75b8a59a nt!KiSystemServiceCopyEnd+0x13
00000000`001ddda8 00000000`75d7a996 wow64win!NtGdiDrawStream+0xa
00000000`001dddb0 00000000`75d63688 wow64!Wow64SystemServiceEx+0xca
00000000`001de660 00000000`75d7ab46 wow64cpu!ServiceNoTurbo+0x28
00000000`001de6f0 00000000`75d7a14c wow64!RunCpuSimulation+0xa
00000000`001de720 00000000`77ac52d3 wow64!Wow64LdrpInitialize+0x4b4
00000000`001dec80 00000000`77ac5363 ntdll!LdrpInitializeProcess+0x14ac
00000000`001def30 00000000`77ab85ce ntdll! ?? ::FNODOBFM::`string'+0x1ff19
00000000`001defe0 00000000`00000000 ntdll!LdrInitializeThunk+0xe

So we try .thread again, this time passing the /w switch to change to the 32 bit context:

1: kd> .thread /P /r /w fffffa8003dd69f0
Implicit thread is now fffffa80`03dd69f0
Implicit process is now fffffa80`03afaab0
.cache forcedecodeptes done
Loading User Symbols
.....
Loading Wow64 Symbols
......................................
The context is partially valid. Only x86 user-mode context is available.
x86 context set
0: kd:x86> kbL 1d
ChildEBP          RetAddr           Args to Child                                         
0040e05c 7732891e e8011177 00000060 0040e1e4 GDI32!NtGdiDrawStream+0x15
0040e160 73a6276e e8011177 00000060 0040e1e4 GDI32!GdiDrawStream+0x436
0040e3dc 73a624f9 00486ff8 004856c8 00000001 uxtheme!CImageFile::DrawBackgroundDS+0x439
0040e458 73a623a0 00486ff8 02900968 e8011177 uxtheme!CImageFile::DrawImageInfo+0x1dd
0040e4a8 73a627f8 02900968 e8011177 00000002 uxtheme!CImageFile::DrawBackground+0x45
0040e4f8 73a64946 02900968 e8011177 00000001 uxtheme!DrawThemeBackground+0xcc
0040e590 73a6789b 34011934 0292725c e8011177 uxtheme!CThemeWnd::NcPaintCaption+0x11a
0040e5b0 73a62ea0 0292725c 00000000 00000000 uxtheme!OnDwpSetIcon+0x6b
0040e604 73a60e74 00000000 00000000 00000001 uxtheme!_ThemeDefWindowProc+0x15d
0040e620 777dc18b 001d02de 00000080 00000001 uxtheme!ThemeDefWindowProcA+0x18
0040e668 777f7fec 001d02de 00000080 00000001 USER32!DefWindowProcA+0x68
0040e680 777e6050 0110d2f0 00000000 00000080 USER32!DefWindowProcWorker+0x27
0040e6c8 777f7fe3 0110d2f0 00000000 00000080 USER32!DefDlgProcWorker+0x7f7
0040e6e8 777d8807 001d02de 00000080 00000001 USER32!DefDlgProcA+0x29
0040e714 777d8962 77ca3db0 001d02de 00000080 USER32!InternalCallWinProc+0x23
0040e78c 777dc62c 00000000 77ca3db0 001d02de USER32!UserCallWinProcCheckWow+0x109
0040e7c4 777dd163 77ca3db0 001d02de 00000080 USER32!CallWindowProcAorW+0xa9
0040e7e4 00e75ed8 77ca3db0 001d02de 00000080 USER32!CallWindowProcA+0x1b
0040e804 00e76074 00000080 00000001 1ee70f3d MFCMapi!CWnd::DefWindowProcA+0x44
0040e820 00e08cb2 00000080 00000001 1ee70f3d MFCMapi!CWnd::WindowProc+0x3b
0040e850 00e79210 00000080 00000001 1ee70f3d MFCMapi!CEditor::WindowProc+0x222
0040e8b8 00e7929f 00000000 001d02de 00000080 MFCMapi!AfxCallWndProc+0x9a
0040e8d8 777d8807 001d02de 00000080 00000001 MFCMapi!AfxWndProc+0x36
0040e904 777d8962 00e79269 001d02de 00000080 USER32!InternalCallWinProc+0x23
0040e97c 777dc4b6 00000000 00e79269 001d02de USER32!UserCallWinProcCheckWow+0x109
0040e9c0 777deae2 0110d2f0 00000000 00e79269 USER32!SendMessageWorker+0x55b
0040e9e4 00e08f87 001d02de 00000080 00000001 USER32!SendMessageA+0x7f
0040edbc 00e74666 0040edf0 777d8807 001d02de MFCMapi!CEditor::OnInitDialog+0x167
0040edc4 777d8807 001d02de 00000110 001d02de MFCMapi!AfxDlgProc+0x31

Now we can piece together what happened. Down in the kernel layer, we see we’re doing something with graphics.Scanning back up the user mode stack, we find that MFCMapi!CEditor::OnInitDialog has called SendMessage, which is then handled by uxtheme. The message we’re passing is 0x80, which is WM_SETICON. Back in the MFCMAPI code, we see we’re on the line:

	SetIcon(m_hIcon, TRUE);			// Set big icon

Now it’s all starting to come together. In the process of setting up out dialog, we call WM_SETICON twice, once to set the big icon and once to set the small icon. But this is a dialog, so the big icon isn’t needed at all. When we remove this call, and just leave the call to set the small icon, dialogs open rapidly! This fix will be in the next release of MFCMAPI.

We learned three things in this process. First – setting up a live kernel debug to a Hyper-V VM isn’t all that hard once you know how to do it. Second, it is possible for ordinary user mode operations to hang in kernel mode. And last, copying and pasting code around without understanding what it does is never a good idea.

Comments (7)

  1. Steve,

    I am probably missing something, but how about a bigger question: why would setting an icon (even if it is not used) lock anything?

    Does that happen for any applicaiton? Or a one using MAPI?

  2. I didn’t dig into exactly where the lock was acquired, but I assume it was somewhere in the GDI layer – and what should have been a quick calculation is for some reason taking a really long time. BTW – I should have included this: my initial suspicion was that it was the new icon that caused the problem. And indeed, when I reverted to the old icon the problem got better. But it didn’t go away completely. I suspect the hang is actually a series of hangs, one for each size image contained in the icon. The new icon has many more images in it, so it hung longer.

    And no – I don’t think this had anything at all to do with MAPI.

  3. Daniel says:

    Guess I should stop using MFCMAPI on servers :))

  4. Or wait until I get the fix out. I’m hoping that’ll be some time this month.

  5. The March 2009 Release (build 6.0.0.1012) is live: http://mfcmapi.codeplex.com . Note the new URL – the

  6. dan_j_thompson@hotmail.com says:

    You mentioned that one lock had a contention count of 538, which seemed pretty high, but what about those other locks? (3017959 and 266255!)

    I’m guessing the contention count is somehow not valid for those locks, but I don’t understand how to tell why other than "that can’t be right". Any explanation?

  7. Actually, we can exclude them for another reason. 0xfffffa8002423ab0 (count of 3017959) is held by thread fffffa800255a3c0, which is waiting on 0xfffffa800250f3b0. 0xfffffa800250f3b0 (count of 266255) is held by thread fffffa8003dd69f0, which is the thread we analyzed. So it doesn’t matter if those counts are right or not, we only had one thread of interest to analyze.