Debugging a Bugcheck 0x109

My name is Nader Khonsari. I am an escalation engineer in Platforms Global Escalation Services. I want to share with you a recent experience where 64-bit Windows Server 2008 servers at a customer location were encountering bugcheck 0x109 blue screen crashes.

 

In 64-bit versions of the Windows kernel PatchGuard is present. If any driver or application attempts to modify the kernel the PatchGuard will generate the bugcheck (CRITICAL_STRUCTURE_CORRUPTION) mentioned below. PatchGuard protects the kernel from modification by malicious or badly written drivers or software.

 

To further investigate this bugcheck you need to compare the impacted kernel function with a known reliable one. For instance, if the machine encountering this was running Windows Server 2008 service pack 2 with a post SP2 hotfix kernel you need to compare the impacted kernel function with that of service pack 2 kernel function. Usually you do not need to download and extract the post SP2 hotfix, because the vast majority of the kernel code has not been modified since the service pack.

 

If you already have service pack 2 for Windows Server 2008 handy, expand the package using instructions included in KB928636:

 

Windows6.0-KB948465-X64.exe /x

expand.exe -f:* C:\WS08\SP2\windows6.0-kb948465-X64.cab C:\WS08\SP2\Expanded

 

Locate the kernel binary from the expanded binaries and then open it up with your debugger just like you open a crash memory dump.

 

windbg -z C:\WS08\SP2\Expanded\amd64_microsoft-windows-os-kernel_31bf3856ad364e35_6.0.6002.18005_none_ca3a763069a24eea\ntoskrnl.exe

 

This is the bugcheck data from the dump:

 

CRITICAL_STRUCTURE_CORRUPTION (109)

This bugcheck is generated when the kernel detects that critical kernel code or

data have been corrupted. There are generally three causes for a corruption:

1) A driver has inadvertently or deliberately modified critical kernel code

 or data. See https://www.microsoft.com/whdc/driver/kernel/64bitPatching.mspx

2) A developer attempted to set a normal kernel breakpoint using a kernel

 debugger that was not attached when the system was booted. Normal breakpoints,

 "bp", can only be set if the debugger is attached at boot time. Hardware

 breakpoints, "ba", can be set at any time.

3) A hardware corruption occurred, e.g. failing RAM holding kernel code or data.

Arguments:

Arg1: a3a039d89b456543, Reserved

Arg2: b3b7465eedc23277, Reserved

Arg3: fffff80001778470, Failure type dependent information

Arg4: 0000000000000001, Type of corrupted region, can be

        0 : A generic data region

        1 : Modification of a function or .pdata

        2 : A processor IDT

        3 : A processor GDT

        4 : Type 1 process list corruption

        5 : Type 2 process list corruption

        6 : Debug routine modification

        7 : Critical MSR modification

 

Next, check the address at Arg3. This will give you the function that was modified, but not the offset of the modified instruction.

 

3: kd> ln fffff80001778470

(fffff800`01778470)   nt!KeSetSystemTime   |  (fffff800`01778790)   nt!BiLoadSystemStore

Exact matches:

    nt!KeSetSystemTime = <no type information>

 

Unassemble the same function in the SP2 kernel binary you expanded from the SP2 package. Do the same with the function of the crashed kernel and compare the two. You will find the modified opcode compared to that of the unmodified kernel.

 

Below is the comparison of the nt!KeSetSystemTime code of the crashed kernel and that of the service pack 2 kernel respectively. They match fine except for the highlighted byte in the prefetch instruction which has been overwritten with a 0x1f.  This changed the instruction to a nop, which is done to prevent the prefetch operation from occurring on processors that don't support prefetch.

 

nt!KeSetSystemTime+0x156:

fffff800`017785c6 0f1f0f          nop     dword ptr [rdi]

fffff800`017785c9 488b07          mov     rax,qword ptr [rdi]

fffff800`017785cc 493bc7          cmp     rax,r15

fffff800`017785cf 7516            jne     nt!KeSetSystemTime+0x177

 

ntoskrnl!KeSetSystemTime+0x156:

00000001`4012e5b6 0f0d0f          prefetchw [rdi]

00000001`4012e5b9 488b07          mov     rax,qword ptr [rdi]

00000001`4012e5bc 493bc7          cmp     rax,r15

00000001`4012e5bf 7516            jne    

 

After further investigation this turned out to be a known issue in the VMware environment when the VM is moved from a non-prefetch to a prefetch architecture and even then, only in a live-migration case.  The issue is documented on VMWare's site at https://kb.vmware.com/selfservice/microsites/search.do?cmd=displayKC&docType=kc&externalId=1008749&sliceId=1&docTypeID=DT_KB_1_1&dialogID=74787167&stateId=0 .