My name is Nader Khonsari. I am an escalation engineer in Platforms Global Escalation Services. I want to share with you a recent experience where 64-bit Windows Server 2008 servers at a customer location were encountering bugcheck 0x109 blue screen crashes.
PatchGuard is present in 64-bit versions of the Windows kernel. It protects the kernel from modification by malicious or badly written drivers or software. If any driver or application attempts to modify the kernel, PatchGuard generates bugcheck 0x109 (CRITICAL_STRUCTURE_CORRUPTION), described below.
To investigate this bugcheck further, you need to compare the impacted kernel function with a known good copy. For instance, if the machine was running Windows Server 2008 Service Pack 2 with a post-SP2 hotfix kernel, compare the impacted function with the same function in the original Service Pack 2 kernel. Usually you do not need to download and extract the post-SP2 hotfix, because the vast majority of the kernel code has not been modified since the service pack.
If you already have service pack 2 for Windows Server 2008 handy, expand the package using instructions included in KB928636:
expand.exe -f:* C:\WS08\SP2\windows6.0-kb948465-X64.cab C:\WS08\SP2\Expanded
Locate the kernel binary among the expanded files, then open it in your debugger just as you would open a crash memory dump:
windbg -z C:\WS08\SP2\Expanded\amd64_microsoft-windows-os-kernel_31bf3856ad364e35_6.0.6002.18005_none_ca3a763069a24eea\ntoskrnl.exe
This is the bugcheck data from the dump:
This bugcheck is generated when the kernel detects that critical kernel code or
data have been corrupted. There are generally three causes for a corruption:
1) A driver has inadvertently or deliberately modified critical kernel code
or data. See http://www.microsoft.com/whdc/driver/kernel/64bitPatching.mspx
2) A developer attempted to set a normal kernel breakpoint using a kernel
debugger that was not attached when the system was booted. Normal breakpoints,
"bp", can only be set if the debugger is attached at boot time. Hardware
breakpoints, "ba", can be set at any time.
3) A hardware corruption occurred, e.g. failing RAM holding kernel code or data.
Arg1: a3a039d89b456543, Reserved
Arg2: b3b7465eedc23277, Reserved
Arg3: fffff80001778470, Failure type dependent information
Arg4: 0000000000000001, Type of corrupted region, can be
0 : A generic data region
1 : Modification of a function or .pdata
2 : A processor IDT
3 : A processor GDT
4 : Type 1 process list corruption
5 : Type 2 process list corruption
6 : Debug routine modification
7 : Critical MSR modification
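As a rough illustration of how to read these arguments, the sketch below maps Arg4 to the region types listed in the bugcheck text and echoes Arg3. The helper name describe_bugcheck_109 is hypothetical and is not part of any debugger API; the table is copied from the output above.

```python
# Region types copied from the bugcheck 0x109 text quoted above.
CORRUPTED_REGION_TYPES = {
    0: "A generic data region",
    1: "Modification of a function or .pdata",
    2: "A processor IDT",
    3: "A processor GDT",
    4: "Type 1 process list corruption",
    5: "Type 2 process list corruption",
    6: "Debug routine modification",
    7: "Critical MSR modification",
}


def describe_bugcheck_109(arg3: str, arg4: str) -> str:
    """Hypothetical helper: summarize Arg3/Arg4 given as hex strings."""
    region = int(arg4, 16)
    kind = CORRUPTED_REGION_TYPES.get(region, "Unknown region type")
    return f"type {region} ({kind}), failure information 0x{int(arg3, 16):016x}"


# Values from the dump shown above.
print(describe_bugcheck_109("fffff80001778470", "0000000000000001"))
# -> type 1 (Modification of a function or .pdata), failure information 0xfffff80001778470
```

For this dump, Arg4 = 1 indicates that a function (or its .pdata) was modified, which is why the next step is to resolve Arg3 to a symbol.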
Next, check the address at Arg3. This will give you the function that was modified, but not the offset of the modified instruction.
3: kd> ln fffff80001778470
(fffff800`01778470) nt!KeSetSystemTime | (fffff800`01778790) nt!BiLoadSystemStore
nt!KeSetSystemTime = <no type information>
Unassemble the function in the SP2 kernel binary you expanded from the SP2 package, do the same in the crash dump, and compare the two listings. The difference shows which opcode was modified relative to the unmodified kernel.
Below is the nt!KeSetSystemTime code from the crashed kernel, followed by the same code from the Service Pack 2 kernel. They match except for the second byte of the prefetchw instruction, where 0x0d has been overwritten with 0x1f. This changes the instruction to a nop, which is done to prevent the prefetch operation from occurring on processors that don't support prefetch.
fffff800`017785c6 0f1f0f nop dword ptr [rdi]
fffff800`017785c9 488b07 mov rax,qword ptr [rdi]
fffff800`017785cc 493bc7 cmp rax,r15
fffff800`017785cf 7516 jne nt!KeSetSystemTime+0x177
00000001`4012e5b6 0f0d0f prefetchw [rdi]
00000001`4012e5b9 488b07 mov rax,qword ptr [rdi]
00000001`4012e5bc 493bc7 cmp rax,r15
00000001`4012e5bf 7516 jne
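The comparison can also be done mechanically. The following is a minimal sketch, not a debugger feature: it parses the two listings above, compares the opcode bytes instruction by instruction, and reports each differing byte as an offset from the function start returned by ln. The function names are hypothetical, and the sketch assumes both listings cover the same instructions with the same lengths.

```python
def parse_listing(lines):
    """Parse 'address bytes mnemonic ...' lines into (address, opcode bytes) pairs."""
    parsed = []
    for line in lines:
        addr_text, byte_text = line.split()[:2]
        # WinDbg separates the two halves of a 64-bit address with a backtick.
        parsed.append((int(addr_text.replace("`", ""), 16), bytes.fromhex(byte_text)))
    return parsed


def find_modified_bytes(crashed, reference, function_start):
    """Return (offset from function start, expected byte, found byte) per difference."""
    diffs = []
    for (addr, got), (_, want) in zip(parse_listing(crashed), parse_listing(reference)):
        for i, (g, w) in enumerate(zip(got, want)):
            if g != w:
                diffs.append((addr + i - function_start, w, g))
    return diffs


# Listings copied from the crash dump and from the SP2 ntoskrnl.exe above.
CRASHED = [
    "fffff800`017785c6 0f1f0f nop dword ptr [rdi]",
    "fffff800`017785c9 488b07 mov rax,qword ptr [rdi]",
    "fffff800`017785cc 493bc7 cmp rax,r15",
    "fffff800`017785cf 7516 jne nt!KeSetSystemTime+0x177",
]
REFERENCE = [
    "00000001`4012e5b6 0f0d0f prefetchw [rdi]",
    "00000001`4012e5b9 488b07 mov rax,qword ptr [rdi]",
    "00000001`4012e5bc 493bc7 cmp rax,r15",
    "00000001`4012e5bf 7516 jne",
]
FUNCTION_START = 0xFFFFF80001778470  # nt!KeSetSystemTime, from the ln output

for offset, expected, found in find_modified_bytes(CRASHED, REFERENCE, FUNCTION_START):
    print(f"nt!KeSetSystemTime+{offset:#x}: expected {expected:#04x}, found {found:#04x}")
# -> nt!KeSetSystemTime+0x157: expected 0x0d, found 0x1f
```

The modified instruction itself starts at nt!KeSetSystemTime+0x156; the reported offset 0x157 is its second byte.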
After further investigation, this turned out to be a known issue in VMware environments. It occurs when a VM is moved from a non-prefetch to a prefetch architecture, and even then only in a live-migration case. The issue is documented on VMware's site at http://kb.vmware.com/selfservice/microsites/search.do?cmd=displayKC&docType=kc&externalId=1008749&sliceId=1&docTypeID=DT_KB_1_1&dialogID=74787167&stateId=0 .