Corrupt Page Table Pages Caught in the MDL

Hello all, Scott Olson here again to share another interesting issue I worked on a while back. After upgrading to Windows XP Service Pack 2, a system would experience random bug checks caused by memory corruption. Interestingly, there was a very specific pattern to the corruption: it looked like a PFN address and flags had been randomly placed into the page table page in several places in the process. The memory manager would never do this type of thing, so I suspected that a driver was editing user page table pages, which should never be done.

Let's take a look at the stack:

kd> kb
*** Stack trace for last set context - .thread/.cxr resets it
ChildEBP RetAddr Args to Child
f15b1308 80523096 c00862d8 10c5b000 00000000 nt!MiDeletePte+0x198
f15b13d0 80519776 000001d8 10d20fff 00000000 nt!MiDeleteVirtualAddresses+0x164
f15b13ec 805b1d74 10c20000 10d20fff f15b14a4 nt!MiDeleteFreeVm+0x20
f15b148c 8054060c ffffffff 049c6aa8 049c6ab0 nt!NtFreeVirtualMemory+0x42e
f15b148c 7c90eb94 ffffffff 049c6aa8 049c6ab0 nt!KiFastCallEntry+0xfc
03e4a398 7c90da54 7c8209b3 ffffffff 049c6aa8 ntdll!KiFastSystemCallRet
03e4a39c 7c8209b3 ffffffff 049c6aa8 049c6ab0 ntdll!NtFreeVirtualMemory+0xc

Here is the page table entry for the virtual address:

kd> !pte 10c5b000
VA 10c5b000
PDE at 00000000C0600430 PTE at 00000000C00862D8
contains 000000003FC6F867 contains 0000000015E0086F
pfn 3fc6f ---DA--UWEV pfn 15e00 ---DA-TUWEV
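
As a cross-check on the !pte output, the PDE and PTE addresses can be derived from the virtual address by hand. The sketch below assumes the x86 PAE layout that this kernel is using: 8-byte entries, page tables self-mapped at 0xC0000000 and page directories at 0xC0600000. The function names are made up for illustration.

```python
# Assumed x86 PAE self-map layout (32-bit Windows, PAE kernel).
PTE_BASE = 0xC0000000   # start of the self-mapped page tables
PDE_BASE = 0xC0600000   # start of the self-mapped page directories
ENTRY_SIZE = 8          # PAE entries are 64 bits wide


def pte_address(va):
    """Address of the PTE that maps va (each PTE covers one 4 KB page)."""
    return PTE_BASE + (va >> 12) * ENTRY_SIZE


def pde_address(va):
    """Address of the PDE that maps va (each PDE covers 2 MB)."""
    return PDE_BASE + (va >> 21) * ENTRY_SIZE


va = 0x10C5B000
print(f"PDE at {pde_address(va):08X}  PTE at {pte_address(va):08X}")
```

For 10c5b000 this gives C0600430 and C00862D8, the same addresses the debugger reports.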

This shows that the value 15e0086f has been incorrectly written into the page table page. This bad value corresponds to a write-through mapping of a range allocated via a call to MmAllocatePagesForMdl. Dumping the page table page shows the same value in several other entries:

c00862d0 00000000 00000000 15e0086f 00000000
c00862e0 00000000 00000000 00000000 00000000
c00862f0 00000000 00000000 00000000 00000000
c0086300 00000000 00000000 00000000 00000000
c0086310 00000000 00000000 00000000 00000000
c0086320 00000000 00000000 00000000 00000000
c0086330 00000000 00000000 00000000 00000000
c0086340 00000000 00000000 00000000 00000000
c0086350 00000000 00000000 15e0086f 00000000
c0086360 00000000 00000000 00000000 00000000
c0086370 00000000 00000000 00000000 00000000
c0086380 00000000 00000000 00000000 00000000
c0086390 00000000 00000000 00000000 00000000
c00863a0 00000000 00000000 00000000 00000000
c00863b0 00000000 00000000 00000000 00000000
c00863c0 00000000 00000000 00000000 00000000
c00863d0 00000000 00000000 00000000 00000000
c00863e0 00000000 00000000 00000000 00000000
c00863f0 00000000 00000000 00000000 00000000
c0086400 00000000 00000000 00000000 00000000
c0086410 00000000 00000000 00000000 00000000
c0086420 00000000 00000000 00000000 00000000
c0086430 00000000 00000000 00000000 00000000
c0086440 00000000 00000000 00000000 00000000
c0086450 00000000 00000000 00000000 00000000
c0086460 15e0086f 00000000 00000000 00000000
c0086470 00000000 00000000 00000000 00000000
c0086480 00000000 00000000 00000000 00000000
c0086490 00000000 00000000 00000000 00000000
c00864a0 00000000 00000000 00000000 00000000
c00864b0 00000000 00000000 00000000 00000000
c00864c0 00000000 00000000 00000000 00000000
c00864d0 00000000 00000000 00000000 00000000
c00864e0 15e0086f 00000000 00000000 00000000
c00864f0 00000000 00000000 00000000 00000000
c0086500 00000000 00000000 00000000 00000000
c0086510 00000000 00000000 00000000 00000000
c0086520 00000000 00000000 00000000 00000000
c0086530 00000000 00000000 00000000 00000000
c0086540 00000000 00000000 00000000 00000000
c0086550 00000000 00000000 00000000 00000000
c0086560 15e0086f 00000000 00000000 00000000
c0086570 00000000 00000000 00000000 00000000
c0086580 00000000 00000000 00000000 00000000
c0086590 00000000 00000000 00000000 00000000
c00865a0 00000000 00000000 00000000 00000000
c00865b0 00000000 00000000 00000000 00000000
c00865c0 00000000 00000000 00000000 00000000
c00865d0 00000000 00000000 00000000 00000000
c00865e0 00000000 00000000 00000000 00000000
c00865f0 00000000 00000000 00000000 00000000
c0086600 00000000 00000000 00000000 00000000
c0086610 00000000 00000000 00000000 00000000
c0086620 00000000 00000000 00000000 00000000
c0086630 00000000 00000000 00000000 00000000
c0086640 00000000 00000000 00000000 00000000
c0086650 00000000 00000000 00000000 00000000
c0086660 00000000 00000000 15e0086f 00000000
c0086670 00000000 00000000 00000000 00000000
c0086680 00000000 00000000 00000000 00000000
c0086690 00000000 00000000 00000000 00000000
c00866a0 00000000 00000000 00000000 00000000
c00866b0 00000000 00000000 00000000 00000000
c00866c0 00000000 00000000 00000000 00000000
c00866d0 00000000 00000000 00000000 00000000
c00866e0 00000000 00000000 15e0086f 00000000
c00866f0 00000000 00000000 00000000 00000000
c0086700 00000000 00000000 00000000 00000000
c0086710 00000000 00000000 00000000 00000000
c0086720 00000000 00000000 00000000 00000000
c0086730 00000000 00000000 00000000 00000000
c0086740 00000000 00000000 00000000 00000000
c0086750 00000000 00000000 00000000 00000000
c0086760 00000000 00000000 15e0086f 00000000
c0086770 00000000 00000000 00000000 00000000
c0086780 00000000 00000000 00000000 00000000
c0086790 00000000 00000000 00000000 00000000
c00867a0 00000000 00000000 00000000 00000000
c00867b0 00000000 00000000 00000000 00000000
c00867c0 00000000 00000000 00000000 00000000
c00867d0 00000000 00000000 00000000 00000000
c00867e0 00000000 00000000 00000000 00000000
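
To make the pattern in the dump easier to read, here is a small sketch that decodes the bad value and maps each corrupted slot back to the virtual address it describes. It assumes the PAE layout (8-byte entries based at 0xC0000000), decodes only the hardware-defined low bits of the entry, and assumes that the first two arguments to MiDeleteFreeVm in the stack are the start and end of the range being freed. The slot addresses are read off the dump, and the helper names are hypothetical.

```python
# Hardware-defined low bits of an x86 page table entry.
PTE_FLAGS = {
    0: "valid", 1: "writable", 2: "user", 3: "write-through",
    4: "cache-disabled", 5: "accessed", 6: "dirty",
}
PTE_BASE = 0xC0000000   # assumed PAE self-map base, 8-byte entries


def decode_pte(value):
    """Split a PTE's low dword into its page frame number and flag names."""
    pfn = (value >> 12) & 0xFFFFFF          # frame number starts at bit 12
    flags = [name for bit, name in PTE_FLAGS.items() if value & (1 << bit)]
    return pfn, flags


def va_for_pte(pte_addr):
    """Virtual address mapped by the PTE stored at pte_addr."""
    return ((pte_addr - PTE_BASE) // 8) << 12


# Slots holding 15e0086f, read off the dump above.
corrupt_slots = [0xC00862D8, 0xC0086358, 0xC0086460, 0xC00864E0,
                 0xC0086560, 0xC0086668, 0xC00866E8, 0xC0086768]

pfn, flags = decode_pte(0x15E0086F)
print(f"pfn {pfn:x}: {', '.join(flags)}")
for slot in corrupt_slots:
    va = va_for_pte(slot)
    # Range assumed from the MiDeleteFreeVm arguments in the stack trace.
    in_range = 0x10C20000 <= va <= 0x10D20FFF
    print(f"{slot:08x} -> VA {va:08x}  in freed range: {in_range}")
```

Every corrupted slot maps to an address between 10c20000 and 10d20fff, which would explain why the free path walked into them.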

kd> !pfn 15e00
PFN 00015E00 at address 81BCA800
flink 00000000 blink / share count 00000001 pteaddress 000AF001
reference count 0002 Cached color 0
restore pte 00000080 containing page FFEDCB Active RW
ReadInProgress WriteInProgress

The driver also has an outstanding MmProbeAndLockPages call on the page, as indicated by the reference count of 2. Thinking that this PFN value was incorrect, I decided to search memory for it and see what I could find.

kd> s -d 80000000 l?7fffffff 00015e00
8022d534 00015e00 0001f190 00041d50 0001f140 .^......P...@...
86cacbf4 00015e00 0000cd1c 0000cc27 0000cc08 .^......'.......
86e25cdc 00015e00 0a130005 e56c6946 00000000 .^......Fil.....

I found a few entries, and the middle one looks like it could be part of an MDL allocation. So I verified this:

kd> !pool 86cacbf4 2
Pool page 86cacbf4 region is Nonpaged pool
*86cacbd0 size: 80 previous size: 28 (Allocated) *Mdl
Pooltag Mdl : Io, Mdls

Yes, this is an MDL. Let's inspect it:

kd> dt nt!_MDL 86cacbd8
+0x000 Next : (null)
+0x004 Size : 32
+0x006 MdlFlags : 138
+0x008 Process : (null)
+0x00c MappedSystemVa : 0x00004000
+0x010 StartVa : 0xf7baa000
+0x014 ByteCount : 0xfff
+0x018 ByteOffset : 0

Notice that the page 15e00 is in the MDL’s page list.

kd> dd 86cacbd8+1c
86cacbf4 00015e00 0000cd1c 0000cc27 0000cc08
86cacc04 0000cc09 0000cc0a 0000cc0b 0000cbec
86cacc14 0000cbed 0000cbee 0000cbef 0000cbd0
86cacc24 0000cbd1 0000cbd2 0000cbd3 0000cbd4
86cacc34 0000cbd5 0000cbd6 00000000 00000000
86cacc44 00000000 00000000 00000000 00010010
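
The addresses used in the last few commands follow from simple offsets. As a rough sketch, assuming a 32-bit system with an 8-byte pool header, a 0x1c-byte MDL header (the fixed fields end after ByteOffset at +0x18) and 4-byte PFN entries, the arithmetic looks like this. The helper names are hypothetical.

```python
POOL_HEADER_SIZE = 0x08   # assumed size of the x86 pool header
MDL_HEADER_SIZE = 0x1C    # fixed MDL fields end after ByteOffset at +0x18
PFN_ENTRY_SIZE = 4        # each page list entry is a 32-bit PFN


def mdl_from_pool_block(block_addr):
    """The MDL body starts right after the pool header."""
    return block_addr + POOL_HEADER_SIZE


def pfn_slot(mdl_addr, index):
    """Address of the index-th entry in the MDL's page list."""
    return mdl_addr + MDL_HEADER_SIZE + index * PFN_ENTRY_SIZE


mdl = mdl_from_pool_block(0x86CACBD0)
print(f"MDL at {mdl:08x}, first PFN slot at {pfn_slot(mdl, 0):08x}")
```

The first PFN slot works out to 86cacbf4, which is exactly where the memory search found 00015e00.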

Next I wanted to see if I could find a driver that held references to this MDL, and I found two references:

kd> s -d 80000000 l?7fffffff 86cacbd8
86f9c6a0 86cacbd8 0000003d 00000000 0000636a ....=.......jc..
86fc7e68 86cacbd8 00000001 00000001 00000000 ................

Now let's see who owns these:

kd> !pool 86f9c6a0 2
Pool page 86f9c6a0 region is Nonpaged pool
*86f9c618 size: d8 previous size: 30 (Allocated) *Crpt
Pooltag Crpt : Memory corruption driver

kd> !pool 86fc7e68 2
Pool page 86fc7e68 region is Nonpaged pool
*86fc7e00 size: 98 previous size: 40 (Allocated) *Crpt
Pooltag Crpt : Memory corruption driver

This makes a pretty convincing case that this driver is at fault. So now you may ask, "Why did this problem only start after applying Service Pack 2?" By default, when you install Service Pack 2, Data Execution Prevention (DEP) is enabled on systems that support it. The support for DEP is in the PAE kernel, which uses extra bits to describe the page table entries. The driver was using the memory mappings incorrectly by ignoring the extra bits in the page number, so it wrote to the wrong page and caused the memory corruption. In this crash the solution was to disable DEP until the driver could be corrected. For more information on default DEP settings and enabling/disabling it in Windows, see the following article.

899298 The "Understanding Data Execution Prevention" help topic incorrectly states the default setting for DEP in Windows Server 2003 Service Pack 1
https://support.microsoft.com/default.aspx?scid=kb;EN-US;899298
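
The driver's code is not shown here, so the following is only an illustration of why page table arithmetic that works on the non-PAE kernel goes wrong on the PAE kernel. It assumes the usual x86 self-map base of 0xC0000000, with 4-byte entries on the non-PAE kernel and 8-byte entries on the PAE kernel. The function names are hypothetical.

```python
PTE_BASE = 0xC0000000   # assumed self-map base on both kernels


def pte_address_non_pae(va):
    """PTE address when entries are 32 bits wide (non-PAE kernel)."""
    return PTE_BASE + (va >> 12) * 4


def pte_address_pae(va):
    """PTE address when entries are 64 bits wide (PAE kernel)."""
    return PTE_BASE + (va >> 12) * 8


def pae_owner_of(addr):
    """Virtual address whose PAE PTE contains the byte at addr."""
    return ((addr - PTE_BASE) // 8) << 12


va = 0x10C5B000
wrong = pte_address_non_pae(va)
print(f"non-PAE formula: {wrong:08x}")
print(f"PAE formula:     {pte_address_pae(va):08x}")
print(f"on PAE, {wrong:08x} belongs to VA {pae_owner_of(wrong):08x}")
```

A driver that hard-codes the 4-byte formula would, on a PAE kernel, touch an entry that describes a completely different virtual address.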