Don't believe everything the debugger is telling you!!! (aka Rootkit)

Article
07/19/2004

<sigh>

So this issue was incredibly interesting, but you know at the end it just makes you mad. This past week a customer was having a very clear cut, reproduciable issue: When attempting to attach an item to a message via OWA, the Information Store would crash with what appeared to be stack corruption. Boy, I didn’t know where this would take me and everyone else that got drawn into this issue. Below is our debug journey to understand what was happening. The conclusion of this was a collective effort of several Escalation Engineers, so I am not taking full credit for all of this investigation… The information below may make it look easy, but it had us going in an infinite loop for a couple of hours <grin>

First the stack looked like this after the crash (yes I had correct symbols in place):

0:058> kb

ChildEBP RetAddr Args to Child

WARNING: Frame IP not in any known module. Following frames may be wrong.

7c576b4e f8458d51 0875ff50 ffffb7e8 0fc085ff 0x77f87cac

So based upon my past experience with Stack corruption, I looked at raw dump of the stack which showed over and over an exception handler was attempting to be located, which chewed up the remnants of the original stack. Fortunately the customer allowed me to perform a remote debug, so I didn’t have to decipher the mess left on the stack or attempt to find the real .cxr (Exception Record), besides there where errors when this userdump was created related to not being able to read memory… so we really couldn’t trust everything in the userdump. In turns out there was a valid reason for this error...

So after manually walking through the functions that are called, I eventually found the crash was occurring in this code path:

ChildEBP RetAddr
25b0cf10 001f3cae ESE!ErrOSSLVFileOpen+0x61
25b0d68c 001edb85 ESE!ErrSLVSetColumnFromFile+0x5b
25b0dc84 001c4bc8 ESE!ErrSLVSetColumn+0x4d9
25b0dcc0 001c49a6 ESE!ErrIsamSetColumn+0x1ec
25b0dcd8 001c48f1 ESE!ErrESESetColumn+0x6f
25b0dd10 001c4885 ESE!JetSetColumnEx+0x4d
25b0dd54 004041d1 ESE!JetSetColumn+0x4c
25b0dd84 0040f710 store!JetSetColumnFn+0x4a
25b0ddc8 004a78ba store!JTAB_BASE::EcSetColumn+0x1aa
25b0de08 004a50ea store!ATTACH::EcFlushFileHandleProps+0x94
25b0de9c 004a45c9 store!ATTACH::EcFlushProps+0xac7
25b0df18 004a42d6 store!ATTACH::EcSaveChanges2+0x4a1
25b0df30 004a429a store!EcSaveOneAttach+0x19
25b0df58 213bb966 store!EcSaveChangesAttachOp2+0x81
25b0df6c 213bb7c2 EXOLEDB!CStoreLogon::ScSaveChangesAttach+0x42
25b0e030 213bb43d EXOLEDB!CWMMultiPartFormStream::ScCommitAttachment+0x1c6
25b0e06c 213c0233 EXOLEDB!CWMMultiPartFormStream::FWrite+0x9b
25b0efe0 2138ddb0 EXOLEDB!CRequestStream::ScProcessContinuation+0x8e
25b0f014 2138d9c9 EXOLEDB!ScDispatchRequest+0x1e1
25b0fec8 2138d88b EXOLEDB!CDavServer::ProcessItem+0x94

So in this code path we are taking in the information about the file and creating a file handle by calling NTCreateFile(). This is the call that causes the crash to occur. Simply stepping into this call instruction the store will crash. This means the problem is somewhere in NtCreateFile.

0:014>r
eax=20b9cb90 ebx=80000000 ecx=20b9cbc0 edx=20b9cba8 esi=00000000 edi=0000004b
eip=001f429f esp=20b9cb54 ebp=7c576b4e iopl=0 nv up ei pl zr na po cy
cs=001b ss=0023 ds=0023 es=0023 fs=0038 gs=0000 efl=00000247
ESE!ErrOSSLVFileOpen+0x3b7:
001f429f ff159c253200 call dword ptr [ESE!pfnNtCreateFile (0032259c)]{ntdll!ZwCreateFile (77f87cac)} ds:0023:0032259c=77

OK, so I dumped the parameters about to be passed to NTCreateFile (not shown) and nothing looked out of the ordinary. Next was to unassemble NtCreateFile and see what is about to happen:

0:014> u eip
ntdll!ZwCreateFile:
77f87cac b820000000 mov eax,0x20
77f87cb1 8d542404 lea edx,[esp+0x4]
77f87cb5 cd2e int 2e
77f87cb7 c22c00 ret 0x2c

Still nothing out of the ordinary in comparison to a working system, so stepping through NTCreateFile actually gets me into ZwCreateFile (the backend function that is really called). Here are the register states as we sit poised at the first instruction of ZwCreateFile.

0:014> t
eax=20b9cb90 ebx=80000000 ecx=20b9cbc0 edx=20b9cba8 esi=00000000 edi=0000004b
eip=77f87cac esp=20b9cb50 ebp=7c576b4e iopl=0 nv up ei pl zr na po cy
cs=001b ss=0023 ds=0023 es=0023 fs=0038 gs=0000 efl=00000247
ntdll!ZwCreateFile:
77f87cac b820000000 mov eax,0x20

So I executed the first assembly instruction of ZwCreateFile, but $#%* what happened? EIP didn’t advance to the next instruction as expected, there is a weird value for EIP now, and the values for all the other registers where exactly the same. We didn’t skip ahead and execute the int 2e instruction to cause a kernel mode transition so what the heck happened?

0:014> p
eax=20b9cb90 ebx=80000000 ecx=20b9cbc0 edx=20b9cba8 esi=00000000 edi=0000004b
eip=7ffa488d esp=20b9cb50 ebp=7c576b4e iopl=0 nv up ei pl zr na po cy
cs=001b ss=0023 ds=0023 es=0023 fs=0038 gs=0000 efl=00000247
7ffa488d ?? ???

So what instruction(s) are these?

0:014> u eip
7ffa488d ?? ???
^ Memory access error in 'u eip'

Why couldn’t we read this memory? This is a live debug, it should be readable regardless if it is mapped in the process or not. It turns out this is the reason why the userdump could not be created successfully. So I reset the repro and stepped through ESE!ErrOSSLVFileOpen again to ensure we didn’t miss anything… and of course we found nothing. At this point the impossible was happening... a benign instruction was blowing up. So its time to get a different perspective on this, so I asked a few people to give me their thoughts... well, a few hours later there was 8 people huddled over my desk all scratching their heads.

Hmmm… time to breakout the kernel debugger J

OK, so once again we walked to the point in assembly of the instruction that failed and had the customer generate a kernel dump.

So now looking at the kernel dump lets first find our process (note output is truncated for brevity):

: kd> !process 0 0

**** NT ACTIVE PROCESS DUMP ****

PROCESS ff629400 SessionId: 0 Cid: 0f54 Peb: 7ffdf000 ParentCid: 0130

DirBase: 041eb000 ObjectTable: 81432428 TableSize: 3056.

Image: store.exe

PROCESS fd43e1e0 SessionId: 0 Cid: 0f5c Peb: 7ffdf000 ParentCid: 0df0

DirBase: 046c7000 ObjectTable: 811a54c8 TableSize: 265.

Image: cdb.exe

Set the context to the store.exe process:

0: kd> !process ff629400 1

PROCESS ff629400 SessionId: 0 Cid: 0f54 Peb: 7ffdf000 ParentCid: 0130

DirBase: 041eb000 ObjectTable: 81432428 TableSize: 3056.

Image: store.exe

VadRoot fd434fe8 Clone 0 Private 6435. Modified 445462. Locked 67.

DeviceMap 8208c128

Token e2c4a470

ElapsedTime 0:14:19.0750

UserTime 0:00:07.0809

KernelTime 0:00:04.0998

QuotaPoolUsage[PagedPool] 325628

QuotaPoolUsage[NonPagedPool] 207428

Working Set Sizes (now,min,max) (58595, 50, 345) (234380KB, 200KB, 1380KB)

PeakWorkingSetSize 61146

VirtualSize 560 Mb

PeakVirtualSize 561 Mb

PageFaultCount 340776

MemoryPriority BACKGROUND

BasePriority 8

CommitCharge 9763

DebugPort e2602040

Now lets go and see the state of the memory range that could not be read from usermode (truncated output):

0: kd> !vad fd434fe8

VAD level start end commit

<snip>

fb8882e8 ( 7) 7ffa0 7ffa4 5 Private EXECUTE_READWRITE

So can we dump this memory? You bet…

0: kd> dd 7ffa0000

7ffa0000 000000e8 b62d5800 c300405d 3d2d2e5f

7ffa0010 6361485b 2072656b 65666544 7265646e

7ffa0020 2e2d3d5d 0000005f 00000000 00000800

7ffa0030 72656b00 336c656e 6c642e32 6553006c

7ffa0040 73614c74 72724574 4300726f 74616572

7ffa0050 69614d65 6f6c736c 47004174 614d7465

7ffa0060 6c736c69 6e49746f 57006f66 65746972

7ffa0070 656c6946 61655200 6c694664 6c430065

What is the state of this entry in the Page Tables?

0: kd> !pte 7ffa0000

7FFA0000 - PDE at C03007FC PTE at C01FFE80

contains 0D634867 contains 035BC8C6

pfn d634 --DA--UWV not valid

Transition: 35bc

Protect: 6

0: kd> !pte 7ffd4000

7FFD4000 - PDE at C03007FC PTE at C01FFF50

contains 0D634867 contains 028070C2

pfn d634 --DA--UWV not valid

PageFile 1

Offset 2807

Protect: 6

OK, so the value of ‘Protect’ explains why we can access this memory from usermode, but why? This condition isn’t normal. Hmmm… lets look at the memory again using dc command:

0: kd> dc 7ffa0000

7ffa0000 000000e8 b62d5800 c300405d 3d2d2e5f .....X-.]@.._.-=

7ffa0010 6361485b 2072656b 65666544 7265646e [Hacker Defender

7ffa0020 2e2d3d5d 0000005f 00000000 00000800 ]=-._...........

7ffa0030 72656b00 336c656e 6c642e32 6553006c .kernel32.dll.Se

7ffa0040 73614c74 72724574 4300726f 74616572 tLastError.Creat

7ffa0050 69614d65 6f6c736c 47004174 614d7465 eMailslotA.GetMa

7ffa0060 6c736c69 6e49746f 57006f66 65746972 ilslotInfo.Write

7ffa0070 656c6946 61655200 6c694664 6c430065 File.ReadFile.Cl

0: kd> dc

7ffa0080 4865736f 6c646e61 65470065 766e4574 oseHandle.GetEnv

7ffa0090 6e6f7269 746e656d 69726156 656c6261 ironmentVariable

7ffa00a0 65470057 646f4d74 46656c75 4e656c69 W.GetModuleFileN

7ffa00b0 41656d61 70754400 6163696c 61486574 ameA.DuplicateHa

7ffa00c0 656c646e 65724300 50657461 65636f72 ndle.CreateProce

7ffa00d0 00417373 74697845 65726854 43006461 ssA.ExitThread.C

7ffa00e0 74616572 72685465 00646165 61657243 reateThread.Crea

7ffa00f0 69506574 50006570 4e6b6565 64656d61 tePipe.PeekNamed

<lights and sirens go off>

Hacker Defender??? What the heck is that? (I’ll let you go dig around the web for that thing and remember to read the disclaimer on my blog before you do.)

So this is the thread waiting in the usermode debugger at the time the kernel dump was taken:

THREAD 8477f020 Cid f54.fd8 Teb: 7fee1000 Win32Thread: a217aea8 WAIT: (WrLpcReply) KernelMode Non-Alertable
SuspendCount 1
8477f208 Semaphore Limit 0x1
Waiting for reply to LPC MessageId 00034466:
Pending LPC Reply Message:
e2602041: [40000000,01000000]
Not impersonating
Owning Process ff629400
Wait Start TickCount 244909 Elapsed Ticks: 1690
Context Switch Count 1200 LargeStack
UserTime 0:00:00.0004
KernelTime 0:00:00.0006
Start Address KERNEL32!BaseThreadStartThunk (0x7c574333)
Win32 Start Address MDBTASK!MdbTaskPoolThread (0x61bd13e2)
Stack Init bd412000 Current bd41169c Base bd412000 Limit bd40f000 Call 0
Priority 10 BasePriority 8 PriorityDecrement 0 DecrementCount 0

ChildEBP RetAddr Args to Child
bd4116b4 8042c413 8477f020 8477f1d8 8477f208 nt!KiSwapThread+0x1b1
bd4116dc 80434085 8477f208 00000011 00000000 nt!KeWaitForSingleObject+0x1a3
bd41170c 804c8113 e2602040 e2602040 bd411740 nt!LpcpRequestWaitReplyPort+0x52f
bd411720 80526c43 e2602040 bd4118c0 bd411740 nt!LpcRequestWaitReplyPort+0x13
bd4118a0 80526cf0 bd4118c0 e2602040 00000001 nt!DbgkpSendApiMessage+0x43
bd411938 8042f24a bd411d10 00000001 00000000 nt!DbgkForwardException+0x78
bd411cf4 80467735 bd411d10 00000000 bd411d64 nt!KiDispatchException+0x172
bd411d5c 804676d0 00000000 00000000 00000000 nt!CommonDispatchException+0x4d
bd411d64 00000000 00000000 00000000 00000000 nt!KiUnexpectedInterruptTail+0x1e1

bd411cf4 80467735 nt!KiDispatchException(
struct _EXCEPTION_RECORD * ExceptionRecord = 0xbd411d10,
struct _KTRAP_FRAME * ExceptionFrame = 0x00000000,
struct _KTRAP_FRAME * TrapFrame = 0xbd411d64,
char PreviousMode = 1 '',
unsigned char FirstChance = 0x01 '')+0x172

We can see this thread is hooked:

0: kd> .trap bd411d64
ErrCode = 00000000
eax=0e84cb90 ebx=80000000 ecx=0e84cbc0 edx=0e84cba8 esi=00000000 edi=00000716
eip=77f87cac esp=0e84cb50 ebp=7c576b4e iopl=0 nv up ei pl zr na po cy
cs=001b ss=0023 ds=0023 es=0023 fs=0038 gs=0000 efl=00000247
ntdll!ZwCreateFile:
001b:77f87cac e9dccb0108 jmp 7ffa488d

Well, this isn’t the same code that usermode was showing us… recall this from the usermode debug at exactly the same address:

So, this means Hacker Defender was also intercepting calls made by the debugger during the live debug to read process memory and showing what would be expected at that address, not what was really happening… In kernel mode we can clearly see the instruction at 7ffa488d that attempting to be executed.. This address is in the range of 7ffa0000-7ffa4000 that we determined was unreadable from usermode.

The jmp that actually occurred placed the thread’s instruction pointer (EIP) pointing into the region of memory that could not be accessed from usermode prior to transitioning to kernel mode and this is why the information store was crashing. So it appears this rootkit has a bug …

TOO FUNNY!!!!

Don't believe everything the debugger is telling you!!! (aka Rootkit)

Additional resources