Today I want to talk about tracking down leaking pool. Back with Server 2003 and before, leaking pool was a major issue because it was a limited resource. In Vista and beyond, it isn’t as much of an issue since pool is allocated dynamically, but it can still cause system performance issues if a component uses too much of it. Since all nonpaged pool allocations are locked in RAM, it leaves the system with a smaller amount of RAM for anything else.
I recently worked a case involving leaking nonpaged pool and it was easy to troubleshoot. I thought I would walk through the steps I used to narrow down the issue in this article. So let's get started. Note - The following can be done on a live machine or with a kernel memory dump as well.
Running the !vm command, you can get statistics about the virtual memory usage on the system.
Virtual Memory Usage
Physical Memory: 655219 ( 2620876 Kb)
Page File: \??\C:\pagefile.sys
Current: 4190208Kb Free Space: 4115148Kb
Minimum: 4190208Kb Maximum: 4190208Kb
Available Pages: 303760 ( 1215040 Kb)
ResAvail Pages: 571159 ( 2284636 Kb)
Modified Pages: 677 ( 2708 Kb)
NonPagedPool Usage: 59613 ( 238452 Kb)
NonPagedPool Max: 69377 ( 277508 Kb)
NonPaged Pool Usage : 85%
********** Excessive NonPaged Pool Usage *****
PagedPool 0 Usage: 31684 ( 126736 Kb)
PagedPool 1 Usage: 1824 ( 7296 Kb)
PagedPool 2 Usage: 1821 ( 7284 Kb)
PagedPool 3 Usage: 1824 ( 7296 Kb)
PagedPool 4 Usage: 1827 ( 7308 Kb)
PagedPool Usage: 38980 ( 155920 Kb)
PagedPool Maximum: 86016 ( 344064 Kb)
Paged Pool Usage : 45%
Hmm….looks like something is using a lot of nonpaged pool. To determine what is using the maximum amount of nonpaged pool, run the !poolused command.
0: kd>!poolused 2
Sorting by NonPaged Pool Consumed
Tag Allocs Used Allocs Used
Thre 326514 208968960 0 0
MmCm 647 2830624 0 0
LSwi 1 2293760 0 0
Npfs 1681 1276672 1202 208320
File 5863 945472 0 0
Dmio 131 525760 0 0
Irp 1158 508800 0 0
Adding a value of 2 to end of the command instructs the debugger to sort by nonpaged pool used. For more information on the command, please review the help file.
According to the output, the “Thre” tag is the highest user of nonpaged pool. To get a rough idea of all the modules that might be allocating pool using this tag, we can search all the modules to see if they contain the string. This can help narrow down the problem to a few drivers. To search all the modules, just use the for_each_module command.
!for_each_module s -a @#Base @#End "Thre"
Look at David Butler’s blog post if you want to narrow it down even further by looking at each driver’s import table to see if they are importing the ExAllocatePool or ExAllocatePoolWithTag functions. Though, in this case I already know the “Thre” tag is used by the kernel to allocate thread objects. Could the kernel be leaking thread objects? Most likely not, otherwise we would have seen a lot of customers reporting this issue. I am not ruling it out, but to start off, I am going to focus on the third party software installed on the machine.
Let’s dump out a few sample pool allocations, to see what exactly they are. I was sure there were going to be a ton of them, so I just hit the Break button once the output started flying by.
0: kd> !poolfind Thre 0
Scanning large pool allocation table for Tag: Thre
Searching NonPaged pool (81093000 : 89093000) for Tag: Thre
81098000 size: 280 previous size: 0 (Allocated) Thre (Protected)
81098380 size: 280 previous size: 100 (Allocated) Thre (Protected)
81098600 size: 280 previous size: 280 (Allocated) Thre (Protected)
81098880 size: 280 previous size: 280 (Allocated) Thre (Protected)
81098b00 size: 280 previous size: 280 (Allocated) Thre (Protected)
81098d80 size: 280 previous size: 280 (Allocated) Thre (Protected)
81099000 size: 280 previous size: 0 (Allocated) Thre (Protected)
810992c0 size: 280 previous size: 40 (Allocated) Thre (Protected)
810996e0 size: 280 previous size: 1a0 (Allocated) Thre (Protected)
81099960 size: 280 previous size: 280 (Allocated) Thre (Protected)
81099d80 size: 280 previous size: 1a0 (Allocated) Thre (Protected)
8109a000 size: 280 previous size: 0 (Allocated) Thre (Protected)
8109a380 size: 280 previous size: 100 (Allocated) Thre (Protected)
8109a600 size: 280 previous size: 280 (Allocated) Thre (Protected)
8109a880 size: 280 previous size: 280 (Allocated) Thre (Protected)
It would be interesting to see the thread stacks for the KTHREAD structures in these pools. To dump out the thread using the !thread command, I needed to figure out where the base of the KTHREAD structure started in the pool. To figure out the offset I dumped out all the threads in the System process, picked a random thread, and then calculated the thread’s offset from the start of it’s pool block. You can use any thread in the entire system because the offset will always be the same.
0: kd> !thread 8904a180
THREAD 8904a180 Cid 8.18 Teb: 00000000 Win32Thread: 00000000 WAIT: (WrEventPairLow) KernelMode Non-Alertable
Owning Process 890754a0
Wait Start TickCount 3831270 Elapsed Ticks: 209711
Context Switch Count 28238
Start Address nt!ExpWorkerThread (0x80417a9a)
Stack Init f6444000 Current f6443d34 Base f6444000 Limit f6441000 Call 0
Priority 13 BasePriority 13 PriorityDecrement 0 DecrementCount 0
ChildEBP RetAddr Args to Child
f6443d4c 8042ef5f bfec0068 80478d40 8904a180 nt!KiSwapThread+0x1b1
f6443d70 80417b0d 00000000 80442f00 00000000 nt!KeRemoveQueue+0x197
f6443da8 804578c2 00000000 00000000 00000000 nt!ExpWorkerThread+0x73
f6443ddc 8046c966 80417a9a 00000000 00000000 nt!PspSystemThreadStartup+0x54
00000000 00000000 00000000 00000000 00000000 nt!KiThreadStartup+0x16
0: kd> !pool 8904a180
8904a080 size: e0 previous size: 20 (Allocated) Dire (Protected)
*8904a160 size: 280 previous size: e0 (Allocated) *Thre (Protected)
8904a3e0 size: 280 previous size: 280 (Allocated) Thre (Protected)
0: kd> ?8904a180-8904a160
Evaluate expression: 32 = 00000020
So based on my calculations, the Kthread structure starts 32 bytes into the Thre pool.
Now I can use this offset to dump threads stacks from the pools returned by the poolfind command. Some of the threads are going to be valid, but most of them are going to be doing something fishy. Basically I need to use a large sample to figure out what would be causing the Thre tag to leak. After a few tries, I was able to find a lot of threads which looked like this -
kd> !Thread 81098620
THREAD 81098620Cid 8e8.a391c Teb: 00000000 Win32Thread: 00000000 TERMINATED
Owning Process 886f5020
Wait Start TickCount 1949088 Elapsed Ticks: 2091893
Context Switch Count 2
Start Address 0x7c57b740
Win32 Start Address 0x20003474
Stack Init 0 Current bca4fc44 Base bca50000 Limit bca4d000 Call 0
Priority 16 BasePriority 8 PriorityDecrement 0 DecrementCount 0
kd> !PROCESS 886f5020 0
PROCESS 886f5020SessionId: 0 Cid: 08e8 Peb: 7ffdf000 ParentCid: 016c
DirBase: 694bc000 ObjectTable: 88b3ec68 TableSize: 391189.
Looks like the thread is terminated. But why is it still hanging around?
0: kd> !object 81098620
Object: 81098620 Type: (89075900) Thread
HandleCount: 1 PointerCount: 1
Aha! Since the handle count and pointer count are not 0, the system has not freed this Thre allocation. Most likely something has an open handle to this object. Dumping out all the processes using the !process command, I found a particular process with 388,836 open handles. This is the culprit. Interesting enough, this is the same process which was listed in the thread’s process field.
388386 Handles in process
Running !handle 0 2 886f5020 and analyzing the output, I found a ton of thread handles. Most likely the process is leaking thread handles and preventing the system from cleaning up the thread allocations. This is resulting in high usage of nonpaged pool, which is causing the performance issues seen by the customer.
We contacted the software vendor responsible for the process, and it turned out to be a known bug in their software. An update was provided to the customer, and it fixed the issue immediately.