How to track down High CPU in User Mode Applications - A live debug!

 

Written by Jeff Dailey.

 

Hello NTDebuggers, I’d like to talk about a common issue we deal with on a regular basis. We are often tasked with finding what functions are using CPU within a user mode process / application. Typically a user will find an application that is using more CPU then they expect it to, and this can affect overall system performance and responsiveness.   

 

For this exercise I’ve written some contrived sample code called EATCPU. It’s included at the bottom of the blog post. The following image of task manager shows EATCPU consuming 41% CPU time. A customer or user may tell you this does not “normally” happen. It’s always good to ask what “normal” is. In this case we will say normal is ~10%.

clip_image002[6]

The best case scenario is a live debug on the process that is running at high CPU levels. If you’re lucky enough to have a customer / user that will let you do a remote debug, and the problem reproduces on demand, you can take the following action.

 

You need to install the debugging tools for Windows, and set your symbol path. If at all possible acquire the symbols for the application you are debugging. We’ll assume that you are the expert that supports said program. If it’s written in house, get the symbols from the developer. If it’s from a third party, that vendor may be willing to provide you with symbols for their product. Microsoft has most of the symbols for our products available on our symbol server on the internet (!sympath srv*DownstreamStore*https://msdl.microsoft.com/download/symbols.)

 

The next thing is to attach windbg.exe to the process in question.

From the debuggers directory, type TLIST, this will list your process. Get the process id and then run WinDBG.EXE –p PROCESSID, or if your debugging a program like EATCPU, you can run WINDBG C:\program\EATCPU.EXE.

 

After the debugger is attached or the process is started in the debugger, reproduce the problem.

 

Microsoft (R) Windows Debugger Version 6.8.0001.0

Copyright (c) Microsoft Corporation. All rights reserved.

***** WARNING: Your debugger is probably out-of-date.

***** Check https://dbg for updates.

CommandLine: eatcpu.exe

Symbol search path is: srv*C:\symbols*\\symbols\symbols

Executable search path is:

ModLoad: 00400000 0041a000 eatcpu.exe

ModLoad: 779b0000 77b00000 ntdll.dll

ModLoad: 76780000 76890000 C:\Windows\syswow64\kernel32.dll

ModLoad: 62bb0000 62cd1000 C:\Windows\WinSxS\x86_microsoft.vc80.debugcrt_1fc8b3b9a1e18e3b_8.0.50727.762_none_24c8a196583ff03b\MSVCR80D.dll

ModLoad: 75cb0000 75d5a000 C:\Windows\syswow64\msvcrt.dll

(1090.164): Break instruction exception - code 80000003 (first chance)

eax=00000000 ebx=00000000 ecx=712b0000 edx=00000000 esi=fffffffe edi=77a90094

eip=779c0004 esp=0017faf8 ebp=0017fb28 iopl=0 nv up ei pl zr na pe nc

cs=0023 ss=002b ds=002b es=002b fs=0053 gs=002b efl=00000246

ntdll!DbgBreakPoint:

779c0004 cc int 3

0:000> g

(1090.11d4): Break instruction exception - code 80000003 (first chance)

eax=7efa3000 ebx=00000000 ecx=00000000 edx=77a1d894 esi=00000000 edi=00000000

eip=779c0004 esp=0109ff74 ebp=0109ffa0 iopl=0 nv up ei pl zr na pe nc

cs=0023 ss=002b ds=002b es=002b fs=0053 gs=002b efl=00000246

ntdll!DbgBreakPoint:

779c0004 cc int 3

0:007> .sympath SRV*c:\websymbols*https://msdl.microsoft.com/download/symbols

Symbol search path is: SRV*c:\websymbols*https://msdl.microsoft.com/download/symbols

0:007> g

(1090.17d4): Break instruction exception - code 80000003 (first chance)

eax=7efa3000 ebx=00000000 ecx=00000000 edx=77a1d894 esi=00000000 edi=00000000

eip=779c0004 esp=0109ff74 ebp=0109ffa0 iopl=0 nv up ei pl zr na pe nc

cs=0023 ss=002b ds=002b es=002b fs=0053 gs=002b efl=00000246

ntdll!DbgBreakPoint:

779c0004 cc int 3

Once the problem has started run the !runaway command.

0:007> !runaway

 User Mode Time

  Thread Time

   2:c04 0 days 0:01:08.827 ß Note this thread, thread 2:c04 is using more CPU than any other.

   7:17d4 0 days 0:00:00.000 ß Note the other threads are using very little if any CPU.

   6:1a4c 0 days 0:00:00.000

   5:d20 0 days 0:00:00.000

   4:157c 0 days 0:00:00.000

   3:1b28 0 days 0:00:00.000

   1:1134 0 days 0:00:00.000

   0:164 0 days 0:00:00.000

0:007> ~2s ß Using the thread number 2, set the thread context with the ~Ns command.

*** WARNING: Unable to verify checksum for eatcpu.exe

eax=cccccccc ebx=00b93c48 ecx=0000002b edx=00b937a8 esi=00000000 edi=00d9fcf0

eip=0041169c esp=00d9fcd0 ebp=00d9fd9c iopl=0 nv up ei pl nz na po nc

cs=0023 ss=002b ds=002b es=002b fs=0053 gs=002b efl=00010202

eatcpu!checkSomething+0x1c:

0041169c f3ab rep stos dword ptr es:[edi] es:002b:00d9fcf0=cccccccc

0:002> k ß Dump the call stack using k.

 

If you look at the following call stack, the applications code in this thread starts where you see EATCPU, the code before that is C Runtime code for begin thread. I want to trace all the code that is running under being thread. The assumption here is that I’ll find something looping and eating CPU. To do this I will use the WT command. However first I need to specify a beginning address for WT to start trace at.

Let's use UF to Unassemble the Function that started our code by taking the return address of eatcpu!myThreadFunction.

ChildEBP RetAddr

00d9fd9c 00411657 eatcpu!checkSomething+0x1c [c:\source\eatcpu\eatcpu\eatcpu.cpp @ 49]

00d9fe74 004115a8 eatcpu!trySomething+0x27 [c:\source\eatcpu\eatcpu\eatcpu.cpp @ 45]

00d9ff58 62bb4601 eatcpu!myThreadFunction+0x38 [c:\source\eatcpu\eatcpu\eatcpu.cpp @ 35]

00d9ff94 62bb459c MSVCR80D!_beginthread+0x221

00d9ffa0 768019f1 MSVCR80D!_beginthread+0x1bc

00d9ffac 77a2d109 kernel32!BaseThreadInitThunk+0xe

00d9ffec 00000000 ntdll!_RtlUserThreadStart+0x23 [d:\vistartm\base\ntos\rtl\rtlexec.c @ 2695]

0:002> uf 004115a8 ß This command will unassemble the function at this address beginning to end.

0:007> uf 004115a8

eatcpu!myThreadFunction [c:\source\eatcpu\eatcpu\eatcpu.cpp @ 30]:

   30 00411570 55 push ebp

   30 00411571 8bec mov ebp,esp

   30 00411573 81eccc000000 sub esp,0CCh

   30 00411579 53 push ebx

   30 0041157a 56 push esi

   30 0041157b 57 push edi

   30 0041157c 8dbd34ffffff lea edi,[ebp-0CCh]

   30 00411582 b933000000 mov ecx,33h

   30 00411587 b8cccccccc mov eax,0CCCCCCCCh

   30 0041158c f3ab rep stos dword ptr es:[edi]

   31 0041158e 8b4508 mov eax,dword ptr [ebp+8]

   31 00411591 8945f8 mov dword ptr [ebp-8],eax

eatcpu!myThreadFunction+0x24 [c:\source\eatcpu\eatcpu\eatcpu.cpp @ 33]:

   33 00411594 b801000000 mov eax,1

   33 00411599 85c0 test eax,eax

   33 0041159b 7410 je eatcpu!myThreadFunction+0x3d (004115ad)

eatcpu!myThreadFunction+0x2d [c:\source\eatcpu\eatcpu\eatcpu.cpp @ 35]:

   35 0041159d 8b45f8 mov eax,dword ptr [ebp-8]

   35 004115a0 8b08 mov ecx,dword ptr [eax]

   35 004115a2 51 push ecx

   35 004115a3 e880faffff call eatcpu!ILT+35(?trySomethingYAHHZ) (00411028)

   35 004115a8 83c404 add esp,4

   36 004115ab ebe7 jmp eatcpu!myThreadFunction+0x24 (00411594)

eatcpu!myThreadFunction+0x3d [c:\source\eatcpu\eatcpu\eatcpu.cpp @ 37]:

   37 004115ad 5f pop edi

   37 004115ae 5e pop esi

   37 004115af 5b pop ebx

   37 004115b0 81c4cc000000 add esp,0CCh

   37 004115b6 3bec cmp ebp,esp

   37 004115b8 e8a1fbffff call eatcpu!ILT+345(__RTC_CheckEsp) (0041115e)

   37 004115bd 8be5 mov esp,ebp

   37 004115bf 5d pop ebp

   37 004115c0 c3 ret

0:002> wt -or 00411570 ß We will use WT to Watch Trace this code. I’ve selected the starting address of the myThreadFunction function. I’ve also specified –or to print the return value of each function. Wt produces very visual output. It allows you to quickly identify patterns in the way the code executes that would be much more difficult just using the T (TRACE) command. 

 

    8 0 [ 0] ntdll!RtlSetLastWin32Error eax = 0

>> No match on ret

    8 0 [ 0] ntdll!RtlSetLastWin32Error

    2 0 [ 0] eatcpu!checkSomething

    1 0 [ 1] eatcpu!ILT+345(__RTC_CheckEsp)

    2 0 [ 1] eatcpu!_RTC_CheckEsp eax = 0

    9 3 [ 0] eatcpu!checkSomething

    1 0 [ 1] eatcpu!ILT+345(__RTC_CheckEsp)

    2 0 [ 1] eatcpu!_RTC_CheckEsp eax = 0

   12 6 [ 0] eatcpu!checkSomething eax = 0

>> No match on ret

   12 6 [ 0] eatcpu!checkSomething

    7 0 [ 0] eatcpu!trySomething

    1 0 [ 1] eatcpu!ILT+345(__RTC_CheckEsp)

    2 0 [ 1] eatcpu!_RTC_CheckEsp eax = 0

   10 3 [ 0] eatcpu!trySomething eax = 0

>> No match on ret

   10 3 [ 0] eatcpu!trySomething

    9 0 [ 0] eatcpu!myThreadFunction ß I see a pattern, a loop. Beginning of loop.

    1 0 [ 1] eatcpu!ILT+35(?trySomethingYAHHZ)

   60 0 [ 1] eatcpu!trySomething

    1 0 [ 2] eatcpu!ILT+180(?checkSomethingYAHHZ)

   62 0 [ 2] eatcpu!checkSomething

    5 0 [ 3] kernel32!SetLastError

   16 0 [ 3] ntdll!RtlSetLastWin32Error eax = 0

   64 21 [ 2] eatcpu!checkSomething

    1 0 [ 3] eatcpu!ILT+345(__RTC_CheckEsp)

    2 0 [ 3] eatcpu!_RTC_CheckEsp eax = 0

   71 24 [ 2] eatcpu!checkSomething

    1 0 [ 3] eatcpu!ILT+345(__RTC_CheckEsp)

    2 0 [ 3] eatcpu!_RTC_CheckEsp eax = 0

   74 27 [ 2] eatcpu!checkSomething eax = 0

   67 102 [ 1] eatcpu!trySomething

    1 0 [ 2] eatcpu!ILT+345(__RTC_CheckEsp)

    2 0 [ 2] eatcpu!_RTC_CheckEsp eax = 0

   70 105 [ 1] eatcpu!trySomething eax = 0

18 176 [ 0] eatcpu!myThreadFunction ßEnd of loop / beginning of loop

    1 0 [ 1] eatcpu!ILT+35(?trySomethingYAHHZ)

   60 0 [ 1] eatcpu!trySomething

    1 0 [ 2] eatcpu!ILT+180(?checkSomethingYAHHZ)

   62 0 [ 2] eatcpu!checkSomething

5 0 [ 3] kernel32!SetLastError ßAlways look for what might be going wrong! Last error can give you a clue. We are setting last error at the low level of the loop

 16 0 [ 3] ntdll!RtlSetLastWin32Error eax = 0

   64 21 [ 2] eatcpu!checkSomething

    1 0 [ 3] eatcpu!ILT+345(__RTC_CheckEsp)

    2 0 [ 3] eatcpu!_RTC_CheckEsp eax = 0

   71 24 [ 2] eatcpu!checkSomething

    1 0 [ 3] eatcpu!ILT+345(__RTC_CheckEsp)

    2 0 [ 3] eatcpu!_RTC_CheckEsp eax = 0

74 27 [ 2] eatcpu!checkSomething eax = 0 ßAlso note checkSomething is returning ZERO, this might indicate a problem. You need to look at the code or assembler.

   67 102 [ 1] eatcpu!trySomething

    1 0 [ 2] eatcpu!ILT+345(__RTC_CheckEsp)

    2 0 [ 2] eatcpu!_RTC_CheckEsp eax = 0

   70 105 [ 1] eatcpu!trySomething eax = 0

27 352 [ 0] eatcpu!myThreadFunction ßEnd of loop / beginning of loop

    1 0 [ 1] eatcpu!ILT+35(?trySomethingYAHHZ)

   60 0 [ 1] eatcpu!trySomething

    1 0 [ 2] eatcpu!ILT+180(?checkSomethingYAHHZ)

   62 0 [ 2] eatcpu!checkSomething

    5 0 [ 3] kernel32!SetLastError

   16 0 [ 3] ntdll!RtlSetLastWin32Error eax = 0

   64 21 [ 2] eatcpu!checkSomething

    1 0 [ 3] eatcpu!ILT+345(__RTC_CheckEsp)

    2 0 [ 3] eatcpu!_RTC_CheckEsp eax = 0

   71 24 [ 2] eatcpu!checkSomething

    1 0 [ 3] eatcpu!ILT+345(__RTC_CheckEsp)

    2 0 [ 3] eatcpu!_RTC_CheckEsp eax = 0

   74 27 [ 2] eatcpu!checkSomething eax = 0

   67 102 [ 1] eatcpu!trySomething

    1 0 [ 2] eatcpu!ILT+345(__RTC_CheckEsp)

    2 0 [ 2] eatcpu!_RTC_CheckEsp eax = 0

   70 105 [ 1] eatcpu!trySomething eax = 0

36 528 [ 0] eatcpu!myThreadFunction ßEnd of loop / beginning of loop

    1 0 [ 1] eatcpu!ILT+35(?trySomethingYAHHZ)

   60 0 [ 1] eatcpu!trySomething

    1 0 [ 2] eatcpu!ILT+180(?checkSomethingYAHHZ)

   62 0 [ 2] eatcpu!checkSomething

    5 0 [ 3] kernel32!SetLastError

   16 0 [ 3] ntdll!RtlSetLastWin32Error eax = 0

   64 21 [ 2] eatcpu!checkSomething

    1 0 [ 3] eatcpu!ILT+345(__RTC_CheckEsp)

    2 0 [ 3] eatcpu!_RTC_CheckEsp eax = 0

   71 24 [ 2] eatcpu!checkSomething

    1 0 [ 3] eatcpu!ILT+345(__RTC_CheckEsp)

    2 0 [ 3] eatcpu!_RTC_CheckEsp eax = 0

   74 27 [ 2] eatcpu!checkSomething eax = 0

   67 102 [ 1] eatcpu!trySomething

12930 instructions were executed in 12929 events (0 from other threads)

Function Name Invocations MinInst MaxInst AvgInst

eatcpu!ILT+180(?checkSomethingYAHHZ) 69 1 1 1

eatcpu!ILT+345(__RTC_CheckEsp) 210 1 1 1

eatcpu!ILT+35(?trySomethingYAHHZ) 70 1 1 1

eatcpu!_RTC_CheckEsp 210 2 2 2

eatcpu!checkSomething 70 60 74 73

eatcpu!myThreadFunction 1 630 630 630

eatcpu!trySomething 71 10 70 68

kernel32!SetLastError 70 5 5 5

ntdll!RtlSetLastWin32Error 70 16 16 16

0 system calls were executed

eax=cccccccc ebx=00b93c48 ecx=00000002 edx=00b937a8 esi=00000000 edi=00d9fe6c

eip=0041164c esp=00d9fda8 ebp=00d9fe74 iopl=0 nv up ei pl nz na pe nc

cs=0023 ss=002b ds=002b es=002b fs=0053 gs=002b efl=00010206

eatcpu!trySomething+0x1c:

0041164c f3ab rep stos dword ptr es:[edi] es:002b:00d9fe6c=cccccccc

0:002> !gle ßNow that we have broken in let's check and see what the last error is using !GLE (Get Last Error) This dumps out the last error from the TEB.

LastErrorValue: (Win32) 0x57 (87) - The parameter is incorrect.

LastStatusValue: (NTSTATUS) 0xc000000d - An invalid parameter was passed to a service or function.

0:007> bp kernel32!SetLastError ßLets set a breakpoint on last error to examine what is going on in the function calling it.

0:007> g

Breakpoint 1 hit

eax=cccccccc ebx=00b93c48 ecx=00000000 edx=00b937a8 esi=00d9fcd0 edi=00d9fd9c

eip=767913dd esp=00d9fcc8 ebp=00d9fd9c iopl=0 nv up ei pl zr na pe nc

cs=0023 ss=002b ds=002b es=002b fs=0053 gs=002b efl=00000246

kernel32!SetLastError:

767913dd 8bff mov edi,edi

0:002> kv ßGet the call stack

ChildEBP RetAddr

0:002> kv

ChildEBP RetAddr Args to Child

00d9fcc4 004116cb 00000057 00d9fe74 00000000 kernel32!SetLastError (FPO: [Non-Fpo]) ß0x57 Invalid parameter error, Why?

00d9fd9c 00411657 00000000 00d9ff58 00000000 eatcpu!checkSomething+0x4b (FPO: [Non-Fpo]) (CONV: cdecl) [c:\source\eatcpu\eatcpu\eatcpu.cpp @ 57]

00d9fe74 004115a8 00000000 00000000 00000000 eatcpu!trySomething+0x27 (FPO: [Non-Fpo]) (CONV: cdecl) [c:\source\eatcpu\eatcpu\eatcpu.cpp @ 45]

00d9ff58 62bb4601 0017ff34 4f9f12e9 00000000 eatcpu!myThreadFunction+0x38 (FPO: [Non-Fpo]) (CONV: cdecl) [c:\source\eatcpu\eatcpu\eatcpu.cpp @ 35]

00d9ff94 62bb459c 00b937a8 00d9ffac 768019f1 MSVCR80D!_beginthread+0x221 (FPO: [Non-Fpo])

00d9ffa0 768019f1 00b937a8 00d9ffec 77a2d109 MSVCR80D!_beginthread+0x1bc (FPO: [Non-Fpo])

00d9ffac 77a2d109 00b93c48 00d926a6 00000000 kernel32!BaseThreadInitThunk+0xe (FPO: [Non-Fpo])

00d9ffec 00000000 62bb4520 00b93c48 00000000 ntdll!_RtlUserThreadStart+0x23 (FPO: [Non-Fpo]) (CONV: stdcall) [d:\vistartm\base\ntos\rtl\rtlexec.c @ 2695]

0:002> !error 00000057 ßdouble check, using !error, this will decode the error into a human readable string.

Error code: (Win32) 0x57 (87) - The parameter is incorrect.

0:002> uf checkSomething ßlets disassemble the function calling SetLastError.

eatcpu!checkSomething [c:\source\eatcpu\eatcpu\eatcpu.cpp @ 49]:

   49 00411680 55 push ebp

   49 00411681 8bec mov ebp,esp

   49 00411683 81ecc0000000 sub esp,0C0h

   49 00411689 53 push ebx

   49 0041168a 56 push esi

   49 0041168b 57 push edi

   49 0041168c 8dbd40ffffff lea edi,[ebp-0C0h]

   49 00411692 b930000000 mov ecx,30h

   49 00411697 b8cccccccc mov eax,0CCCCCCCCh

   49 0041169c f3ab rep stos dword ptr es:[edi]

   50 0041169e 837d0800 cmp dword ptr [ebp+8],0 ßCheck what our first parameter is on the stack, EBP+8 remember PLUS == Parameters. Note looking at the stack it’s 00000000

   50 004116a2 741d je eatcpu!checkSomething+0x41 (004116c1) ßif parameter 1 ones 0 we are going to jump to this addres, else we execute the following code. (WE JUMP) eatcpu!checkSomething+0x24 [c:\source\eatcpu\eatcpu\eatcpu.cpp @ 52]:

   52 004116a4 8bf4 mov esi,esp ßThe green would have been the good code path, non errant.

   52 004116a6 68fa000000 push 0FAh

   52 004116ab ff15a8814100 call dword ptr [eatcpu!_imp__Sleep (004181a8)] ßNote we sleep or do some work other then eat CPU here if we are passed non ZERO

   52 004116b1 3bf4 cmp esi,esp

   52 004116b3 e8a6faffff call eatcpu!ILT+345(__RTC_CheckEsp) (0041115e)

   53 004116b8 b801000000 mov eax,1 ßWe are setting EAX to 1, this means we have succeded

   53 004116bd eb15 jmp eatcpu!checkSomething+0x54 (004116d4) ßNow we jump to the clean up code for the fucntionn

eatcpu!checkSomething+0x41 [c:\source\eatcpu\eatcpu\eatcpu.cpp @ 57]:

   57 004116c1 8bf4 mov esi,esp ßThis appears to be a failure case. We did not get an expected parameter so we report an error and return zero.

   57 004116c3 6a57 push 57h ßPusing error 0x57 on the stack, invalid parameter.

   57 004116c5 ff15a4814100 call dword ptr [eatcpu!_imp__SetLastError (004181a4)] ßOur call to setlasterror

   57 004116cb 3bf4 cmp esi,esp

   57 004116cd e88cfaffff call eatcpu!ILT+345(__RTC_CheckEsp) (0041115e)

   58 004116d2 33c0 xor eax,eax ßXORing eax with eax will make EAX Zero. This is an error condition.

eatcpu!checkSomething+0x54 [c:\source\eatcpu\eatcpu\eatcpu.cpp @ 60]:

   60 004116d4 5f pop edi

   60 004116d5 5e pop esi

   60 004116d6 5b pop ebx

   60 004116d7 81c4c0000000 add esp,0C0h

   60 004116dd 3bec cmp ebp,esp

   60 004116df e87afaffff call eatcpu!ILT+345(__RTC_CheckEsp) (0041115e)

   60 004116e4 8be5 mov esp,ebp

   60 004116e6 5d pop ebp

   60 004116e7 c3 ret

The key thing that should be observed by this scenario is that when dealing with a high CPU condition there is often some problem at the lower level of some loop condition that prevents the proper execution of code from happening. If you’re lucky the problem is reported by some error facility in the OS or the application. In either case you can use the above technique for isolation.

The following is the sample code for EATCPU.      

// eatcpu.cpp : Defines the entry point for the console application.

//

#include "stdafx.h"

#include <windows.h>

#include <process.h>

void myThreadFunction(void *);

int doSomething(int);

int trySomething(int);

int checkSomething(int);

int _tmain(int argc, _TCHAR* argv[])

{

      int truevalue = 1;

      int falsevalue = 0;

      // let's start some threads. These should represent worker threads in a process. Some of them will do “valid work” one will fail to do so.

      _beginthread(myThreadFunction,12000,(void *)&truevalue);

      _beginthread(myThreadFunction,12000,(void *)&truevalue);

      _beginthread(myThreadFunction,12000,(void *)&truevalue);

      _beginthread(myThreadFunction,12000,(void *)&falsevalue );

      _beginthread(myThreadFunction,12000,(void *)&truevalue);

      _beginthread(myThreadFunction,12000,(void *)&truevalue);

      Sleep(10*60000);

      return 0;

}

void myThreadFunction(void *value)

{

      int *localvalue = (int *)value;

      while(1)

      {

            trySomething(*localvalue);

      }

}

int doSomething(int value)

{

      return trySomething(value);

}

int trySomething(int value)

{

      return checkSomething(value);

}

int checkSomething(int value)

{

      if(value)

      { // Make sure we have have some valid input parameter.

        //We will pretend we are doing work, this could be anything , file I/O etc.

            Sleep(250);

            return TRUE;

      }

      else

      { // This is an error condition, this function expects an NON Zero parameter and will report ZERO as an invalid parameter.

            SetLastError(ERROR_INVALID_PARAMETER);

            return FALSE;

      }

}