So we've finished the first round of performance issues, crashes and memory leaks. This time we are going to dive into a high CPU situation. I know this gives it away a little, but before you go through the lab you might want to have a look at my GC Pop Quiz to familiarize yourself with how the GC works, if you feel a bit shaky about it.
Previous demos and setup instructions
If you are new to the debugging labs, here you can find information on how to set up the labs as well as links to the previous labs in the series.
Reproducing the issue:
1. Recycle IIS (IISReset)
2. Browse to http://localhost/BuggyBits/AllProducts.aspx. This should take about 5 seconds or more depending on what machine you're on.
3. Watch the CPU usage in Task Manager while browsing to this page (or use the typeperf alternative below).
Q: What is the CPU usage like for w3wp.exe during the processing of this request?
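If you would rather watch the CPU from a console than from Task Manager, typeperf can sample the process counter once a second (assuming there is only one w3wp instance; this is just an alternative, not a required part of the lab):
typeperf "\Process(w3wp)\% Processor Time" -si 1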
Set up a performance counter log:
1. Check out Lab 3: Memory, set up a performance counter log per the instructions in there, and start the log.
Prepare to gather a memory dump:
1. Open a command prompt, move to the debuggers directory, and type the following command but don't hit enter yet
adplus -hang -pn w3wp.exe -quiet
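As a side note: if you don't have the Debugging Tools handy, Sysinternals procdump can take the same kind of full dump, and can even trigger automatically on high CPU, e.g. once w3wp has stayed above 80% CPU for 5 seconds (again just an alternative, not what this lab uses):
procdump -ma -c 80 -s 5 w3wp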
Reproduce the issue again and gather the memory dump:
1. From a command prompt in the IIS Resources/Tinyget directory, run the following tinyget command to spawn 5 threads making one request each (if you don't have tinyget, see the C# alternative after this list)
tinyget -srv:localhost -uri:/BuggyBits/AllProducts.aspx -threads:5 -loop:1
2. When the CPU usage is high (in Task Manager), press enter in the adplus window to get the dump
3. Stop the performance monitor log
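If you don't have the IIS Resources kit installed, here is a minimal C# sketch that generates roughly the same load, i.e. 5 concurrent requests with one request each (the class and variable names are made up for illustration):

using System.Linq;
using System.Net.Http;
using System.Threading.Tasks;

class LoadGen
{
    static async Task Main()
    {
        using (var client = new HttpClient())
        {
            // 5 concurrent requests, one per "thread", mirroring -threads:5 -loop:1
            var tasks = Enumerable.Range(0, 5)
                .Select(_ => client.GetStringAsync("http://localhost/BuggyBits/AllProducts.aspx"));
            await Task.WhenAll(tasks);
        }
    }
}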
Open the dump to figure out what it is doing:
1. Open the dump (hang_mode with today's date and time, in the debuggers directory) in WinDbg using File/Open Crash Dump
2. Load up the symbols and SOS
Verify that you took the dump at the right time:
1. Run !threadpool to see the CPU usage of the system to make sure that you got the dump at the right time
Q: What is the CPU usage?
Q: How does the CPU usage shown here correlate to the CPU usage of the process?
Q: If you ran tinyget you may see some requests queued up, e.g. Work Request in Queue: 4. Why are requests queueing up? Why aren't they being scheduled on threads? (Compare Running worker threads with MaxLimit for worker threads.)
Determine which threads have been consuming most of the CPU time:
1. Run .time to see the uptime and CPU user time of the process
2. Run !runaway to see the usermode time for all the threads
Q: From !runaway which threads are consuming most of the CPU?
Q: How much usermode CPU time was consumed by the process in total? (from .time)
Note: Looking at !runaway output can be a bit tricky for multiple reasons.
First off, on a multi-proc machine you have to remember that the usermode time (i.e. time spent executing usermode code) is accumulated across all processors, so the usermode time may add up to more than the elapsed time (process uptime).
Secondly, the !runaway output shows the total usermode time spent on a thread since it was started. In ASP.NET, for example, a thread may be reused for multiple requests, so a high usermode time on a thread does not necessarily mean that the request currently running on that thread is a big CPU consumer.
And finally, some threads, like the GC threads (in multi-proc, server GC processes), stay alive for the duration of the process, so they have had more of a chance to accumulate usermode time than the worker threads. It is therefore often more interesting to take two consecutive dumps and compare the differences in usermode time.
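As a made-up example: if thread 20 shows 0:54.375 of usermode time in the first dump and 1:34.375 in a second dump taken 40 seconds later, that thread burned 40 seconds of usermode time in a 40-second window, i.e. it kept one processor core completely busy between the two dumps.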
3. Pick the threads that have the highest usermode time and look at their stacks
~#s (set thread context, note # should be replaced with the appropriate thread number from !runaway)
kb 2000 (look at the native stack)
!clrstack (look at the .net stack)
Q: What are they doing? Can you make an assumption based on this as to what is causing the high CPU usage?
4. This is only applicable on multi-proc machines, since only those run with dedicated GC threads. Sum up the usermode time for the GC threads in !runaway and divide this by the total usermode time from .time.
Q: How much of the total usermode time is spent by the GC threads?
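To make the math concrete (numbers invented for illustration): if !runaway shows the two GC threads at 0:41.338 and 0:38.517, and .time reports 1:48.234 of total usermode time, then the GC threads account for (41.3 + 38.5) / 108.2, roughly 74%, of all the CPU time the process has consumed, which would be a strong indication that the GC is where the CPU is going.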
Examine the performance counter log:
1. Open up the performance counter log and add the counters for .NET CLR Memory/% Time in GC, # Gen 0 Collections, # Gen 1 Collections, # Gen 2 Collections, and Large Object Heap Size.
Q: What does the % Time in GC counter measure?
Q: What is the average % Time in GC during the stress run?
Q: What is an appropriate number for % Time in GC?
Q: What is the ratio between Gen 0, Gen 1, and Gen 2 collections? What should it be in an optimal scenario and why?
Q: What can cause a ratio like this?
Look at the memory dump to figure out what caused the high CPU in GC:
1. Run ~* kb 2000 to get all native stacks and search for the thread that triggered the GC (mscorwks!SVR::GCHeap::GarbageCollectGeneration)
Q: Why was a gc triggered?
Q: What type of object was it allocating?
Q: What is the thread doing?
Q: Could this be causing the bad Gen 0, Gen 1, Gen 2 ratio? Why?
2. Find out what is on the large object heap. (Note: if you are unfortunate you may have caught the GC in the plan or relocate phase, in which case the !dumpheap output may not be correct. If that is the case, try to make do with what you get from the stacks and !dumpheap -min 85000.)
!dumpheap -min 85000
Q: What is on the large object heap?
Q: Is there any pattern in the sizes of the strings?
3. Dump out a few of the strings to see the contents
!do <address of string>
Q: Based on the stacks, the knowledge of the collection ratios, and the string contents, can you determine what went wrong? You might want to do some internet searches on some of the calls on the stacks if you don't know the answer immediately.
Verify your assumption in the code and modify the code to resolve the issue:
1. Open AllProducts.aspx.cs and look at the code for Page_Load
Q: What line of code is causing the problem?
Q: How do you resolve it? Why does this resolve the problem?
2. Change the code so that it uses a StringBuilder instead of doing string concatenation, and rerun the tests to see the results (a before/after sketch follows below).
Note: Because there is a lot of looping to build the dataset, the page may still take about half a second to a second to render, but the high CPU usage should be gone and requests should no longer be queueing up.
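For reference, here is a minimal sketch of the before and after pattern. The actual code in AllProducts.aspx.cs differs; the products DataTable and the column name used here are made up for illustration.

// Assumes: using System.Data; using System.Text;
// products is a DataTable built earlier in Page_Load (hypothetical here).

// Before: string concatenation in a loop. Each += copies the entire string
// into a brand new allocation, and once the string grows past ~85,000 bytes
// every iteration allocates on the large object heap, which is what drives
// the constant Gen 2 collections and the high % Time in GC.
string tableHtml = "<table>";
foreach (DataRow row in products.Rows)
{
    tableHtml += "<tr><td>" + row["ProductName"] + "</td></tr>";
}
tableHtml += "</table>";

// After: a StringBuilder appends into an internal buffer that only grows
// occasionally, so the per-iteration garbage (and the GC pressure) disappears.
// (Named tableHtml2 only so both versions compile side by side.)
StringBuilder sb = new StringBuilder("<table>");
foreach (DataRow row in products.Rows)
{
    sb.Append("<tr><td>").Append(row["ProductName"]).Append("</td></tr>");
}
sb.Append("</table>");
string tableHtml2 = sb.ToString();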