ASP.NET Tips: How to use DebugDiag to track down where a performance problem is coming from


We recently had a case where the customer was having performance problems.  Requests were taking a few minutes to return and they didn’t know why, so they captured some dumps while the problem was happening.  We ran one of those dumps through DebugDiag using the Crash/Hang script and looked at the output.  The first thing we noticed was this:


1 client connection(s) in IIS_COM+__…Dump.dmp
have been executing a request for more than 90 seconds.

So we had a long-running request.  We looked to see what it was and saw:


Client connection from xxx.xxx.xxx.xxx:2449 to xxx.xxx.xxx.xxx:443
POST request for /xxxx/xxxx
HTTP Version HTTP/1.1
SSL Request False
Time alive 00:02:46
QueryString
Request mapped to
HTTP Request State HTR_READING_CLIENT_REQUEST
Native Request State NREQ_STATE_PROCESS

So we have a request that had been running for 2 minutes and 46 seconds before the dump was taken.  Digging into this, we see the following in the summary:


The following threads in IIS_COM+__…Dump.dmp
are making a COM call to thread with system id 3760 in process 3428

( 74 77 )

1.92% of threads blocked


So these threads are calling into another process.  Since the target is a dllhost.exe process that was also dumped when DebugDiag captured the IIS hang dumps, we can analyze that dump as well.  Running it through the same Crash/Hang analysis, we find this in its summary:


The following threads in IIS_COM+__…Dump.dmp
are waiting on data to be returned from another server via WinSock.

The call to WinSock originated from mseidrda!DRDASectionSetParameter+22b2
and is destined for port 5021 at IP address 10.254.1.12

( 10 11 )

1.53% of threads blocked


Looking at one of those threads (thread 10), we see:


Thread 10 – System ID 204
Entry point comsvcs!CSTAThread::WorkerLoop
Create time 4/25/2008 5:58:00 AM
Time spent in user mode 0 Days 00:01:20.734
Time spent in kernel mode 0 Days 00:00:17.578

This thread is waiting on data to be returned from another server via WinSock.

The call to WinSock originated from mseidrda!xxxx+22b2 and is destined for port 5021 at IP address xxx.xxx.xxx.xxx

Function Source
ntdll!KiFastSystemCallRet
ntdll!NtWaitForSingleObject+c
mswsock!SockWaitForSingleObject+19d
mswsock!WSPRecv+203
ws2_32!WSARecv+77
wsock32!recv+31
mseidrda!xxxx+22b2
mseidrda!xxxx+2920

Socket properties:
Source Port 1058
Destination IP xxx.xxx.xxx.xxx
Destination Port 5021
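
To picture what that thread is doing: it called recv on a blocking socket and the remote server simply hasn’t sent anything back yet.  Here is a minimal Python sketch of the same situation (this is not the customer’s code, and the host and port are just the values from the report above).  A blocking receive with no timeout hangs silently; giving the socket a timeout at least turns the hang into an error you can see and log:

import socket

# Host and port are the destination values from the DebugDiag report above;
# substitute your own backend server here.
HOST = "10.254.1.12"
PORT = 5021

# The blocked COM+ thread is sitting inside wsock32!recv waiting for this
# server to answer.  With no timeout, recv() waits indefinitely; with a
# timeout it raises an error instead of quietly tying up the thread.
sock = socket.create_connection((HOST, PORT), timeout=10)  # timeout also applies to recv
try:
    data = sock.recv(4096)
    print("received %d bytes" % len(data))
except socket.timeout:
    print("no response from %s:%d within 10 seconds" % (HOST, PORT))
finally:
    sock.close()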


So this doesn’t get us to the actual answer of what is happening, but it does show that our web server is making a COM call into another process (the dllhost.exe we just looked at), and that process is making a call across a socket to another server, and we are waiting on that call.  What is left is to track down what this IP address maps to and see why the calls aren’t returning right away.
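
A quick way to start that, if you are not sure what machine owns that address, is a reverse DNS lookup plus a simple connection test.  A small Python sketch (again, nothing to do with DebugDiag itself; the IP and port are the ones from the report) could look like this:

import socket

# Destination taken from the DebugDiag report; replace with the address from your own dump.
ip, port = "10.254.1.12", 5021

# Reverse DNS: which machine does this address map to?
try:
    host, aliases, _ = socket.gethostbyaddr(ip)
    print("%s maps to %s" % (ip, host))
except socket.herror:
    print("no reverse DNS entry for %s" % ip)

# Reachability: can we open a TCP connection to the port the COM+ thread is waiting on?
try:
    conn = socket.create_connection((ip, port), timeout=5)
    conn.close()
    print("%s is accepting connections on port %d" % (ip, port))
except OSError as exc:
    print("could not connect to %s:%d (%s)" % (ip, port, exc))

If the server resolves and accepts connections, the slowness is most likely in whatever that server does with the request once it arrives, and that is where the troubleshooting moves next.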


I may update this post with the rest if people are interested.  But the important part is being able to narrow down where the problem is.  Then you can use the right tools and troubleshoot the correct area.



Comments

  1. Joshi says:

    Hey Tom, I wanted to know what the impact on the server is when we capture dumps using DebugDiag, especially when we enable Leak Track.

  2. DebugDiag doesn’t have much impact other than the CPU and hard drive space needed to create the dump.  Enabling Leak Track is a bit more, as it tracks every allocation that happens, but only in that one process.  The impact on the server should be minimal.

  3. Roberto Santoro says:

    Hey Tom,

    I’m sorry, but I don’t understand how you find out which thread is executing the long-running request (basically, how do you go from step 2 to step 3).

    Thanks,

    Roberto.

  4. Roberto,

    You get the client connections list in the DebugDiag report and just look for one that has been running for a long time, like the request above that had been running for 2 minutes and 46 seconds.