.NET Hang Debugging Walkthrough


 


I have talked about a number of different hang/performance issues in my posts.  This post is a generic debugging walkthrough for hangs.  It applies to all types of .NET processes and to some extent also to non-.net processes even though it has some .net specific items.

Define the hang/performance issue

The first step in troubleshooting a performance issue/hang of any kind, whether it is .NET or native is to find out what we are dealing with and what the expected resolution is.

Asking yourself a few standard questions may help you zone in on the problem without even attaching a debugger, and if you need to attach a debugger you will have a lot better understanding of what data you should collect and what you should be looking for.

If you call into support with a hang situation it is also very good to have the answers to these questions handy.

My Standard questions are (in no particular order):


  • What is the user experience? For example. When they browse to your asp.net application do they get timeouts? Do the calls ever return? If they return, do they always seem to return after around 45 seconds? — This helps you get a clue about if you have a dead-lock or if you are waiting for a resource that is timing out etc.
  • What is the CPU usage? Low CPU problems usually indicate that we are waiting for a lock or an external resource (web service, database etc), while a high CPU usage indicates that we have an infinite loop, high CPU in garbage collection or that the server simply can’t handle the load.
    If it is a high CPU issue, it is also interesting to note if we are constantly at 100% CPU in the affected process, or if we are running high but not necessarily 100%. And also, if any other processes on the box run at high CPU.
  • Can you reproduce the problem? If you can reproduce the problem reliably, make an effort to try to reproduce it on a test server so you don’t have to put any additional pressure on the production server. In addition, if you can repro you are half-way home because we have a much smaller subset of the application to work with.
  • Is there a temporary resolution? For example. Does the problem go away if you do an IISReset? This may give an indication of where the problem lies.
  • What external processes are involved? Does the application call web services, do database access, call into a remoting server , access a main frame computer etc. And if so, what do we know about these processes? Do they seem to be blocking, have they stopped receiving requests etc.
  • What modules are affected? For example in an ASP.NET application, do all pages hang or only a few (if so, what is special about the few?). Do html/asp pages work? etc.
  • Do we have any interesting eventlog entries during the hang? Anything can be interesting, network entries, eventlog entries from other applications etc. There might be a clue out there and our job is to play Sherlock.
  • What else do we know about the hang? Things that are typically interesting is “It only happens during load”, “Seems to happen more frequently on Mondays around 9:45 am”, “When the process hangs memory also seems to increase rapidly”.

If you have any type of information that seems even remotely related to the problem you should write it down in your log book.

Finally, if you are dealing with a performance issue rather than a full blown hang it might be worthwhile to take a few moments and think about what you consider acceptable performance to be, so you don’t spend hours and hours on optimizing after you have reached acceptable levels.

Gathering dumps:

Once you have defined the problem a bit you can start collecting dumps and needless to say, it is very important that you get the dumps while the problem is occurring.

To take memory dumps of a hanging process you can use adplus that comes with the debugging tools for windows by running


adplus -hang -pn <process name.exe>


If it is possible, it is good practice to take two consecutive dumps during the same hang so that you can compare the dumps and determine if the process is moving forward at all, but one is enough in most cases.

During the time the dump is taken the process will momentarily stall. You can usually count 1 second per 10MB data that needs to be written, and with a full memory dump the size of the dump will typically be the same as the private bytes shown in perfmon for that process.

If you do take multiple dumps, make sure that the first one is completely written before you start the second dump, and preferably let it go10-20 seconds after the first dump is completely written before you start the second dump.

Debugging:

Now we can start with the fun part of the process, yeah you guessed it right, debugging:)


  1. Open the memory dump in windbg File/Open Crash Dump

  2. Set up your symbol path

    .symfix c:\mysymcache
    .reload

    To learn more about why symbols are important read this: http://blogs.msdn.com/tess/archive/2005/12/05/500068.aspx

  3. Load sos.dll

    .load clr10\sos (for 1.1) or
    .load <framework directory>\sos.dll (for 2.0)

  4. Check out all the .NET stacks and the native stacks

    ~* kb 2000 native stacks
    ~* e !clrstack .net stacks

If you are debugging an ASP.NET app you can read this post to see which threads you can ignore right off the bat.

So far so good… now comes the tricky part, understanding what it is you are seeing…

Low CPU hangs

A low CPU hang is typically caused by one of these:


  • Lock
  • WaitOne
  • Critical Section
  • Waiting for external resources

Lock


A lock is a way to synchronize access to a resource and the code will look something like this

lock(_myLock){
    //do some work
}

 


A thread that is waiting for a lock like the one in the sample above looks like this:

0:003> kb 200
ChildEBP RetAddr Args to Child
0334f9ec 77f7f49f 77e74bd8 00000001 0334fa38 SharedUserData!SystemCallStub+0x4
0334f9f0 77e74bd8 00000001 0334fa38 00000000 ntdll!NtWaitForMultipleObjects+0xc
0334fa8c 79281971 00000001 00179f88 00000001 KERNEL32!WaitForMultipleObjectsEx+0x12c
0334fabc 79282444 00000001 00179f88 00000001 mscorwks!Thread::DoAppropriateWaitWorker+0xc1
0334fb10 7929a8ed 00000001 00179f88 00000001 mscorwks!Thread::DoAppropriateWait+0x46
0334fb94 7924884b 00184d88 ffffffff 00000000 mscorwks!AwareLock::EnterEpilog+0x9d
0334fbb0 792d2d1e 00cfd47c 00d109bc 00d109a4 mscorwks!AwareLock::Enter+0x69

If you have any stacks that are waiting in this state you should take a look at this blog post to see how to troubleshoot them further

WaitOne

WaitOne is a function called on a ManualResetEvent or an AutoResetEvent. It is used to wait for the return from an async call. For example when you call a web service, under the covers it will make an async call to the web service and the code that called the web service will sit in a waitone waiting for the web service call to return so that you can process the results.

The call stack for a thread waiting in a waitone looks like this:

61 Id: 1164.a44 Suspend: 1 Teb: 7ff3a000 Unfrozen
ChildEBP RetAddr Args to Child
0d58f430 7c822114 77e6711b 00000001 0d58f480 ntdll!KiFastSystemCallRet
0d58f434 77e6711b 00000001 0d58f480 00000000 ntdll!NtWaitForMultipleObjects+0xc
0d58f4dc 791e0b3b 00000001 0d58f5bc 00000001 kernel32!WaitForMultipleObjectsEx+0x11a
0d58f50c 791e0bdd 00000001 0d58f5bc 00000001 mscorsvr!Thread::DoAppropriateWaitWorker+0xc1
0d58f560 791d5ba9 00000001 0d58f5bc 00000001 mscorsvr!Thread::DoAppropriateWait+0x46
0d58f5a8 024840c8 0d58f5b4 00000000 000186a0 mscorsvr!WaitHandleNative::CorWaitOneNative+0x6f

If you have any threads waiting in this type of call, check out this blog post

Critical Section

A Critical Section is the native equivalent of a lock, and the call stack looks something like this:

0:021> kb
ChildEBP RetAddr  Args to Child             
0210e7cc 7c822124 7c83970f 000009f8 00000000 ntdll!KiFastSystemCallRet
0210e7d0 7c83970f 000009f8 00000000 00000000 ntdll!NtWaitForSingleObject+0xc
0210e80c 7c839620 00000000 00000004 00000001 ntdll!RtlpWaitOnCriticalSection+0x19c
0210e82c 4bfa4c16 027c0714 0277a5b4 4bf727af ntdll!RtlEnterCriticalSection+0xa8

To troubleshoot these, look at this blog post

External Resource

If neither of the above reasons for a low CPU hang applies, look for threads that are making DB calls or calls to other external resources.

High CPU Hangs

A high CPU hang is typically caused by one of these:


  • Infinite loop
  • High CPU in GC
  • Busy Server

High CPU hangs are a little bit harder to troubleshoot since it is often hard to figure out if it is a single thread that is using up all the CPU or if it is a lot of small actions that make up for the high CPU.

If the process is steadily at 100% it is typically an infinite loop.

To determine what threads are using up most of the CPU you can run .time and !runaway to get an idea of how long your process has been up, how much time it has spent on the CPU (user mode time) and how much user mode time each thread has been using to find out what threads are using the most.

For example in this case we have spent a total of 14 seconds in user mode, and out of that, thread 14 has used 4 seconds.

0:000> .time
Debug session time: Mon Oct 16 12:20:22.000 2006 (GMT+2)
System Uptime: 3 days 17:11:24.778
Process Uptime: 0 days 0:09:07.000
Kernel time: 0 days 0:00:26.000
User time: 0 days 0:00:14.000

0:000> !runaway
User Mode Time
Thread Time
14:554 0 days 0:00:04.656
25:2ec 0 days 0:00:02.906
27:1264 0 days 0:00:02.359
17:13b4 0 days 0:00:01.625
16:12dc 0 days 0:00:01.375


Check out !runaway in the windbg helpfile and play around with some of its options.

Infinite loop

If we are in an infinite loop, the thread with the loop will normally show up on top of !runaway. You can also identify it by running ~* e !clrstack and ~* kb to see if any of the threads seem to be in a looping construct (if you know the code).

High CPU in GC

If the GC threads show up as the top threads in !runaway. Take a look at the perfmon counter .NET CLR Memory/% Time in GC to see if the average is very high. If it is follow this post to determine why.

Busy Server

If all the threads are just spread out, busy with different things you may have some optimization work ahead of you. There is no easy way to debug this type of thing to determine where to optimize. Instead you should pull out some of the most commonly used features and run profilers on them to see how much time / CPU time they take and optimize the pieces that take the longest time.

Phew, That was a long post:) But hopefully it has given you some insight in how to debug hangs and performance issues.








Comments (35)

  1. Lakshmi says:

    Can you point me to the download link for adplus. I tried to look for it in the debugging tools link that you provided but couldnt find it.

  2. Tess says:

    When you install the debugging tools for windows, adplus.vbs is located in the c:program filesdebugging tools for windows directory, or the directory you specify during the installation if you use custom installation.

  3. Srikar says:

    Hey Tess,

    Nice article, but sometimes hangs can be in a different form too. For example, I have a ASP.NET web application which logs out all the users all of a sudden, no spikes in CPU usage, no entries in eventlogs, cannot be replicated in QA. Can this kind of scenario be called a "hang" and how do we use the debugging tools in this scenario. The mass logouts happen to all the users at the same time.

    Appreciate if you can commment on this scenario.

    Thanks

  4. Srikar says:

    How come I cannot see my post that I posted yesterday?

  5. Tess says:

    Srikar,

    If you have a web app that logs all users out it is probably a crash or a recycle of the worker process.  Alternatively a recycle of the app domain.

    The reason you couldn’t see the comments is because I have to moderate comments because of some sites sending out spam comments:(  I am sorry about this, and I normally publish the comments right away when I see them but yesterday I was delivering a workshop so I didn’t log into email.

  6. Srikar says:

    Tess,

    Thanks for you comments and time.  So does IIS not log anything in event/system/security logs during app domain recycles/worker process? as we do not see anything anything in the eventviewer at the time of the logouts. Also if it is a crash, how do we debug this scenario and when is the right time to take a dump?

    Additionally we are also investigating if it is because of blocking of some connections in the connection pool. Either case is there a clear and definite way to get answers from a dump.

    It is a very interesting problem..

    Thanks,

    Srikar

  7. Srikar says:

    Tess,

    I read all your articles so far..and yes I did read that article. We are logging all .NET counters related to performance/threads/locks in the db table. We do not see any app restarts/ worker process recycles in our tables. Also when mass logouts are happening, we did observe that they are happening for a particular spid..which is like all the users on the same connection in the connection pool..It is indeed a weird problem/…

    Srikar

  8. nativecpp says:

    Hi Srikar,

    I would suggest that you do a crash dump and look at the callstack. It may give you some hints. That is what I usually do when I don’t know what happened.  The disadvantage is that it would make the app run slower.

  9. Olegas says:

    Tess,

    Great post! Please keep ‘em coming.

    By the way, there is a discrepancy in .time output between dump file and AdPlus log:

    http://groups.google.com/group/microsoft.public.windbg/browse_thread/thread/9067d5fc1153b1fb/#

    Thank you,

    Olegas

  10. Tess says:

    When you create a dump you start off with creating a log file (if this is in crash mode this log file will be started when you first attach, and if it is hang mode when you start dumping), at this time .time is run and the time is recorded in the log file.

    From there to when the dump is actually gathered some time will elapse, and this is the time diff betwee the log file and the dump.

    For crash mode this can be several hours or days (depending on how long you are attached before you crash), and in hang mode it is about 1 second per 10 MB of process memory.

  11. Olegas says:

    Tess,

    Sorry, I should have clarified it further. Actually, values between kernel time and user time are swapped. The output in AdPlus log is correct, but output from within Windbg while reading the same dump file is incorrect.

    For example:

    AdPlus log:

     Kernel time: 0 days 0:16:43.531

     User time: 0 days 0:00:28.531

    WinDbg:

     Kernel time: 0 days 0:00:28.000

     User time: 0 days 0:16:43.000

    In this example, application really spent almost 17 minutes in privileged mode. Until a fix becomes available, I rely on .time output from AdPlus log.

    Thank you,

    Olegas

  12. Tess says:

    Hi Olegas,

    Interesting… I have noticed the swapped times before too (in some windbg versions) and never knew why it was.  (I just assumed a problem in windbg) but it is interesting that they are different in adplus and windbg because they both run the .time command.

    At first when I saw your comment I thought it may be different in cdb.exe (which adplus uses) and windbg, but that doesn’t make sense since they are essentially the same, and debugging a dump with cdb i can see that there is no difference, so either there is a difference live (which i dont think) or you got the dumps from someone with a different version of windbg/cdb where the times were swapped.

    I will send off a note on this to the guys in the debugger team so they are aware.  It has been this while for a few builds but I think noone has bothered mentioning it and just swapped the times in their heads.

  13. chen says:

    Hi Tess,

    Periodically we see the worker process (w3wp.exe) get into a state that

    suggests that it is hanging (ASP.NET 1.1 web service).

    Often we end up recycling the appropriate app pools to get around the

    problem. But the frequency at which this is happening is beginning to

    concern…

    So when this happened again, we got a hang dump using the adplus.vbs &

    started looking at the dump in WinDbg. While i’m not an expert with

    WinDbg by any means, the following stack trace in one of the thread

    caught my attention:

     10  Id: 14f4.1580 Suspend: 1 Teb: 7ffaa000 Unfrozen

    ChildEBP RetAddr  Args to Child

    01b0e2e8 7c821b74 77e99af9 d0000144 00000004 ntdll!KiFastSystemCallRet

    01b0e2ec 77e99af9 d0000144 00000004 00000000 ntdll!NtRaiseHardError+0xc

    01b0e984 01001623 01b0f078 01b0f078 010015e6

    kernel32!UnhandledExceptionFilter+0x54b

    01b0e9a0 7815e3b2 01b0f078 01b0f078 01b0f078

    w3wp!WpUnhandledExceptionFilter+0x3d

    01b0e9b0 7c35f0c3 01b0f078 7c35f07d 79030a86

    msvcr80!__CxxUnhandledExceptionFilter+0x63

    [f:rtmvctoolscrt_bldself_x86crtprebuildehunhandld.cpp @ 86]

    01b0e9bc 79030a86 01b0f078 79030a23 01b0ed20

    msvcr71!__CxxUnhandledExceptionFilter+0x46

    WARNING: Stack unwind information not available. Following frames may

    be wrong.

    01b0f050 77e84259 01b0f078 77e6b831 01b0f080

    mscoree!EEDllGetClassObjectFromClass+0x11d38

    01b0f058 77e6b831 01b0f080 00000000 01b0f080

    kernel32!BaseThreadStart+0x4a

    01b0f080 7c82eeb2 01b0f164 01b0ffdc 01b0f180

    kernel32!_except_handler3+0x61

    01b0f0a4 7c82ee84 01b0f164 01b0ffdc 01b0f180 ntdll!ExecuteHandler2+0x26

    01b0f14c 7c82ecc6 01b09000 01b0f180 01b0f164 ntdll!ExecuteHandler+0x24

    01b0f14c 79209a09 01b09000 01b0f180 01b0f164

    ntdll!KiUserExceptionDispatcher+0xe

    01b0f460 7922b33d 00000800 00000000 00000000 mscorsvr!SetupThread+0x21

    01b0f490 7929b5e6 00000000 793f4438 793f4438

    mscorsvr!CoInitializeEE+0x81

    01b0f4a4 7929f6ac 00000800 77e6bbc2 00000000

    mscorsvr!SetBufferOverrunHandler+0x24

    01b0f4cc 7929f7d2 01b0f4d8 00000000 01b0f518

    mscorsvr!CCompRC::GetLibrary+0x26

    01b0f4dc 7929c0e2 0000171a 000d72b8 00000400

    mscorsvr!CCompRC::LoadString+0x11

    01b0f518 7925864b 00000000 0000171b 0000171a

    mscorsvr!CorMessageBox+0x51

    01b0f95c 7922d668 79e69af0 00000000 00000000 mscorsvr!EEDllMain+0x1f1

    01b0f984 7922daa5 00000000 79e69af0 00000000 mscorsvr!EEStartup+0x1ac

    01b0f9b4 7922db1b 00000000 79e69af0 0000fde9 mscorsvr!TryEEStartup+0x21

    01b0f9e8 792292df 74737953 572e6d65 482e6265 mscorsvr!COMStartup+0x55

    01b0fb60 79002aa7 79e69af0 79e699fc 79e9acc8

    mscorsvr!ClrCreateManagedInstance+0xdf

    01b0fb78 79e8ee37 79e69af0 79e699fc 79e9acc8

    mscoree!ClrCreateManagedInstance+0x44

    01b0fb94 79e8f081 01942d10 00000104 00000000

    aspnet_isapi!CreateAppDomainFactory+0x31

    01b0fc04 79e762e4 01b0fd48 01b0fc44 01b0fc3c

    aspnet_isapi!GetAppDomain+0x118

    01b0fe50 79e7619c 01942d10 79e8e231 79e8e244

    aspnet_isapi!HttpCompletion::ProcessRequestInManagedCode+0x139

    01b0fe5c 79e8e244 01942d10 00000000 00000000

    aspnet_isapi!HttpCompletion::ProcessCompletion+0x24

    01b0fe70 791d4096 01942d10 000baa20 793ecec8

    aspnet_isapi!CorThreadPoolWorkitemCallback+0x13

    01b0fe84 791d4055 000baa20 00000000 791d3fe5

    mscorsvr!ThreadpoolMgr::ExecuteWorkRequest+0x19

    01b0fea4 791fdacc 00000000 00000000 80a78bcc

    mscorsvr!ThreadpoolMgr::WorkerThreadStart+0x129

    01b0ffb8 77e6608b 000c7ef0 00000000 00000000

    mscorsvr!ThreadpoolMgr::intermediateThreadProc+0x44

    01b0ffec 00000000 791fda8b 000c7ef0 00000000

    kernel32!BaseThreadStart+0x34

    It looks like while trying to create the AppDomain there was an unhandled exception. Can you shed more light on this? Is there something else should i be looking at?

    [CPU, memory utilization etc were not not really significant.]

    Btw, the server is running Windows Server 2003 (SP1) on a dual proc (with HyperThreading enabled) machine.

  14. Tess says:

    Hi Chen,

    You are absolutely right that this is probably what is blocking your process.

    If you look at

    01b0e984 01001623 01b0f078 01b0f078 010015e6 kernel32!UnhandledExceptionFilter+0x54b

    The first parameter (01b0f078) is a pointer to exception records so you can do

    dd 01b0f078 01b0f078+4 to get addresses for the exception filters

    eg. something like

    0:000> dd 01b0f078 01b0f078+4

    01b0f078 00032d18 00032d34

    and then you can do

    .exr on the first DWORD you get, eg.

    .exr 00032d18

    in this case, to get the exception

    You can also do

    .cxr 01b0f078 to set the context to where the exception occurred and if you do kb at that point you will get the stack at the exception

    I would also suggest that you dc the first argument to ntdll!NtRaiseHardError+0xc

    01b0e2ec 77e99af9 d0000144 00000004 00000000 ntdll!NtRaiseHardError+0xc

    dc d0000144

    To see if it is something readable

    Tess

  15. chen says:

    Hi Tess,

    Thanks for your feedback.

    For the benefit of others…

    We submitted a bug report with Microsoft & got a hotfix as reported in KB911205 [http://support.microsoft.com/kb/911205/en-us]. And we’re still in the process of testing this hotfix.

  16. Test123 says:

    After opening windebugger

    1. I have set symbol path to srv*c:myappsymbols*http://msdl.microsoft.com/download/symbols

    2. Typed .loadby sos mscorwks

    3. It said it could not load file from C:STHMicrosoft.NETFrameworkv2.0.50727sos

    4. So I have loaded it from my machine by typing the following

    .load C:WindowsMicrosoft.NETFrameworkv2.0.50727sos.dll

    5. Now when I type !help I see following

    SOS is a debugger extension DLL designed to aid in the debugging of managed

    programs. Functions are listed by category, then roughly in order of

    importance. Shortcut names for popular functions are listed in parenthesis.

    Type "!help <functionname>" for detailed info on that function.

    Object Inspection                  Examining code and stacks

    —————————–      —————————–

    DumpObj (do)                       Threads

    DumpArray (da)                     CLRStack

    DumpStackObjects (dso)             IP2MD

    DumpHeap                           U

    DumpVC                             DumpStack

    GCRoot                             EEStack

    ObjSize                            GCInfo

    FinalizeQueue                      EHInfo

    PrintException (pe)                COMState

    TraverseHeap                       BPMD

    Examining CLR data structures      Diagnostic Utilities

    —————————–      —————————–

    DumpDomain                         VerifyHeap

    EEHeap                             DumpLog

    Name2EE                            FindAppDomain

    SyncBlk                            SaveModule

    DumpMT                             GCHandles

    DumpClass                          GCHandleLeaks

    DumpMD                             VMMap

    Token2EE                           VMStat

    EEVersion                          ProcInfo

    DumpModule                         StopOnException (soe)

    ThreadPool                         MinidumpMode

    DumpAssembly                      

    DumpMethodSig                      Other

    DumpRuntimeTypes                   —————————–

    DumpSig                            FAQ

    RCWCleanupList

    DumpIL

    6. When I type !finddebugmodules it says No export found for finddebugmodules.

    Does this mean I am have loaded an older version of SOS dll? As I have explained in step4 I have loaded .net 2.0 sos dll

    Can anyone please help me in getting this command to work?

  17. ml says:

    good stuff. Can I use "Debug Diagnostic Tools 1.0" instead of ADPlus to gather dump? Are the two tools same?

    Thanks

    ML

  18. Tess says:

    Both tools are built upon the debugging apis and create the same type of dumps.  I would recommend though that you upgrade to debug diagnostics 1.1 if you are going to use debug diagnostics.

  19. Tess says:

    Test123,

    .loadby doesnt work in your case because you have taken a dump from a system that had the framework directory

    C:STHMicrosoft.NETFrameworkv2.0.50727sos

    While the framework directory on your machine is C:WindowsMicrosoft.NETFrameworkv2.0.50727sos and .loadby will use the path of the mscorwks file in this case as it was in the dump.

    The !finddebugmodules only exists in some versions of sos.dll,  (the 1.1 versions so far) so that is why you are not seeing it.   New versions of the 2.0 sos are shipped with new service packs etc.   The reason all of them are not there from the start is because sos 2.0 has a completely different architecture so only the most important ones were there from the get go.  

  20. Nick Batell says:

    Great post. Here is another good article about debugging application hangs: http://www.correlsense.com/cto-blog/top-3-reasons-an-application-hangs/

  21. This is the first in a series of about 10 labs on .NET debugging. The lab will use a site called BuggyBits,

  22. Chris says:

    ok so I’ve got the dump file open in windbg, and I see this:

    0:000> .time

    Debug session time: Wed Apr 23 11:37:58.000 2008 (GMT-4)

    System Uptime: 14 days 8:25:37.720

    Process Uptime: 0 days 0:06:36.000

     Kernel time: 0 days 0:09:28.000

     User time: 0 days 0:00:51.000

    0:000> !runaway

    User Mode Time

     Thread       Time

     26:edc       0 days 0:04:33.218

     14:16f0      0 days 0:04:31.921

     25:c60       0 days 0:00:09.828

     15:c68       0 days 0:00:07.843

     27:16bc      0 days 0:00:01.500

     20:d38       0 days 0:00:00.750

    So I see the top two are running a long time, which fits with PerfMon showing two executing requests appearing stuck:

     26  Id: 1240.edc Suspend: 0 Teb: 7ff3e000 Unfrozen

    ChildEBP RetAddr  Args to Child              

    1aefeda4 79e734c4 bcb0100b 7a3ad7a0 1aefee38 mscorwks!_EH_epilog3+0xa

    1aefedd0 79e7422a 00000010 00000001 1ab6a870 mscorwks!ClrFlsSetValue+0x57

    1aefedf0 79f04e1a bcb013fb 1aefef4c 00004000 mscorwks!CantAllocThreads::MarkThread+0x6f

    1aefee20 79f0815e 1aefee38 00004000 1ab6a870 mscorwks!CLRVectoredExceptionHandler+0x34

    1aefee4c 79f080a7 1aeff22c 1aeff644 1aefef30 mscorwks!CPFH_RealFirstPassHandler+0x607

    1aefee70 7c828752 1aeff22c 1aeff644 1aefef4c mscorwks!COMPlusFrameHandler+0x15a

    1aefee94 7c828723 1aeff22c 1aeff644 1aefef4c ntdll!ExecuteHandler2+0x26

    1aefef3c 7c82863c 1aef9000 1aefef4c 00010007 ntdll!ExecuteHandler+0x24

    1aeff21c 77e4bee7 1aeff22c 1aeff2b4 e0434f4d ntdll!RtlRaiseException+0x3d

    1aeff27c 79f071ac e0434f4d 00000001 00000001 kernel32!RaiseException+0x53

    1aeff2dc 79f9293a 0e9aad8c 00000000 00000000 mscorwks!RaiseTheExceptionInternalOnly+0x2a8

    1aeff314 7a02dc0a 1aeff378 0013f418 bcb00e13 mscorwks!UnwindAndContinueRethrowHelperAfterCatch+0x70

    1aeff3c8 79eb8013 1c9c5e0a 00000013 00000000 mscorwks!VirtualCallStubManager::ResolveWorkerStatic+0x8b

    1aeff4b0 65fe07f1 0a978b6c 0a978b6c 0a979998 mscorwks!ResolveWorkerAsmStub+0x33

    1aeff4d0 65fe07f1 0e6872f4 0a9790d4 0a978b58 System_Web_ni+0xc07f1

    1aeff4e0 79e75d61 7936edea 00000000 02b0fbd8 System_Web_ni+0xc07f1

    1aeff4e4 7936edea 00000000 02b0fbd8 02ae6b10 mscorwks!JIT_Stelem_Ref+0x25

    1aeff504 65fe41ed 0667364c 0a978b6c 1aeff53c mscorlib_ni+0x2aedea

    1aeff564 65fccbfe 1aeff5b4 02ade058 0a978b6c System_Web_ni+0xc41ed

    1aeff578 65fd19c5 02ae6b10 0667364c 029fb120 System_Web_ni+0xacbfe

    1aeff59c 7938111c 70159d3c 01bf2e3c 02ade058 System_Web_ni+0xb19c5

    00000000 00000000 00000000 00000000 00000000 mscorlib_ni+0x2c111c

     14  Id: 1240.16f0 Suspend: 0 Teb: 7ffaa000 Unfrozen

    ChildEBP RetAddr  Args to Child              

    01cfe8c8 79e84896 00000000 00000000 79f07957 mscorwks!ExInfoWalker::WalkToPosition+0x1e

    01cfeaa8 79e84bf2 01cfead0 79f07957 01cfee90 mscorwks!Thread::StackWalkFramesEx+0x13b

    01cfedd8 79f07449 79f07957 01cfee90 00000000 mscorwks!Thread::StackWalkFrames+0xb8

    01cfedf4 79f08585 01cfeeec 00153960 01cfee90 mscorwks!LookForHandler+0x26

    01cfef0c 79f081d6 01cff32c 01cff744 01cff04c mscorwks!CPFH_RealFirstPassHandler+0x49f

    01cfef4c 79f080a7 01cff32c 01cff744 01cff030 mscorwks!CPFH_RealFirstPassHandler+0x68c

    01cfef70 7c828752 01cff32c 01cff744 01cff04c mscorwks!COMPlusFrameHandler+0x15a

    01cfef94 7c828723 01cff32c 01cff744 01cff04c ntdll!ExecuteHandler2+0x26

    01cff03c 7c82863c 01cf9000 01cff04c 00010007 ntdll!ExecuteHandler+0x24

    01cff31c 77e4bee7 01cff32c 01cff3b4 e0434f4d ntdll!RtlRaiseException+0x3d

    01cff37c 79f071ac e0434f4d 00000001 00000001 kernel32!RaiseException+0x53

    01cff3dc 79f9293a 06b5d6cc 00000000 00000000 mscorwks!RaiseTheExceptionInternalOnly+0x2a8

    01cff414 7a02dc0a 01cff478 0013f418 a7900913 mscorwks!UnwindAndContinueRethrowHelperAfterCatch+0x70

    01cff4c8 79eb8013 1c9c5e0a 00000013 00000000 mscorwks!VirtualCallStubManager::ResolveWorkerStatic+0x8b

    *** WARNING: Unable to verify checksum for System.Web.ni.dll

    01cff5b0 65fe07f1 0a964e34 0a964e34 0e7cef90 mscorwks!ResolveWorkerAsmStub+0x33

    01cff5d0 65fe07f1 0e6872f4 0a71cf48 0a964e34 System_Web_ni+0xc07f1

    01cff5e0 79e75d61 7936edea 00000000 0ac05cb8 System_Web_ni+0xc07f1

    *** WARNING: Unable to verify checksum for mscorlib.ni.dll

    01cff5e4 7936edea 00000000 0ac05cb8 06c06c20 mscorwks!JIT_Stelem_Ref+0x25

    01cff604 65fe41ed 0667364c 0a964e34 01cff63c mscorlib_ni+0x2aedea

    01cff664 65fccbfe 01cff6b4 06c06680 0a964e34 System_Web_ni+0xc41ed

    01cff678 65fd19c5 06c06c20 0667364c 0698b84c System_Web_ni+0xacbfe

    01cff69c 7938111c 70ba1f7e 01bf2e3c 06c06680 System_Web_ni+0xb19c5

    00000000 00000000 00000000 00000000 00000000 mscorlib_ni+0x2c111c

    Now what?  How does this tell me where in my code to look for a problem and what am I looking for?

    Please copy reply to:

    gadget

    at

    ccrtc

    .c o m

  23. Tess says:

    Hi Chris,

    Based on just this small sample of data I can say that it is throwing an exception, specifically a .net exception since the 3nd param to RaiseException is e0434f4d (CLR Exception)

    If this is something that happens very frequently it could cause high CPU because of the constant stack unwinding but I would have to see the dump and perfmon logs etc. to tell for sure.

    I suggest that you look through some of my exception articles like http://blogs.msdn.com/tess/archive/2005/11/30/498297.aspx

    You probably want to check out the .net CLR Exceptions counters to see how many exceptions you’re throwing and then follow some of my posts (see post index) to track down why the exceptions occurr…

    Hope that helps,

    Tess

  24. I started this blog 2.5 years ago today, mostly because I felt that the same types of issues came up

  25. Tony says:

    Hi Tess

    Thanks for the article. I have a very peculiar ‘Thread Abort’ issue on which we are not getting anywhere . We have this Vb.NET windows application that triggers a C# executable.

    The C# executable writes to the Database and is very CPU intensive in performance (internally the C# executable call some third party dlls build on C++).

    The executable runs for anywhere between 3 to 9 mins and the Vb.NET windows applications makes a small call every ‘n’ seconds (set to 20) to the database to check on the progress made by the above executable.

    The pain point is that at times there are frequent ‘Thread Abort’s’ noticed on the event log causing the worker process to become non responsive and hang. Unless it is recycled it does not start. And to add to that this occurs on Windows 2000 servers and not the Windows Xp boxes (prob due to better memory management) and not on our development environment but on all these production boxes where we are not allowed to take a dump and install all these tools for isolating the problem.

    In your experience, is the above situation prone to thread aborts? Can i remedy it by avoiding thread aborts in anyway. (Additionally some boxes it seems to recover from thread aborts too !!and there are no diff in settings in the IIS 5.0 on any of the boxes) .

    Thanks in advance , any guidance will make our day.

    Tony

  26. Debugging after a power outage

  27. JanIvan Qian says:

    It’s a really really great blog in .NET debugging. Definitely, I love this blog much and much!

    I’m new in ASP.NET application debugging, and I’ve learned too much from you. Thank you Tess, and Merry Christmas!

  28. Tess says:

    merry x-mas to you too:)

  29. Steve Brown says:

    Tess, great blogg, I have learnt so much, take a look at these !runaway results, would do you make of them:

    0:000> !runaway

    User Mode Time

     Thread       Time

     25:cd0       0 days 3:29:28.093

     10:d48       0 days 2:35:30.109

     29:1014      0 days 2:26:05.265

     26:334       0 days 1:33:31.906

     21:a54       0 days 1:27:28.906

     20:14e4      0 days 0:08:33.265

     13:fe8       0 days 0:06:55.796

     12:118c      0 days 0:05:40.515

     15:a28       0 days 0:05:23.875

     14:113c      0 days 0:05:17.796

     33:1344      0 days 0:01:31.093

     37:16b0      0 days 0:01:29.125

     34:384       0 days 0:01:28.750

     36:4e8       0 days 0:01:27.062

     35:874       0 days 0:01:21.281

     38:f9c       0 days 0:01:20.343

     40:15a8      0 days 0:01:02.734

     43:958       0 days 0:01:01.343

     42:122c      0 days 0:00:57.843

     45:f90       0 days 0:00:53.125

     44:c9c       0 days 0:00:52.984

     46:1048      0 days 0:00:31.468

    After looking at !clrstack for thread 25:

    0:025> !clrstack

    OS Thread Id: 0xcd0 (25)

    ESP       EIP    

    00d0e614 7c8285ec [HijackFrame: 00d0e614]

    00d0e654 7940e095 System.Globalization.CompareInfo.IndexOf(System.String, System.String)

    00d0e660 793a6261 System.String.IndexOf(System.String)

    00d0e66c 1cae730e ePosV2_Web.PrintingControl.set_dealer(ePosCore.Business.Dealer.Dealer)

    00d0e6f8 1b934f3e ePosV2_Web.BrandingControl.buttonPressed(toolbarButton)

    00d0e89c 1cae71aa ePosV2_Web.BrandingControl.imgPrint_Click(System.Object, System.EventArgs)

    00d0e8ac 6881bdc6 System.Web.UI.WebControls.Button.OnClick(System.EventArgs)

    00d0e8c0 6881bfbc System.Web.UI.WebControls.Button.RaisePostBackEvent(System.String)

    I know this contains a loop, would this be an infinite loop in my app as the performance is terrible?

  30. Tess says:

    Hi Steve,

    IndexOf works a bit like a regular expression so if you have a very long string you may get a lot of backtracks and very high CPU on the thread.    The loop is not infinite but for very large strings it may take several minutes at 100% CPU so if possible you should do what you can to minimize the length of the string you’re doing indexof on

  31. Agustin says:

    Hi there Tess. I'm trying to find out the reason for some bad CPU spikes.

    By debugging the dump and getting the crl stack for all threads I identify some processes that may be involved in the problem, they are image processing methods and the spiking "would make sense". Is it dangerous to assume that the problem could be in some of the identifiable methods of the dump taken when the cpu hit 100% ? Or the stack could be somehow unrelated ?

    Also, I wonder what it means to see over 20+ threadpool workers waiting for recycling when I call !threads in the troublesome dump

    E.g: XXXX   30       1be4d390   1019820 Enabled  00000000:00000000 01fd0cc8     0 Ukn (Threadpool Worker)

    Thanks for your help :: )

    Cordially, Agustin.

  32. geardoom3 says:

    I have a C# app in .NET 4 running as a Console app started by SQL agent every minute.

    It works like a charm until of a suddent, it hangs in the Task Manager with CPU to 0%

    I tried the adplus but all it does is generating a Dump file in which there's nothing informative in there.

    By the way all the application does is reading SQL, if it finds something to work with, it starts the SAP transaction then it prints a document and it updates the table to say ok I did it and I'm finished

    This job is perform a lot of time and works like a charm.  I can tell you that this process is done a lot.  Now why 2 or 3 times a day, it hangs there which makes the SQL agent not capable of starting it again of course.

    I don't have a lot of idea on how to debug this.

    Any thought ?

Skip to main content