Debugging a .NET crash with rules in Debug Diag

During my and Micke’s presentation at TechDays this week I showed a demo of setting up rules with Debug Diag to identify the cause of a crash in an ASP.NET application.

Even though debugging might be tricky, setting up rules in Debug Diag is beautifully simple, and I personally believe it would be a good idea for anyone running a web site to have Debug Diag installed, along with a few instructions for the ops personnel on how to set up the rules. Better yet, you can set up the rules in advance and just activate or deactivate them as needed.

Here is a recap of that demo…

Problem description

The application crashes with the following event in the event log:

Log Name:      Application
Source:           Application Error
Date:              2009-03-20 11:12:09
Event ID:        1000
Task Category: (100)
Level:              Error
Keywords:       Classic
User:               N/A
Computer:       MYMACHINE
Faulting application w3wp.exe, version 7.0.6001.18000, time stamp 0x47919413, faulting module kernel32.dll, version 6.0.6001.18000, time stamp 0x4791a76d, exception code 0xe053534f, fault offset 0x000442eb, process id 0x%9, application start time 0x%10.

So the w3wp.exe process crashed due to a 0xe053534f exception (which happens to be a stack overflow, but even if you don’t know that, it doesn’t really matter).
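If you are wondering how you would ever recognise a code like this: several of the 0xe0… exception codes raised by runtimes carry a three letter ASCII tag in their low three bytes. The sketch below decodes that tag. The ExceptionCodeDecoder class and the DecodeTag method are names made up for this illustration. As far as I know, 0xe0434f4d is the code the 2.0 CLR uses for managed exceptions and 0xe06d7363 is the one used for Visual C++ exceptions; reading “SSO” as a stack overflow marker is an interpretation, so treat the tag as a hint rather than a lookup table.

```csharp
using System;

static class ExceptionCodeDecoder
{
    // Returns the low three bytes of the exception code as ASCII text,
    // or null if any of those bytes is not a printable ASCII character.
    public static string DecodeTag(uint code)
    {
        char[] tag = new char[3];
        for (int i = 0; i < 3; i++)
        {
            // Walk from the most significant of the three low bytes to the least.
            byte b = (byte)(code >> (8 * (2 - i)));
            if (b < 0x20 || b > 0x7e)
                return null;
            tag[i] = (char)b;
        }
        return new string(tag);
    }

    static void Main()
    {
        Console.WriteLine(DecodeTag(0xe053534f)); // SSO (the code from the event log above)
        Console.WriteLine(DecodeTag(0xe0434f4d)); // COM
        Console.WriteLine(DecodeTag(0xe06d7363)); // msc
    }
}
```

Codes that do not follow this convention (an access violation, 0xc0000005, for example) simply come back as null.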

Setting up Debug Diag rules and gathering data


When you open Debug Diag you are met by a screen where you can set up different rules.

If you have a potential memory leak or a memory issue, you should choose the memory and handle leak rule. This will inject a DLL into the process that tracks any allocations or de-allocations that occur while the rule is active. Once you have “leaked” memory you can get a memory dump with Debug Diag, and it will contain info about which stacks etc. allocated the memory that is still in the process.

This works well for native memory issues. For .NET memory issues you should be aware that allocations for the .NET GC heaps are made by mscorwks, which means that .NET memory issues will show up as mscorwks having a potential leak. I would recommend that you read some of my earlier posts on troubleshooting .NET memory issues.

The IIS Hang rule is very useful if you have some pages that sometimes take longer than they should.  With this rule you can set up triggers to get logs or memory dumps if a page takes longer than x seconds.

If you want a dump for a specific exception or for a crash, you should use the crash rule, and that is what we’ll do now…

The next step is to choose which process the rule should apply to. Here we can either choose IIS, which means that the rule will apply to all IIS processes like w3wp.exe, dllhost.exe, inetinfo.exe etc., or pick a specific process.

In this case we will only set up a rule for the w3wp.exe process. If we just leave it like that and the process crashes, Debug Diag will reattach the rule the next time the process comes up. It will also apply the rule to all w3wp.exe processes if there are multiple on the system. If you only want it to apply to one specific w3wp.exe instance, you have to check the checkbox for “This process instance only”.



The next window that comes up in the wizard lets you configure what should trigger the Debug Diag action. This can be just a crash (in which case you just click Next), but you can also set it up to get dumps or log events on specific exceptions (as we will do) or on breakpoints. The PageHeap Flags option is used if you want to troubleshoot heap corruption issues.

Since we want to trigger a dump when the process gets the 0xe053534f exception, we will click on the Exceptions… button here.

0xe053534f is a native exception, so we can just enter it in the box for the exception code. We can optionally give the exception a name (this will just label the dumps that are generated), and for the action type we choose full dump to have it dump on the exception.

Note that you can set an action limit here to keep it from generating dump after dump. This is pretty useful if you want to set up a rule for System.NullReferenceException, for example, or something else that might happen a lot.

If we had wanted to dump on a particular .NET exception, we would have chosen CLR (.NET) Exception from the list box on the left and then entered the specific exception name (such as System.NullReferenceException).

Now we are done with the rule and can just click OK, Next, Next, Finish to activate it.

Once this is done we can reproduce the issue, and Debug Diag will generate the dump and put it under the <debug diag dir>\logs\crash rule… directory.


Analyzing the data

At this point we can click the Analyze Data button and have debug diag analyze the dump for us.  This works extremely well if the issue is purely native.   Just a few quick notes about this though…

1. I wouldn’t recommend analyzing it on a production server, so the best thing is to copy the dump to a dev machine, and analyze it (from the advanced analysis tab)

2. You should set up the symbol path under tools/options and settings to


3. If the issue is not purely native (for example if managed code is involved), you may still need to open the dump in windbg

If we analyze it we get this info at the top of the HTML report:

Error In w3wp__PID__4000__Date__03_20_2009__Time_10_31_41AM__331__First Chance unknown.dmp the assembly instruction at kernel32!RaiseException+58 in C:\Windows\System32\kernel32.dll from Microsoft Corporation has caused an unknown exception (0xe053534f) on thread 25

This exception originated from mscorwks!ReportStackOverflow+61.
Review the faulting call stack for thread 25 to determine root cause for the exception.

Please follow up with vendor Microsoft Corporation for problem resolution concerning the following file: C:\Windows\Microsoft.NET\Framework\v2.0.50727\mscorwks.dll.

and thread 25 looks like this:

Thread 25 – System ID 4332

Entry point   mscorwks!Thread::intermediateThreadProc
Create time   2009-03-20 10:30:32
Time spent in user mode   0 Days 0:0:0.203
Time spent in kernel mode   0 Days 0:0:0.78


Function     Arg 1     Arg 2     Arg 3   Source
kernel32!RaiseException+58     e053534f     00000000     00000000  
mscorwks!ReportStackOverflow+61     025dc2e0     025dc2e0     044a3db4  
mscorwks!Alloc+3b     00000024     00000000     00080000  
mscorwks!FastAllocateObject+38     00f69594     4a4521fa     044a3e98  
mscorwks!JIT_NewFast+9e     3149ec3b     88cb775e     3149ec3b  
0x02b20794     1d51bcd8     3149ec3b     88cb775e  
0x02b207e8     1d51bcd8     3149ec3b     88cb775e  
0x02b207e8     1d51bcd8     3149ec3b     88cb775e  
0x02b207e8     1d51bcd8     3149ec3b     88cb775e  
0x02b207e8     1d51bcd8     3149ec3b     88cb775e  

This tells us that we seem to be crashing because of a stack overflow, but the stack looks very funky so we don’t really know what caused the stack overflow. 

Mscorwks is listed as the module that caused the crash, but this is just because mscorwks is the component that raises the native exception.

As most of you probably know, the most common reason for a stack overflow is some type of infinite recursion, so what we would really like to know here is what is on this stack… The reason it shows up with just addresses and not method names is that Debug Diag doesn’t understand .NET, so we’ll have to bring the dump to windbg to analyze it and check out the .NET stack.

In windbg we can then load sos (.loadby sos mscorwks) and run !clrstack on the active thread to get the call stack:

044ce56c 02b207e8 GameInfo..ctor(Game, System.DateTime, System.String)
044ce5a0 02b207e8 GameInfo..ctor(Game, System.DateTime, System.String)
044ce5d4 02b207e8 GameInfo..ctor(Game, System.DateTime, System.String)
044ce608 02b207e8 GameInfo..ctor(Game, System.DateTime, System.String)
044ce63c 02b207e8 GameInfo..ctor(Game, System.DateTime, System.String)
044ce670 02b207e8 GameInfo..ctor(Game, System.DateTime, System.String)
044ce6a4 02b207e8 GameInfo..ctor(Game, System.DateTime, System.String)
044ce6d8 02b207e8 GameInfo..ctor(Game, System.DateTime, System.String)
044ce70c 02b207e8 GameInfo..ctor(Game, System.DateTime, System.String)
044ce740 02b207e8 GameInfo..ctor(Game, System.DateTime, System.String)
044ce774 02b207e8 GameInfo..ctor(Game, System.DateTime, System.String)
044ce7a8 02b207e8 GameInfo..ctor(Game, System.DateTime, System.String)
044ce7dc 02b207e8 GameInfo..ctor(Game, System.DateTime, System.String)
044ce810 02b207e8 GameInfo..ctor(Game, System.DateTime, System.String)
044ce844 02b207e8 GameInfo..ctor(Game, System.DateTime, System.String)
044ce878 02b207e8 GameInfo..ctor(Game, System.DateTime, System.String)
044ce8ac 02b207e8 GameInfo..ctor(Game, System.DateTime, System.String)
044ce8e0 02b207e8 GameInfo..ctor(Game, System.DateTime, System.String)
044ce914 02b2071f GameInfo.op_Explicit(Game)
044ce938 02b20531 _Default.Button1_Click(System.Object, System.EventArgs)
044ce984 6def9ec8 System.Web.UI.WebControls.Button.OnClick(System.EventArgs)
044ce99c 6def9d2f System.Web.UI.WebControls.Button.RaisePostBackEvent(System.String)
044ce9b4 6def9f6b System.Web.UI.WebControls.Button.System.Web.UI.IPostBackEventHandler.RaisePostBackEvent(System.String)
044ce9bc 6d7f5d9e System.Web.UI.Page.RaisePostBackEvent(System.Web.UI.IPostBackEventHandler, System.String)

Once you get to this point it is pretty simple. !clrstack shows a recursion, as expected, which in this case is caused by a logic error in the GameInfo constructor combined with the GameInfo explicit cast operator.

public GameInfo(Game g) : this(g, System.DateTime.Now, "admin")
{
}

public GameInfo(Game g, System.DateTime addDate, string addingUser)
{
    ID = g.ID;
    Name = g.Name;
    Publisher = g.Publisher;
    AddedDate = addDate;
    AddedBy = addingUser;
    if (g.Prequel != null)
        // Bug: this casts g itself, which calls the constructor again with the same g
        Prequel = ((GameInfo)g).Prequel;
}

//explicit cast
public static explicit operator GameInfo(Game g)
{
    return new GameInfo(g);
}
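
One plausible fix is sketched below, assuming the intent was to convert the prequel game rather than the game itself. The Game class, the field types and the Program class are stand-ins made up for this sketch, since the real types are not shown in the post. The only change that matters is the cast: (GameInfo)g.Prequel converts the prequel, so each constructor call moves one step down the prequel chain instead of starting over with the same game.

```csharp
using System;

// Hypothetical stand-in for the real Game type, which the post does not show.
public class Game
{
    public int ID;
    public string Name;
    public string Publisher;
    public Game Prequel;
}

public class GameInfo
{
    public int ID;
    public string Name;
    public string Publisher;
    public DateTime AddedDate;
    public string AddedBy;
    public GameInfo Prequel;

    public GameInfo(Game g) : this(g, DateTime.Now, "admin")
    {
    }

    public GameInfo(Game g, DateTime addDate, string addingUser)
    {
        ID = g.ID;
        Name = g.Name;
        Publisher = g.Publisher;
        AddedDate = addDate;
        AddedBy = addingUser;
        if (g.Prequel != null)
            // Member access binds tighter than the cast, so this converts g.Prequel, not g.
            Prequel = (GameInfo)g.Prequel;
    }

    // Explicit conversion from Game to GameInfo.
    public static explicit operator GameInfo(Game g)
    {
        return new GameInfo(g);
    }
}

static class Program
{
    static void Main()
    {
        Game first = new Game { ID = 1, Name = "First" };
        Game second = new Game { ID = 2, Name = "Second", Prequel = first };

        GameInfo info = (GameInfo)second;
        Console.WriteLine(info.Prequel.Name); // First
    }
}
```

The conversion is still recursive, so it only terminates if the prequel chain is finite; two games that list each other as prequels would overflow the stack again.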

This was a pretty long post, but if you have a look at Debug Diag you’ll find that the user interface is pretty intuitive, and again, I would really recommend that you have a look and consider setting up a few rules for operations to activate if the stuff hits the fan.



Comments

  Walter says:

    Hey Tess,

    You have a deep understanding of IIS!  

    What are your thoughts on upgrading to 64 bit IIS servers?

    Good idea?  Bad idea?  

  3. Anthony Sciuto says:

    Could you tell me how to debug this problem please


    Server Error in ‘/’ Application.


    Authentication failed. Restarting authentication process.

    Description: An unhandled exception occurred during the execution of the current web request. Please review the stack trace for more information about the error and where it originated in the code.

    Exception Details: DotNetOpenMail.SmtpException: Authentication failed. Restarting authentication process.

    Source Error:

    An unhandled exception was generated during the execution of the current web request. Information regarding the origin and location of the exception can be identified using the exception stack trace below.  

    Stack Trace:

    [SmtpException: Authentication failed. Restarting authentication process.


      DotNetOpenMail.SmtpServer.Send(ISendableMessage emailMessage, EmailAddressCollection rcpttocollection, EmailAddress mailfrom) +812

      DotNetOpenMail.EmailMessage.Send(SmtpServer smtpserver) +88

      Radactive.Projects.Arbseek.Core.Utils.SendMail(String fromAddress, String fromName, String toAddress, String toName, String subject, String bodyText, String bodyHTML, String[] attachments) in C:Documents and SettingsmfabbianDocumentiVisual Studio ProjectsRadactive.Projects.ArbseekCoreUtils.cs:310

      Radactive.Projects.Arbseek.WWW.Controls.Contact.sendContactMail(String firstName, String lastName, String email, String message) in C:Documents and SettingsmfabbianDocumentiVisual Studio 2005ProjectsMGSASPNETRadactiveProjectsArbseekWWWControlsContact.ascx.cs:69

      Radactive.Projects.Arbseek.WWW.Controls.Contact.Page_Load(Object sender, EventArgs e) in C:Documents and SettingsmfabbianDocumentiVisual Studio 2005ProjectsMGSASPNETRadactiveProjectsArbseekWWWControlsContact.ascx.cs:82

      System.Web.UI.Control.OnLoad(EventArgs e) +67

      System.Web.UI.Control.LoadRecursive() +35

      System.Web.UI.Control.LoadRecursive() +98

      System.Web.UI.Page.ProcessRequestMain() +739


    Version Information: Microsoft .NET Framework Version:1.1.4322.2300; ASP.NET Version:1.1.4322.2300

    Thank you very much!

    It’s a powerful tool!

  8. Tess says:

    Walter, I don’t have a strong opinion on that. If you need the extra memory then go for it.

    I would say though that it shouldn’t be used to cover up for a memory issue that you don’t know the cause of, since running on 64 bit means a lot larger segments, and thus that you don’t collect as often. In other words, you need to make sure that you have enough RAM to back your memory usage up if you run 64 bit.

  9. AaronS says:

    The problem with 64bit is that Debug Diag doesn’t work.

    Whenever you pull up a list of processes, it only displays the ones running in x86 mode, and none that are running in x64 mode (including IIS).

    I wish there was a work around for this.

  10. Pure Krome says:

    @AaronS: SAME!!! I just downloaded this to debug a memory leak and i noticed it only installs an (x86) version. So of course no x64bit version dll’s can be attached. I also couldn’t find any info about an x64 bit version on the MS download site.

    🙁  Unusable, but potentially kewl, product 🙁

  11. Tess says:

    There is a 64 bit version in the works.  

    If you work with support or with your TAM/ADC if your company has one, you can get the 64 bit version, but it is not public yet as it hasn’t gone through rigorous testing.

  12. RedCrystal says:

    Here’s a nifty tool that every enterprise ASP.NET programmer and ops person should know about: Download

  15. mga says:

    Tess – this tutorial was invaluable in helping me solve a problem.   Thank you!

  16. Walter says:

    Hi Tess,

    great article, thanks.

    I don’t even know what the acronyms TAM/ADC mean.

    I am an independent web software developer who could really do with a version of debugdiag to debug a server crash on Windows 2008 64 bit.

    Can you advise me how I can get a copy, please?



    walterlockhart at gmail dot com

  17. Tess says:

    TAM/ADC means Technical Account Manager or Application Development Consultant,  you have one if you have a premier contract with MS.

    Unfortunately the 64 bit version is not publicly available yet, but as soon as it is released it will be available for download on the MS download site. If you need one before then, create a support case with MS, or try pinging the guys at

  18. Gavi Narra says:

    Thank you for the excellent article.

    Looks like there is a typo in this command

    .loadby sos mscorkws

    it should be

    .loadby sos mscorwks

  19. Tess says:

    Thanks Gavi,  corrected it in the post

  20. Paul McNamara says:

    This is great thanks. We were getting stack overflow exceptions but had no idea where from. This saved the day 😀

  21. Andy says:

    A Windows desktop application (C#) monitored with DebugDiag crashes without generating a dump. Why doesn’t DebugDiag automatically generate a dump with default settings (the settings are as in your article)? ADPlus generates a dump under the same conditions (default settings). How do I set up Debug Diag to always get a dump when the application crashes?

  22. Tess says:

    Andy,  What was the reason for the crash?  I.e. what did adplus take a dump on (2nd chance exception, kernel32!TerminateProcess etc.)

    There are situations where adplus will get dumps but Debug Diag (with default settings) will not, like when you get a stack overflow, since that is a 1st chance exception causing a crash. In that case you will have to set up Debug Diag to dump on the stack overflow exception.

  23. andy_leca says:

    1. I want to be able to configure Debug Diag so that we are not surprised by the application crashing without a dump being generated. In my test (see point 2) the application crashed under DebugDiag without a dump being generated (the same test I described in my first message). Is there a configuration that guarantees a dump in this scenario?

    2. The name of the dump is XXX_1st_chance_Process_Shut_Down__full.

    See adplus_report.txt below.

      Debug Diag does not generate a dump under the same crash conditions (Debug Diag was set up to dump on the Stack Overflow and Access Violation 1st chance exceptions).



    In_page_IO_error [ip]       return: GN GN

         1st chance: Log;Time;Stack;MiniDump

         2nd chance: Log;Time;Stack;FullDump;EventLog

     Invalid_system_call [isc]       return: GN GN

         1st chance: Log;Time;Stack;MiniDump

         2nd chance: Log;Time;Stack;FullDump;EventLog

     Stack_buffer_overflow [sbo]       return: GN GN

         1st chance: Log;Time;Stack;MiniDump

         2nd chance: Log;Time;Stack;FullDump;EventLog

    Starting to attach the debugger to each process

    Attaching to 7144 – XXX.SERVER.EXE

  24. vani says:

    I am getting the same error as described in this article. My web server is Windows Server 2008 R2 with IIS 7.5. I installed Debug Diag 1.1 but I couldn’t find the rules tab. How should I debug according to the steps mentioned in the article?

  tess, this post just saved my life.  i get to sleep tonight for the first time all week.  thank you.  thank you very much.

  26. Master List of Exception Codes says:

    Tess – would you know of a site that has a master list of exception codes?  In your article you mentioned that 0xe053534f is a stack overflow exception.  How were you able to look this up?  Thanks.

  27. Jahav says:


    if you are debugging .NET 4 app, you now need to use “.loadby sos clr” instead of “.loadby sos mscorwks” (many years have passed).