.NET Debugging Demos Lab 2: Crash


It was nice to see that so many people have already downloaded the demo site and checked out the instructions for the first lab. Thanks to Pedro for pointing out that the original demo site required .NET Framework 3.5. I’ve changed it now, so the version you can download from the setup instructions page should not require .NET Framework 3.5.  (Even so, I would encourage you to download 3.5 and play around with it anyway :))

Here comes lab 2, a crash scenario on the BuggyBits site.  

Previous demos and setup instructions

Information and setup instructions
Lab 1: Hang
Lab 1: Hang – review 

Reproduce the problem

1. Browse to the reviews page http://localhost/BuggyBits/Reviews.aspx. You should see a couple of bogus reviews for BuggyBits.

2. Click on the Refresh button in the reviews page. This will crash the w3wp.exe process (or aspnet_wp.exe on IIS 5) 

    Note: If you have Visual Studio installed, a Just-In-Time Debugger message may pop up (just click No for the purposes of this exercise).
    However, since this message box will sit there waiting for user input before the app can shut down, you may want to
    disable JIT debugging if you have Visual Studio installed on a test system.

Examine the eventlogs

1. Open the Application and System eventlogs. The information in the eventlogs will differ based on the OS and IIS version you are running. Among other events you may
    have a System event looking something like this…

Event Type:	Warning
Event Source:	W3SVC
Event Category:	None
Event ID:	1009
Date:		2008-02-08
Time:		10:12:06
User:		N/A
Computer:   	MYMACHINE
Description:
A process serving application pool 'DefaultAppPool' terminated unexpectedly. The process id was '4592'. 
The process exit code was '0xe0434f4d'.

For more information, see Help and Support Center at http://go.microsoft.com/fwlink/events.asp.

Q: What events do you see?

Q: What does the exit code 0xe0434f4d mean?

Q: Can you tell from the eventlogs what it was that caused the crash? 
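As a nudge for the exit code question, it can help to look at the individual bytes of the code rather than the number as a whole. The snippet below is only a sketch in Python and is not part of the lab; the helper name split_exit_code is made up for illustration. It splits a 32-bit code into its top byte and whatever printable ASCII characters the remaining three bytes spell.

```python
def split_exit_code(code):
    """Split a 32-bit exit code into its top byte and the ASCII text
    spelled by the lower three bytes (non-printable bytes become '.')."""
    raw = code.to_bytes(4, byteorder="big")
    text = "".join(chr(b) if 32 <= b < 127 else "." for b in raw[1:])
    return raw[0], text


# The exit code from the System event shown above.
prefix, text = split_exit_code(0xE0434F4D)
print(hex(prefix), text)
```

The three letters it prints for 0xe0434f4d make a good search term when you try to answer the question.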

Get a memory dump

1. Browse to the reviews page http://localhost/BuggyBits/Reviews.aspx again, but don’t click refresh

2. Open a command prompt, move to the debuggers directory, type “adplus -crash -pn w3wp.exe” and hit Enter.

Q: A new window should appear on the taskbar, what is it? 

Q: What is the debugger waiting for? Hint: Check the help files for ADplus/crash mode in windbg

3. Reproduce the issue by clicking on the refresh button in the reviews page.

Q: What files got created in the dump folder?  Note: The dump folder will be located under your debuggers directory with the name crash_mode and today’s date and time.
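The dump file names themselves carry useful information: the process id, the process name, whether the dump was taken on a 1st or 2nd chance event, and what that event was. The Python sketch below pulls those parts out of a file name. The pattern is an assumption inferred from a single ADPlus file name quoted by a reader in the comments on this post, so treat it as a starting point rather than a description of every name ADPlus can produce; describe_dump is a made-up helper name.

```python
import re

# Pattern inferred from one ADPlus file name; treat it as an assumption.
DUMP_NAME = re.compile(
    r"PID-(?P<pid>\d+)__(?P<process>.+?)__"
    r"(?P<chance>1st|2nd)_chance_(?P<event>.+?)__(?P<kind>full|mini)_"
)


def describe_dump(file_name):
    """Return pid, process, chance, event and dump kind as a dict,
    or None if the name does not follow the assumed pattern."""
    match = DUMP_NAME.match(file_name)
    if match is None:
        return None
    info = match.groupdict()
    info["pid"] = int(info["pid"])
    info["event"] = info["event"].replace("_", " ")
    return info


# File name taken from a reader comment on this post.
name = ("PID-2148__W3WP.EXE_-OneStopCC-__1st_chance_Process_Shut_Down"
        "__full_0f20_2008-10-07_16-05-52-828_0864.dmp")
print(describe_dump(name))
```

Sorting the files in the dump folder by the chance and event fields is a quick way to tell the dump taken on the exception apart from the one taken on process shutdown.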

Open the dump in windbg

1. Open the dump file labeled 2nd Chance CLR Exception in windbg (file/open crash dump).  Note that this dump got created just before the 1st chance process shutdown.

Note: If you throw an exception (.NET or other) you have a chance to handle it in a try/catch block.  The first time it is thrown it is a 1st chance exception and is non-fatal.  If you don’t handle the exception it becomes a 2nd chance exception (unhandled exception), and any 2nd chance exception will terminate the process.
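The difference between a handled and an unhandled exception can be seen in any language with exceptions. The sketch below uses Python purely as an analogy: Python has no debugger-level notion of 1st and 2nd chance, but it shows the same outcome, namely that a caught exception leaves the process running while an uncaught one ends it with a non-zero exit code. The names HANDLED, UNHANDLED and run_snippet are made up for this illustration.

```python
import subprocess
import sys

HANDLED = """
try:
    raise ValueError("boom")      # thrown: code up the stack may catch it
except ValueError:
    print("handled, carrying on")
"""

UNHANDLED = """
raise ValueError("boom")          # nobody catches this one
"""


def run_snippet(source):
    """Run a snippet in a child interpreter and return its exit code."""
    result = subprocess.run(
        [sys.executable, "-c", source],
        capture_output=True,
        text=True,
    )
    return result.returncode


print("handled exit code:", run_snippet(HANDLED))
print("unhandled exit code:", run_snippet(UNHANDLED))
```

The second child process dies with a non-zero exit code, which is the same kind of signal the W3SVC event reports for the worker process in this lab.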

2. Set up the symbol path and load sos (see the setup instructions for more info)

 

In a crash dump, the active thread is the one that caused the exception (since the dump is triggered on an exception).

Q: Which thread is active when you open the dump? Hint: check the command bar at the bottom of the windbg window.

Examine the callstacks and the exception

1. Examine the native and managed callstacks. 

kb 2000
!clrstack

Q: What type of thread is it?

Q: What is this thread doing?

2. Examine the exception thrown

!pe

Note: !pe/!PrintException will print out the current exception being thrown on this stack if no parameters are given

Q: What type of exception is it?

Note: In some cases, like this one where the exception has been rethrown, the original stack trace may not be available in the exception.  In cases like this you may get more information if you find the original exception.

3. Look at the objects on the stack to find the address of the original exception

!dso

Q: What is the address of the original exception?

Hint: Look at your previous !pe output to see the address of the rethrown exception.  Compare this to the addresses of the objects on the stack.  You should see multiple exceptions, a few with the rethrown exception’s address, but one of the bottommost exceptions will be the original one (look for one with a different address).
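If the stack is long, the comparison described in the hint can be done mechanically. The Python sketch below assumes you have pasted the !dso output into a list of lines with three whitespace-separated columns (stack slot, object address, type name); it returns the exception objects whose address differs from the rethrown one. The addresses and type names in the sample are made up for illustration and are not the ones you will see in the lab, and candidate_originals is a hypothetical helper name.

```python
def candidate_originals(dso_lines, rethrown_address):
    """Return (address, type) pairs for exception objects on the stack
    whose address differs from the rethrown exception, in first-seen order."""
    rethrown = int(rethrown_address, 16)
    seen = []
    for line in dso_lines:
        parts = line.split()
        if len(parts) != 3:
            continue                      # not a plain slot/address/type row
        _slot, address, type_name = parts
        try:
            value = int(address, 16)
        except ValueError:
            continue                      # e.g. the column header row
        if not type_name.endswith("Exception"):
            continue
        if value != rethrown and (address, type_name) not in seen:
            seen.append((address, type_name))
    return seen


# Made-up addresses and generic type names, purely for illustration.
sample = [
    "ESP/REG  Object   Name",
    "0252f5a0 06f4b3f4 System.Exception",
    "0252f5c8 06f4b3f4 System.Exception",
    "0252f600 06f4a1c0 System.String",
    "0252f6b4 06f49e10 System.Exception",
]
print(candidate_originals(sample, "06f4b3f4"))
```

Any address it returns is a candidate to pass to !pe in the next step.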

4. Print out the original exception and look at the information and the callstack

!pe <original exception address>

Q: In what method is the exception thrown?

Q: What object is being finalized?

Note: you could actually have gotten this information by dumping out the _exceptionMethodString of the rethrown exception as well, but with !pe of the original exception you get the information in a cleaner way.

Q: Normally exceptions thrown in ASP.NET are handled with the global exception handler and an error page is shown to the user.  Why did this not occur here?  Why did it cause a crash?

Examine the code for verification

1. Open Review.cs to find the destructor/finalizer for the Review class

Q: Which line or method could have caused the exception?

 

As an extra exercise you can also examine the disassembly of the function to try to better pinpoint where in the function the exception is raised.

!u <IP shown in the exceptionstack>

 

Related posts

Creating dumps with Windbg and writing ADPlus Config files

ASP.NET 2.0 Crash case study: Unhandled exceptions

What on earth caused my process to crash?

.Net exceptions – Tracking down where in the code the exceptions occurred

 

Have fun debugging,

Tess

Comments (32)

  1. Dragos says:

    Thank you very much for the wonderful work. I was in big need of a tool like Windbg, it really helps with my work.

  2. Kevin says:

    Hi Tess,

    Excellent labs, looking forward to more.

    I have one question though. On a server running multiple sites (hundreds in fact) in the same app pool, how do I identify the site that caused a hang or crash having identified the root cause?

    Or to put it differently, how do I match up threads to IIS sites/applications in windbg?

    As a hoster we can be running up to 1000 sites on a single server with those sites divided across say 5-10 app pools.

    Cheers

    Kev

  3. Tess says:

    Hi Kev,

    Although I probably wouldn’t recommend running 200 apps per app pool because of how much memory would be used per process (likely OOMs from the loaded DLLs alone), your question is very valid.

    The finalizer thread is common to all apps in the process but for all other threads you can check out the threads in !threads and check which appdomain the code is running in by running !dumpdomain on the domain in the domain column.

  4. GProssliner says:

    Hello!

    I have the following SOS output:

    0:000> !threadpool

    CPU utilization 100%

    Worker Thread: Total: 2 Running: 0 Idle: 2 MaxLimit: 25 MinLimit: 2

    Work Request in Queue: 0

    --------------------------------------

    Number of Timers: 3

    --------------------------------------

    Completion Port Thread:Total: 5 Free: 0 MaxFree: 4 CurrentLimit: 2 MaxLimit: 25 MinLimit: 2

    It’s obvious that the 100% CPU utilization is a problem (that has been solved already). My question is if and how the Threadpool uses the current CPU utilization for scheduling WorkItems or to control how and if new worker threads are created (the Threadpool is primarily used for async Socket operations (HttpListener) within this project).

    It seems like no new threads are started, even if the current number of Threadpool threads is below the max threads (which could make sense because there would be no resources available for the new thread).

    So … how can the values (Total, Running, Idle, MaxLimit and MinLimit) be interpreted?

    Any thoughts?

  5. Tess says:

    Are you saying you are getting those numbers for this lab or in some other dump?  I’m just curious because you shouldn’t get 100% CPU in this specific crash lab…

    The 100% is for the whole system, not only the w3wp.exe process so this would also include any CPU usage by other processes.

    Total = number of current worker threads started (running+idle)

    Running = executing a request or work item

    MaxLimit = max number of worker threads (as set in machine.config for asp.net or 1000 by default for winforms)

    MinLimit = 1 per logical CPU (min number of worker threads at any given time)

    The threadpool does take CPU usage into account, and currently it will not create new threads if the system’s CPU usage is over 80%.

  6. GProssliner says:

    Hello!

    First of all: No. This is not related to this lab. I just read the Review posting where you said that questions – even not directly related to the lab – are welcome.

    Thank you for sharing this information! Is the 80% threshold hardcoded or configurable?

    Maybe you can also explain the Completion Port Thread values (Total, Free, MaxFree, CurrentLimit, MaxLimit and MinLimit)? I’ve already searched the web over and over but didn’t find anything about them.

    Thank you!

  7. Tess says:

    It’s totally cool to ask questions not related to the lab :)  I just wanted to make sure that the lab didn’t behave like that on your machine.

    The completion port threads are pretty much the same.  Completion ports are mostly used for callbacks but can be used for work items too if there are available completion port threads but no available worker threads.

    The 80% is hard coded but in reality there is no use to change it since you really can’t do much with new threads at that CPU level anyways.

  8. Andrew Lomakin says:

    Hello Tess!

    Great post, gives very important knowledge needed for newbie .NET crash analyst 🙂

    I have a situation where the DFS management snap-in is crashing due to a null-reference exception (0x80004003), and I’ve gone a long way to identify at what level the exception occurs (let me know if you’re curious enough to look at the dump – you should have my email somewhere hopefully). Eventually I came to the CLR thread stack where the exception occurs, but I’m stuck: I want to see what parameters are passed to each function in the stack, but I went through your blog posts, and Johan’s, and I can’t seem to find a way to do this. Can you advise a little bit please? I’ve been banging my head against the table about this case for weeks now.

    Regards,

    Andrew

  9. Tess says:

    Hi Andrew,

    Yepp, I remember you:)  I can’t really commit to look at any dumps but I can give you some pointers.

    0x80004003 is not really a CLR exception, and I am not sure based on your comment if you are actually stopped at the exception or just see it on the heap.  If you got it from the heap you won’t be able to inspect the parameters etc., so in that case you would have to set up debug diag or an adplus config file to get a full dump on 0x80004003.  Check the windbg help files for adplus config for more info on that…

    If you are stopped on the exception you can either use !clrstack -p to find the parameters or, if that doesn’t help, you can try !dso to see the objects on the stack.

    Best of luck

    Tess

  10. André Nobre says:

    I decided to publish, every friday, some links that i judge interesting, from now. Architecture Scott

  11. TGIF, almost time for the weekend… but before you leave, here is lab 3. Todays debugging puzzle will

  12. Ed F says:

    Thanks a lot for these labs, I’m learning a lot. The assembler stuff was really neat to learn.

  13. Lab 1: Hang, Lab 2: Crash, Lab 3: Memory, Lab 4: High CPU Hang

  14. This is the last debugging lab in the .NET Debugging Demos series. By now you should have the basics

  15. We have reached the end of the .NET Debugging Demos series. And we are going to end it with a review

  16. baal says:

    Note: if you throw an exception (.net or other) you have a chance to handle it in a try/catch block.  The first time it is thrown it becomes a 1st chance exception and is non-fatal.  If you don’t handle the exception it will become a 2nd chance exception (unhandled exception) and any 2nd chance exceptions will terminate the process.

    What does this mean?

    Can a 1st chance exception become a 2nd chance exception?

  17. Tess says:

    An exception is considered first chance when it is first thrown; at that point you can catch it in an exception handler.  If you don’t handle it, it becomes a 2nd chance exception (unhandled).  See my post on questions about exceptions:

    http://blogs.msdn.com/tess/archive/2008/04/01/questions-about-net-exceptions.aspx

  18. baal says:

    public void Page_load(……)

    {

        UserName1 = Request.Cookies["UserName1"].Value;  

        UserName2 = Request.Cookies["UserName2"].Value;  

    }

    If an exception happens in Page_Load, the yellow screen appears. The exception is first chance, so how does it become 2nd chance?

  19. lindsay says:

    These labs are very interesting, can’t wait to learn more!

  20. J.W. says:

    I have a question about this: "any 2nd chance exceptions will terminate the process."

    If the global.asax has Application_Error, then any unhandled error will be handled there. I just feel that if the process is terminated, everyone on the site will lose their session (InProc session mode), which would be too bad.

  21. Justin says:

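    .NET Debugging Examples

    This is a series of debugging examples meant to help you with the most common hang, performance, memory and crash issues when debugging .NET applications, so that you gain some…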

  22. Tess says:

    J.W.  a 2nd chance exception is basically an unhandled exception.  If it is handled by the application_error it does not get to the 2nd chance stage, so it is still a handled exception (i.e. 1st chance).

    Any exceptions that occur during requests will get handled by the application_error if they are not handled before that… however, if you have an exception happening on the finalizer or on any other non-request thread, like a timer thread or similar, it will not be handled and will thus be 2nd chance and crash the process…

    hope that makes sense, just think of it as in 1st chance exception = exception that can still be trapped and handled,  2nd chance exception = unhandled exception

  23. James says:

    Hey Tess,

    I have a web app that has a, thus far, undiagnosed crash.  I was hoping to use the steps in this lab to debug it, but it doesn’t seem to work.

    Firstly, there was only one dump file created named: PID-2148__W3WP.EXE_-OneStopCC-__1st_chance_Process_Shut_Down__full_0f20_2008-10-07_16-05-52-828_0864.dmp

    (AppPool name is OneStopCC)

    What does it mean when there is a crash that only creates the 1st chance/shut down dump and why does it not seem to have an exception in it (like in the lab)?

    !pe returns "The current thread is unmanaged"

  24. Tess says:

    You will get a 1st chance process shutdown when the process goes down for whatever reason, i.e. it could be that the process preemptively recycled (make sure to uncheck all recycling options), or that someone called iisreset, or that a fatal exception occurred that caused it to shutdown.

    The problem is that when the process is shutting down, the reason for the shutdown may already be gone… and in a shutdown dump you will typically just see the last thread that happened to finish in the dump; it is not necessarily the faulting one.

    Check the eventlogs to make sure that it wasn’t just caught due to a recycle or iisreset, and if it wasn’t, then check the log that adplus outputs and follow Lab 5 to see how you can get dumps on the actual exception that came before the crash.  

    By default, adplus will only capture dumps on a few things like 2nd chance exceptions (unhandled exceptions), access violations, invalid handle etc.   so some things, like stackoverflow exceptions aren’t caught by default, but lab 5 will show you how to catch those.

  25. John says:

    Thanks for the lab- starting to understand some of this

  26. DB says:

    Hi Tess ,

    In the System event log I am getting

    Event Type: Warning

    Event Source: W3SVC

    Event Category: None

    Event ID: 1009

    Date: 11-11-2010

    Time: 10:44:28

    User: N/A

    Computer:   APP-01

    Description: A process serving application pool 'My App Pool' terminated unexpectedly. The process id was '3032'. The process exit code was '0xe0434f4d'.

    and sometimes I am getting this error in the event log for the same process

    Event Type: Warning

    Event Source: W3SVC

    Event Category: None

    Event ID: 1009

    Date: 30-07-2010

    Time: 16:19:32

    User: N/A

    Computer:   APP-01

    Description: A process serving application pool 'My App Poo' suffered a fatal communication error with the World Wide Web Publishing Service. The process id was '6192'. The data field contains the error number.

    So I followed the steps you mentioned above and got the following output

    0:000> .load sos.dll

    0:000> kb 2000

    ChildEBP RetAddr  Args to Child              

    0006fc08 7c827d0b 77e61d1e 00000174 00000000 ntdll!KiFastSystemCallRet

    0006fc0c 77e61d1e 00000174 00000000 00000000 ntdll!NtWaitForSingleObject+0xc

    0006fc7c 77e61c8d 00000174 ffffffff 00000000 kernel32!WaitForSingleObjectEx+0xac

    0006fc90 5a364662 00000174 ffffffff 00000000 kernel32!WaitForSingleObject+0x12

    0006fca0 5a366e3f 00265020 5a3af42d 00000000 w3dt!WP_CONTEXT::RunMainThreadLoop+0x10

    0006fca8 5a3af42d 00000000 64711dcf 00000000 w3dt!UlAtqStartListen+0x2d

    0006fcb8 5a3bc335 01001418 010013e4 010012d0 w3core!W3_SERVER::StartListen+0xbd

    0006ff0c 0100187c 00000005 00263b28 00000000 w3core!UlW3Start+0x26e

    0006ff44 01001a27 00000005 00263b28 002644c0 w3wp!wmain+0x22a

    0006ffc0 77e6f23b 00000000 00000000 7ffd9000 w3wp!wmainCRTStartup+0x12f

    0006fff0 00000000 010018f8 00000000 78746341 kernel32!BaseProcessStart+0x23

    0:000> !clrstack

    succeeded

    Loaded Son of Strike data table version 5 from "C:\WINDOWS\Microsoft.NET\Framework\v1.1.4322\mscorsvr.dll"

    Thread 0

    Not a managed thread.

    0:000> !pe

    No export pe found

    0:000> !dso

    No export dso found

    What would be the problem here? Why am I getting this output? Our application was actually developed using VS 2003, and some links in it redirect to an application developed on framework 2.0.

    Thanks in advance.

  27. Tess says:

    DB, you are crashing due to an unhandled .net exception but for some reason your dump is not taken on a 2nd chance .net exception, but rather probably on an exit process so you managed to get a dump with only the main thread alive (but the main thread has nothing to do with it).  Look to see if perhaps you have two dumps in the dump folder…

    !pe probably didn’t come until 2.0, I can't remember now as it has been a long time since I worked a 1.1 case, and support for it is almost discontinued I believe… !dso should probably be !DumpStackObjects in that version, but either way you are not looking at a managed thread there…

  28. TheDON3k says:

    Is there any chance you can help me a bit with a .net 1.0 app that's occasionally crashing? The Event Viewer shows this as the only event: NET Runtime version 1.1.4322.2443- Setup Error: Failed to load resources from resource file. The site goes down momentarily and then comes back fine. Sessions are corrupted, so users have to login again. I've been monitoring the process w3wp.exe with iisstate, to try to figure it out, but since I'm not the developer of the app, nor do I normally deal with trying to solve this stuff, I'm a bit clueless. The only thing I see in the IISState constantly flying by are these: (fa8.828): CLR exception – code e0434f4d (first chance) in sets of about 4-5, but they are continuous. Process never seems to use more than 300-400 megs total and performance of the app seems fine until it bombs due to whatever this issue is. Win 2003 x86 OS. The vendor blames the OS. I doubt this, since I moved the app to a new box/os clean install, and have same issue. System is fully updated and patched. Thanks, if you can offer suggestions to me. I can provide any other info, if you need.

  29. DB says:

    Tess ,

    Thanks for the answer.  

    How can you say for certain that the application is crashing due to an unhandled .NET exception?  Is there any methodology to find the reason by looking at the event log?

    Thanks

    DB                                                                            

  30. Tess says:

    Hi DB,

    The exit code gives it away  – 0xe0434f4d is the code for .net clr exception

  31. DB says:

    Hi Tess ,

    What does the process exit code '0xc0000354' mean?

    Sometimes we are getting the following error also

    Event Type: Warning

    Event Source: W3SVC

    Event Category: None

    Event ID: 1009

    Date: 11-11-2010

    Time: 10:44:28

    User: N/A

    Computer:   APP-01

    Description: A process serving application pool 'My App pool' terminated unexpectedly. The process id was '3452'. The process exit code was '0xc0000354'.

    Thanks

    DB