HOWTO: Understand and Diagnose an Application Pool Crash


Problem statements similar to the following questions pop up all the time on various IIS newsgroups, and the user usually claims that they have either seen (or not seen) many posts that look like theirs, and never any concrete solutions. I am going to try to explain the whole thought process, why things work the way they do, as well as useful next steps.


Question:


#1


Our production server has recently started experiencing AppPool crashes. These seem to occur sporadically: sometimes three times a day, sometimes not for a couple of days. The error manifests itself to the client as “Service Unavailable”. In the system event log we see the following:
A process serving application pool ‘DefaultAppPool’ suffered a fatal communication error with the World Wide Web Publishing Service With error number: 8007006d


At the same time (but not always) we see this error in the Application Log
“Faulting application w3wp.exe, version 6.0.3790.0, faulting module kernel32.dll, version 5.2.3790.0, fault address 0x000249d3”


The application does usually start working again after anywhere between 2 and 15 minutes (although no worker process restarts or recycles appear in our perf logs).


This problem seems to have started occurring following the latest MS patches being applied to our server. It may just be a coincidence though.


We are running Windows Server 2003 Standard. We are about to apply SP1 in an attempt to solve this problem, but I wanted to find out if anyone else has had this problem and what the solution was. I have seen several similarish posts but nothing concrete as a solution.


I have seen this article (http://support.microsoft.com/Default.aspx?id=885654) but it doesn’t quite match our situation as we are not running as a domain controller. We haven’t yet tried running registry monitor, as I wanted to find out whether error 8007006d always equates to a registry permission problem before installing 3rd party freeware onto our production server.


Any help is much appreciated


#2


There is a similar question posted today, but the event ID is different


I see the following error anywhere from 2-3 times a day on Windows 2003 server.
Event ID- 1009, a process serving application pool ‘Q’ terminated unexpectedly. The process ID was ‘xxxx’ . The process code was ‘xxx’


So far we have not noticed anything on the application side, but once in a while W3WP.EXE starts using up all the CPU and the server comes to a halt. This is not associated with the application pool terminating, but I am wondering if it is contributing to it.


I have applied SP1 for Windows 2003, that did not make any difference.


Any help is appreciated.


#3


Hello Together!


I’ve got a problem with my Windows Server 2003 SP 1 Web-Edition


If I look in the Event Viewer, I always find the following error:


Event Type: Error
Event Source: Application Error
Event Category: (100)
Event ID: 1000
Date:  28.08.2005
Time:  22:00:26
User:  N/A
Computer: {removed by the author}
Description:
Faulting application w3wp.exe, version 6.0.3790.1830, faulting module <some module name>, version <some numbers with dots>, fault address 0x<some numbers and letters a thru f>.


Why does this error occur? What is it and what can I do to resolve this problem?


Please help me


Answer:


You are looking at what is commonly referred to as a “crash” or “access violation” on the server. To be clear, this is different from a “hang” on the server, even though the web browser may itself appear to “hang”, with its busy icon animating for some time, before reporting some seemingly random error response about a missing server, DNS, or other service error. The length of that wait depends on various network connection timeout periods as well as whether the server-side connection stays open or closed.


Crash vs. Hang


What distinguishes a crash from a hang on the server? Ok, for the astute reader in the back, just be quiet with this simplification. We want folks to internalize and understand the issue, not regurgitate dry documentation. 🙂


Practically speaking, a crash is something that simply wipes out its host process; this stops whatever server-side work and response generation the process was supposed to perform. Prior to IIS6, this same process also held the connection open, so as soon as a user-mode crash happened, IIS went down and client browsers saw a dropped connection and usually reported some sort of service disconnected/not-found error. With IIS6 Worker Process Isolation Mode, HTTP.SYS holds the connections in kernel-mode, so regardless of whether user code running in w3wp.exe crashes, the connection stays connected and IIS starts up a new process to handle future requests – so browser clients will NOT see a disconnection for a crash. Unfortunately, the request caught by the crash cannot be re-executed (suppose that request was a bank withdrawal…) unless you implemented transaction semantics, so any unsent response is lost.


A hang, on the other hand, will usually keep its host process around but prevents any real work from being done. There are many possible ways for this to happen – user code can be waiting for a lock that never gets released, either because it was leaked or there was a logical deadlock or livelock, or it could be in a clever infinite loop, etc. Once again, any unsent response will never be sent once your code hangs.
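
To make the distinction concrete, here is a minimal Python sketch from a supervisor's point of view. It is not how IIS works internally; the `classify` helper, its timeout, and the tiny child programs are all made up for illustration. A child process that goes away with a non-zero exit code is treated as a crash, and a child that is still alive but has not finished within the timeout is treated as a hang.

```python
import subprocess
import sys


def classify(child_code: str, timeout_s: float = 5.0) -> str:
    """Run child_code in a separate Python process and report how it ended.

    Hypothetical helper: "crash" means the process went away with a
    non-zero exit code, "hang" means it was still alive after timeout_s.
    """
    proc = subprocess.Popen([sys.executable, "-c", child_code])
    try:
        exit_code = proc.wait(timeout=timeout_s)
    except subprocess.TimeoutExpired:
        # The host process is still around but never finished its work.
        proc.kill()
        proc.wait()
        return "hang"
    return "ok" if exit_code == 0 else "crash"


if __name__ == "__main__":
    # A child that wipes out its own process with an abnormal exit code.
    print(classify("import os; os._exit(3)"))
    # A child that stays alive but never completes.
    print(classify("import time; time.sleep(60)", timeout_s=1.0))
```

In both cases the caller never gets its result, which is why the two look alike from the outside; only the supervisor, which can see the process itself, can tell them apart.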


As you can see, from the perspective of the client browser, both a crash and a hang on the server can prevent a complete HTTP response from being sent back, so they can LOOK similar. Add to that the fact that browsers may have bugs that cause them to crash or hang, and that sound security practices on the server should limit information disclosure of errors to the client… so I never use browser behavior or returned HTML to diagnose server-side issues – I always diagnose based on information from various server log files.


About Diagnosing AppPool Failures…


Unfortunately, there are multiple event log entries from IIS that indicate a “crash” has happened, so you cannot just key-in on any particular event ID for a resolution pattern. For example, the earlier questions illustrate two such events, and there are other related ones.


Now, we did not intentionally try to make crashes harder to diagnose by making them appear as different events. What is happening is that the W3SVC component of IIS and the user-code execution component of IIS run in separate processes which execute independently of each other, yet asynchronously pass messages back and forth to indicate status. Suppose something crashes the process responsible for user-code execution…


Sometimes, IIS first notices it when a ping response fails to arrive; other times, IIS notices that the process handle has gone away… and while IIS understands that these are both catastrophic events that should be reported in the event log, it maintains good system design by reporting them uniquely. It is the responsibility of any analysis layers on top of IIS to abstract such detailed reporting and present logical “process crashed” information to the user so that they can take action.


Unfortunately, that analysis layer is frequently the user’s brain, which may not be able to abstract the details… and gets confused. But hey, I do not think that IIS should stop giving the details. On the contrary, I think you just have to look at the problem harder. 🙂 Or complain loudly and get us to provide a better debugging tool (like DebugDiag…).
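
A very small analysis layer of the kind described above could be sketched as follows. This is only an illustration in Python: the `Event` type and `summarize` function are hypothetical, and the table contains just the event source/ID pairs that appear in the examples quoted in this post, not a complete list of IIS crash events. It only groups events under "process crashed"; it does not suggest a cause or a fix.

```python
from typing import NamedTuple, Optional


class Event(NamedTuple):
    source: str
    event_id: int
    description: str


# (source, event ID) pairs taken from the examples quoted in this post.
# Illustrative only; not an exhaustive list of IIS crash-related events.
CRASH_SIGNATURES = {
    ("W3SVC", 1009): "worker process terminated unexpectedly",
    ("W3SVC", 1011): "worker process lost communication with W3SVC",
    ("Application Error", 1000): "Windows reported a faulting application",
}


def summarize(event: Event) -> Optional[str]:
    """Return a logical 'process crashed' summary, or None if not recognized."""
    reason = CRASH_SIGNATURES.get((event.source, event.event_id))
    if reason is None:
        return None
    if event.source == "Application Error" and "w3wp.exe" not in event.description:
        # Some other application faulted, not an IIS worker process.
        return None
    return f"process crashed ({reason})"


if __name__ == "__main__":
    sample = Event("W3SVC", 1009, "A process serving application pool 'Q' terminated unexpectedly.")
    print(summarize(sample))
```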


What is a Crash


You can consider a crash as an unrecoverable event that resulted from some bug in the program that is executing – in the case of IIS, it simply provides a thin process and support infrastructure for your user code to run. And a bug is basically a logical flaw that results in unintended behavior given some arbitrary set of inputs. Notice that the behavior is unintended, and the set of causes is arbitrary. This basically means that bugs can cause crashes that happen sporadically or periodically… all depending on the set of causes, which is arbitrary!


Now, since the set of causes to a given bug is arbitrary, I would also caution against trying to “fix” crashes by blindly installing hotfixes, Service Packs, or making configuration changes. Crashes are caused by bugs, which are logical flaws, and the only way to “fix” the situation is to either:



  • fix the logical flaw itself, which requires diagnosing the crash to figure out the root of the problem.
  • change software configuration to avoid the logic flaw causing the crash, which can also require diagnosing the crash to figure out what is causing the issue.

To make things even more interesting… a variety of logic flaws can cause crashes, all of which look the same from an IIS and event log entry perspective (to IIS, the crash ended the process; does not matter what; so just report it).


Thus, you may see similar looking events, sometimes with similar looking error codes, but no single concrete solution. The reason should be clear to you now – one flaw causing a crash with a certain error code is NOT the same as another flaw causing a crash with an identical error code. You are talking about two different flaws, and depending on the code path taken by your server configuration choices, you may need to do different things.


So, the take-away here is that the event log entry, the error code, and any other details you may discern are simply good clues to what is going on, but none are independently reliable for diagnosis. Treat them as pieces of a puzzle that you need to put together to correctly diagnose the issue, which is ultimately what you are trying to identify and resolve.
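
As an example of reading an error code as a clue rather than a diagnosis, the sketch below (Python, with hypothetical helper names) splits the 8007006d from question #1 into its HRESULT fields, assuming the standard layout of a failure bit, an 11-bit facility, and a 16-bit code. It also shows how the same number appears in an event's data field as little-endian bytes (`6d 00 07 80`). On that reading the facility is 7 (Win32) and the code is 109, the Win32 "broken pipe" error, which is consistent with the worker process going away but says nothing about why it went away.

```python
def hresult_from_event_data(data: bytes) -> int:
    """Event log data shows the number as little-endian bytes, e.g. 6d 00 07 80."""
    return int.from_bytes(data[:4], byteorder="little")


def decode_hresult(value: int) -> dict:
    """Split a 32-bit HRESULT into failure bit, facility, and code."""
    return {
        "failure": bool((value >> 31) & 0x1),   # top bit set means failure
        "facility": (value >> 16) & 0x7FF,      # 11-bit facility field
        "code": value & 0xFFFF,                 # low 16 bits: the error code
    }


if __name__ == "__main__":
    hresult = hresult_from_event_data(bytes.fromhex("6d 00 07 80"))
    print(hex(hresult))
    print(decode_hresult(hresult))
```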


For example, I treat these events as simply crashes that need to be caught and their stack trace logs analyzed to determine further action. I would not immediately pattern match crashes to “solutions” without other information, and I certainly would not change any system configuration in response.


Frankly, I think it is rash for users to attempt to resolve their crashes without diagnosing the cause. However, most users seem to love pattern matching problem symptoms and event log entries with supposed solutions and blindly try them all, hoping some might work… all the while sinking deeper into some other problem due to their random changes. And the rationale should be clear now – if you do not know the bug causing the fault, how can you determine the configuration change to avoid that bug’s path, or find the right patch to fix the bug?


How to approach the Crash


Ok, now that I have thoroughly trashed most people’s usual methods of “dealing” with a crash, let me walk through another troubleshooting pattern on IIS. 🙂  I realize that you are probably under the gun to take some action to fix a problem, so you are willing to try anything and if it works, great; if it doesn’t, then there is always product support or newsgroup/forums support to fall back on. I just want to propose a more actionable way to deal with a crash, so that you may be able to take care of things yourself… doesn’t that feel good? 🙂


Since you never notice a server crash until you interact with it (usually with a browser), when something unexpected happens and you do not think it is a bug in the application (“hanging” browser or server-error responses are reasonable clues), it is time to look for any signs of a crash on the server. When IIS6 runs in its default Worker Process Isolation Mode, you will see event log entries similar to the sort given in the questions – either IIS noticed the w3wp.exe handle signaled process exit/crash of some sort, or w3wp.exe failed to respond to a ping, or Windows noticed that a process crashed. In IIS5 Compatibility Mode, you will probably notice other event log entries, either saying that the IISADMIN or W3SVC service crashed and has done this # times, or that dllhost.exe has crashed, etc. Anyway, all these events talk about something related to IIS crashing; you have no idea whether it is due to your code or not.
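
When collecting these signs, it can help to pull the individual fields out of the "Faulting application" description so that they can be recorded alongside the other clues. The following Python sketch assumes the description has exactly the wording shown in the questions above; `parse_fault` and the field names are made up for illustration. Keep in mind that the faulting module is only where the fault surfaced, not necessarily where the bug lives.

```python
import re
from typing import Dict, Optional

# Pattern based on the "Faulting application ..." wording quoted in this post.
FAULT_RE = re.compile(
    r"Faulting application (?P<application>[^,\s]+), "
    r"version (?P<application_version>[\d.]+), "
    r"faulting module (?P<module>[^,\s]+), "
    r"version (?P<module_version>[\d.]+), "
    r"fault address (?P<fault_address>0x[0-9a-fA-F]+)"
)


def parse_fault(description: str) -> Optional[Dict[str, str]]:
    """Return the fields of a 'Faulting application' description, or None."""
    match = FAULT_RE.search(description)
    return match.groupdict() if match else None


if __name__ == "__main__":
    text = (
        "Faulting application w3wp.exe, version 6.0.3790.0, "
        "faulting module kernel32.dll, version 5.2.3790.0, "
        "fault address 0x000249d3"
    )
    print(parse_fault(text))
```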


Now that you have identified that your issue belongs to a crash, it is time to set up debugging traps so that you can catch the NEXT crash and diagnose it. Yes, you heard me – you cannot do anything about the crash that has already happened – and since you have no idea what it is, you CANNOT change any server settings to avoid it. So, the only reasonable thing you can do is to set up debugging monitors like IIS State or DebugDiag on the necessary processes running code that is crashing and then WAIT for the next crash. Hopefully, you can trigger the failure condition easily, to shorten this wait.


On this next crash, these tools will produce a stack trace log as well as memory dumps to allow debugging, and you want to either analyze the stack trace log yourself, pay someone to perform an analysis (for example, Microsoft PSS) or post to a newsgroup like microsoft.public.inetserver.iis to see if anyone will do a free analysis. Only after analysis of the crash can you determine what is truly going wrong.
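
If you want to take a first pass at a stack trace log yourself, one simple triage step is to find the first frame that does not belong to an operating system or C runtime module, since that often hints at whose code to look at next. The Python sketch below assumes frames are written as `module!function+offset` followed by argument columns, as in the trace a reader posted in the comments; the `SYSTEM_MODULES` set and the function names are assumptions for illustration, and the result is a hint about who to follow up with, not a root cause.

```python
import re
from typing import List, Optional, Tuple

# Matches lines such as "kernel32!LoadLibraryA+b5     01b03118 ..."
FRAME_RE = re.compile(
    r"^\s*(?P<module>[^!\s]+)!(?P<function>\S+?)(?:\+[0-9a-fA-F]+)?(?:\s|$)"
)

# Assumed set of OS / C runtime modules to skip over; adjust as needed.
SYSTEM_MODULES = {"ntdll", "kernel32", "msvcrt"}


def parse_frames(trace: str) -> List[Tuple[str, str]]:
    """Extract (module, function) pairs from the frame lines of a trace."""
    frames = []
    for line in trace.splitlines():
        match = FRAME_RE.match(line)
        if match:
            frames.append((match.group("module"), match.group("function")))
    return frames


def first_non_system_frame(frames: List[Tuple[str, str]]) -> Optional[Tuple[str, str]]:
    """Return the topmost frame whose module is not in SYSTEM_MODULES."""
    for module, function in frames:
        if module.lower() not in SYSTEM_MODULES:
            return (module, function)
    return None


if __name__ == "__main__":
    sample = """
    ntdll!LdrLoadDll+198     000a8990     00d33724     00d33704
    kernel32!LoadLibraryExW+1b2     7ffd5c00     00000000     00000000
    kernel32!LoadLibraryA+b5     01b03118     00d33c88     00d33794
    RtCppUtils!ImgHelp::ImgHelp+6f     01a3a5d0     00000000     00000000
    """
    print(first_non_system_frame(parse_frames(sample)))
```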


Hopefully, you only have one crash happening on your server, but even if there are multiple crashes on your server, you simply apply the same technique in serial. You catch one crash, resolve it, get a patch/fix, and run again with debugging enabled to catch the next crash, get it resolved and a patch/fix, etc… As developers will say, crashes are the most straightforward to diagnose and fix – they are truly the low-hanging-fruit of bad behavior on the server, so you should pick them off real early…


Now, I advise against tracking down multiple crashes in parallel because you simply do not know if the crashes are caused by the same or different bugs. If they are caused by the same bug, you are just wasting time diagnosing the other crashes. If they are caused by different bugs, you have no idea whether the bugs interact with each other in causing the crash. Ideally, you would only track down non-interacting bugs in parallel, because an interacting bug may turn out not to be a real bug once you fix the original issue (so you would be just wasting resources again). In short, tracking crashes in parallel can be complicated and the pay-off is not certain… you have been warned!


Yes, this is very similar to how the IIS product team approaches bug fixing during our stress/reliability test runs. We start up all IIS processes under the debugger and monitor them for any problems, and as soon as anything crashes/hangs, the debugger is already there so we can diagnose the first occurrence instead of waiting for a second occurrence. And as soon as we get a fix, we just crank everything up again under debuggers and wait for the next failure. Highly efficient and no wasted effort.


Conclusion


I hope that this helps clarify what a crash on IIS is and how best to deal with it.


Resist your natural urge to pattern match event log messages or failure codes to a solution, and do not be discouraged if you do not find your particular failure code or if others have similar but supposedly unsolved issues. Crashes are usually arbitrary and require stack trace analysis to determine the real cause and the next step. If you do not use tools like IIS State or DebugDiag to catch the crash, you will rarely figure out the real culprit and the correct next step. Pattern matching is nothing more than a random guess, so do not take chances.


Personally, I always attach a debugger to gather a stack trace whenever I suspect a crash. Depending on your debugging skills, this can tell you a whole lot of info, sometimes sufficient to directly fix the issue. Guessing at solutions based on non-specific symptoms can never do this reliably. It is all up to you. 🙂


//David

Comments (124)

  1. Ian Cox says:

    Great post David. I am one of those whose problems you quoted at the start. Everything you say here makes a lot of sense.

  2. Oliver Ogg says:

    Thanks for a well written and informative post.

  3. David Wang says:

    I had gotten a comment about this blog entry:

    Hi there, just read your blog here:

    http://blogs.msdn.com/david.wang/archive/2005/08/29/HOWTO_Understand_and_Diagnose_an_AppPool_Crash.aspx

    Would you say that the following error would fit into this category of a "crash" and your diagnosis techniques in this blog would be recommended? The worker process seems to restart, but I am consistently getting this crash. The primary software application I am running is called ACT, using a web front-end known as WiredContact.

    ERROR MESSAGE:

    Faulting application w3wp.exe, version 6.0.3790.1830, faulting module mfc42.dll, version 6.6.8063.0, fault address 0x00022659.

    –> Yes, this exactly matches one of the patterns I gave for what a "crash" would look like on IIS6.

    If you attach a debugger onto the w3wp.exe or use a tool like IIS State or DebugDiag it will immediately identify the code that is actually crashing when it happens, so you can start looking for support from the correct party.

    //David

  4. Kirk Francis says:

    Well done David, this information was well thought out and quite helpful to me. Like others, I’ve looked for a direct resolution to an Event Log message to no avail. Things are better with IIS 6.0 no doubt, but with that come the complexities of arriving at a root cause.

  5. Randy says:

    Well done David!! Hopefully DebugDiag will be a final release.. but until then I have found another helpful article which talks about running the debug symbols and what not

    http://support.microsoft.com/kb/911359/EN-US/

    Keep up the good work.. now about the IIS 7.0 Beta 😉

  6. David.Wang says:

    Woohoo… my second 10K+ blog entry…

    You all seem to like howto diagnosis entries. 🙂

    //David

  7. Hi David,

    Thanks for the information, I will try attaching the IIS debug tools and see if it helps to diagnose my issue.

  8. Ahmed El-Rasheedy says:

    Hi David,

    We are running JRun/ColdFusion Multi-Server Cluster (clustered using the Application server) on several Win 2003 machines. The machines that are having the error are running Windows 2003 SP1 while a third machine which does not have the Windows SP1 is not having the error. Here are the symptoms:

    Users get "Service Unavailable" in their browser and we see the following error and warnings in the EventViewer.

    Event Type: Warning

    Event Source: W3SVC

    Event Category: None

    Event ID: 1011

    Date: 3/7/2006

    Time: 7:21:55 PM

    User: N/A

    Computer: SLIOBTWEB2

    Description:

    A process serving application pool ‘DefaultAppPool’ suffered a fatal communication error with the World Wide Web Publishing Service. The process id was ‘5844’. The data field contains the error number.

    For more information, see Help and Support Center at http://go.microsoft.com/fwlink/events.asp.

    Data:

    0000: 6d 00 07 80               m..?    

    Then, after 5 warning messages like the one above, we get the error below. I found the number five in the IIS Application Pool settings (it is a setting to shut down w3wp.exe after 5 errors in 5 minutes).

    Event Type: Error

    Event Source: W3SVC

    Event Category: None

    Event ID: 1002

    Date: 3/7/2006

    Time: 7:22:03 PM

    User: N/A

    Computer: SLIOBTWEB2

    Description:

    Application pool ‘DefaultAppPool’ is being automatically disabled due to a series of failures in the process(es) serving that application pool.

    For more information, see Help and Support Center at http://go.microsoft.com/fwlink/events.asp.

    I captured a debug dump and I am not sure what to do with it or who should I submit it to. Can you please let me know.

    thanks,

  9. David.Wang says:

    Ahmed – as I mentioned in the blog entry:

    you want to either analyze the stack trace log yourself, pay someone to perform an analysis (for example, Microsoft PSS) or post to a newsgroup like microsoft.public.inetserver.iis to see if anyone will do a free analysis.

    You have a classic "something is crashing in the application pool" and if you captured the stack trace of the crash (and not before/after the crash), you should be able to see in DebugDiag logfile what is at fault and who to follow up with.

    At this point, you only know that on WS03SP1 machines the issue can happen, but you do not know whether it is SP1 or some application’s reaction to SP1 that is at fault. The stack trace of the crash helps identify which situation.

    //David

  10. David Wang says:

    I recently sat down and thought a little about the typical user experience when troubleshooting IIS6,…

  11. Sudeep says:

    Hi David, please help me by telling me how to proceed with my problem. I know why my IIS is crashing but I don’t know how to fix it. Whenever I try to access a custom ISAPI DLL from the browser, IIS crashes. The worker process identity is Network Service; IIS does not crash if I use the Local System account. Obviously it is an access violation problem. I am trying to give more access rights to Network Service rather than using the Local System account, but with no success. I don’t want my worker process identity to be the Local System account. Could you guide me on how to go about giving only the required access to Network Service? Any help from anyone is appreciated. Thank you.

  12. Sudeep says:

    Does anyone know how to debug COM DLLs in the VB environment for IIS 6 on Windows Server 2003? It is unlike 2000, where we could unregister the DLL, run the project, put in a breakpoint, and go ahead debugging. I also could not find a description anywhere of the architecture behind how this works…

  13. Chris Vandecaveye says:

    Hi David,

    Of course your article makes sense. Your article describes how I would normally handle a crash (or hang). But now I’m in a situation where I cannot apply this procedure. What is happening?

    We have an ASP.NET (1.1) application. We use a pool of 8 w3wp’s in IIS 6.0. Clients are using IE 5.5 to connect to the application. Every now and then, we cannot connect to the application anymore from certain clients while other clients are still working. What is more, starting the application on the webserver works fine, and starting the same application on another server in that domain works fine. The clients do not get any message (blank white screen). There are no events in the event log. IIS is still working because we can still connect to other applications on the same webserver. We added a lot of debug code in our application, but we don’t even reach the first line of code! So, the error must be situated between the incoming request and our application.

    I would greatly appreciate some help here, and yes we already placed a call at Microsoft without any result until now.

    I find it more important to get some suggestions on how to debug/monitor this issue, rather than have the solution without me knowing what I am doing.

    Thanks in advance!

  15. David.Wang says:

    Chris – I would look at HTTPERR logs for the affected client’s IP to see if the request from the affected client even got to the web server machine, HTTP.SYS, and IIS6.

    http://blogs.msdn.com/david.wang/archive/2005/12/31/HOWTO_Basics_of_IIS6_Troubleshooting.aspx

    If the request got to the web server and HTTP.SYS did not report it dropped in HTTPERR logs, then we need to isolate which w3wp.exe is having issues with it. You are using a pool of 8 w3wp.exe which makes it impossible to track down which w3wp.exe is handling the failing request (HTTP.SYS round-robins requests by connection when using Web Garden).

    You can either attach debugging tools to all w3wp and hope for something to happen that you can identify with the debugger, or make the Application Pool use only one w3wp.exe.

    Using one w3wp.exe will affect availability since other applications using the same Application Pool will also be out-of-commission, so please adjust debugging accordingly.

    //David

  16. Chris Vandecaveye says:

    Thanks for your reply David.

    I will suggest using a second server for availability purpose so we can use one w3wp.exe instead of the pool of 8.

    There are some things I do notice, but I do not have the knowledge to interpret them or pinpoint the problem.

    – In the HTTP log I see a lot of ‘Timer_ConnectionIdle’ and sometimes ‘Connection_Dropped’ but the application is still working. Is this normal? I will pay special attention to these logs in case of connection problems.

    – The use of virtual memory for each w3wp.exe is high (+/- 400MB) but stable, while the available physical memory continues to drop from +/- 700MB to 5MB. When available physical memory reaches 5MB, it goes back to 700MB.

  17. David.Wang says:

    Chris – Explanation of HTTPERR Log entries:

    http://msdn.microsoft.com/library/default.asp?url=/library/en-us/http/http/types_of_errors_logged_by_the_http_server_api.asp

    Sounds like some sort of process recycling is kicking in to free up available physical memory.

    //David

  18. Sudeep says:

    Does anyone know how to debug COM DLLs in the VB environment for IIS 6 on Windows Server 2003? It is unlike 2000, where we could unregister the DLL, run the project, put in a breakpoint, and go ahead debugging…. Am I missing something?

  19. Sudeep says:

    I moved an ASP website from 2000 to 2003, and my IIS server was crashing. Upon using the Debug Diagnostics tool, it says "first chance exception", which is a Visual Basic Runtime exception with VB Error #: 53 …

    Also ntdll.dll is crashing… Can anyone tell me why this is happening, or how to debug a VB DLL in 2003, as this exception does not occur on a 2000 machine?

    Here is the stack trace for your perusal… this is when ntdll.dll crashed due to an access violation…

    Entry point   w3tp!THREAD_MANAGER::ThreadManagerThread

    Create time   3/22/2006 1:45:14 PM

    Time spent in user mode   0 Days 0:0:1.656

    Time spent in kernel mode   0 Days 0:0:0.62

    Function     Arg 1     Arg 2     Arg 3   Source

    ntdll!RtlpLocateActivationContextSection+c     00070000     00000000     00000002    

    ntdll!RtlpFindNextActivationContextSection+64     00d3308c     00d330c4     00d330b0    

    ntdll!RtlpFindFirstActivationContextSection+41     00d3308c     00d330c4     00d330b0    

    ntdll!RtlFindActivationContextSectionString+91     00000000     00000000     00000002    

    ntdll!LdrpCheckForLoadedDll+f7     000a8990     00d33208     00000000    

    ntdll!LdrpLoadDll+1b3     00000000     000a8990     00d33724    

    ntdll!LdrLoadDll+198     000a8990     00d33724     00d33704    

    kernel32!LoadLibraryExW+1b2     7ffd5c00     00000000     00000000    

    kernel32!LoadLibraryExA+1f     01b03118     00000000     00000000    

    kernel32!LoadLibraryA+b5     01b03118     00d33c88     00d33794    

    RtCppUtils!ImgHelp::ImgHelp+6f     01a3a5d0     00000000     00000000    

    RtCppUtils!RTCppLog::RTCppLog+f0     c0000005     00d33cd0     e06d7363    

    msvcrt!_CallSETranslator+97     00d33eec     00d342e0     00d33f08    

    msvcrt!__CxxExceptionFilter+1d1     00d33eec     00d342e0     00d33f08    

    msvcrt!__CxxExceptionFilter+58f     00d33eec     00d342e0     00d33f08    

    msvcrt!__InternalCxxFrameHandler+b2     00d33eec     00d342e0     00d33f08    

    msvcrt!__CxxFrameHandler+28     00d33eec     00d342e0     00d33f08    

    ntdll!ExecuteHandler2+26     00d33eec     00d342e0     00d33f08    

    ntdll!ExecuteHandler+24     00d33000     00d33f08     00d33eec    

    ntdll!KiUserExceptionDispatcher+e     00d33000     00d33f08     00d33eec    

    ntdll!RtlEnterCriticalSection+19     00000020     00d34228     77bd1d69    

    msvcrt!_lock_file+33     00000000     01a3a5d0     00d34acc    

    msvcrt!fprintf+18     00000000     01b010c8     01621d69    

    RtCppUtils!RTBaseLog::logMessage+17e     00d34420     00d3438c     00250168    

    RtCppUtils!RTCppLog::log+dd     00000001     00d34420     00d34788    

    RtCppUtils!RTCppLogPtr::log+1e     00000001     00d34420     01a3a5d0    

    RtCppUtils!RTCppLog::RTCppLog+1f1     c0000005     00d348b0     e06d7363    

    msvcrt!_CallSETranslator+97     00d34acc     00d34ec0     00d34ae8    

    msvcrt!__CxxExceptionFilter+1d1     00d34acc     00d34ec0     00d34ae8    

    msvcrt!__CxxExceptionFilter+58f     00d34acc     00d34ec0     00d34ae8    

    msvcrt!__InternalCxxFrameHandler+b2     00d34acc     00d34ec0     00d34ae8    

    msvcrt!__CxxFrameHandler+28     00d34acc     00d34ec0     00d34ae8    

    ntdll!ExecuteHandler2+26     00d34acc     00d34ec0     00d34ae8    

    ntdll!ExecuteHandler+24     00d34000     00d34ae8     00d34acc    

    ntdll!KiUserExceptionDispatcher+e     00d34000     00d34ae8     00d34acc    

    ntdll!RtlEnterCriticalSection+19     00000020     00d34e08     77bd1d69    

    msvcrt!_lock_file+33     00000000     01a3a5d0     00d356ac    

    msvcrt!fprintf+18     00000000     01b010c8     01621ad9    

    RtCppUtils!RTBaseLog::logMessage+17e     00d35000     00d34f6c     00250164    

    RtCppUtils!RTCppLog::log+dd     00000001     00d35000     00d35368    

    RtCppUtils!RTCppLogPtr::log+1e     00000001     00d35000     01a3a5d0    

    RtCppUtils!RTCppLog::RTCppLog+1f1     c0000005     00d35490     e06d7363    

    msvcrt!_CallSETranslator+97     00d356ac     00d35aa0     00d356c8    

    msvcrt!__CxxExceptionFilter+1d1     00d356ac     00d35aa0     00d356c8    

    msvcrt!__CxxExceptionFilter+58f     00d356ac     00d35aa0     00d356c8    

    msvcrt!__InternalCxxFrameHandler+b2     00d356ac     00d35aa0     00d356c8    

    msvcrt!__CxxFrameHandler+28     00d356ac     00d35aa0     00d356c8    

    ntdll!ExecuteHandler2+26     00d356ac     00d35aa0     00d356c8    

    ntdll!ExecuteHandler+24     00d34000     00d356c8     00d356ac    

    ntdll!KiUserExceptionDispatcher+e     00d34000     00d356c8     00d356ac    

    ntdll!RtlEnterCriticalSection+19     00000020     00d359e8     77bd1d69    

    msvcrt!_lock_file+33     00000000     01a3a5d0     00d3628c    

    msvcrt!fprintf+18     00000000     01b010c8     01621849    

    RtCppUtils!RTBaseLog::logMessage+17e     00d35be0     00d35b4c     00250168    

    RtCppUtils!RTCppLog::log+dd     00000001     00d35be0     00d35f48    

    RtCppUtils!RTCppLogPtr::log+1e     00000001     00d35be0     01a3a5d0    

    RtCppUtils!RTCppLog::RTCppLog+1f1     c0000005     00d36070     e06d7363    

    msvcrt!_CallSETranslator+97     00d3628c     00d36680     00d362a8    

    msvcrt!__CxxExceptionFilter+1d1     00d3628c     00d36680     00d362a8    

    msvcrt!__CxxExceptionFilter+58f     00d3628c     00d36680     00d362a8    

    msvcrt!__InternalCxxFrameHandler+b2     00d3628c     00d36680     00d362a8    

    msvcrt!__CxxFrameHandler+28     00d3628c     00d36680     00d362a8    

    ntdll!ExecuteHandler2+26     00d3628c     00d36680     00d362a8    

    ntdll!ExecuteHandler+24     00d35000     00d362a8     00d3628c    

    ntdll!KiUserExceptionDispatcher+e     00d35000     00d362a8     00d3628c    

    ntdll!RtlEnterCriticalSection+19     00000020     00d365c8     77bd1d69    

    msvcrt!_lock_file+33     00000000     01a3a5d0     00d36e6c    

    msvcrt!fprintf+18     00000000     01b010c8     016215b9    

    RtCppUtils!RTBaseLog::logMessage+17e     00d367c0     00d3672c     00250168    

    RtCppUtils!RTCppLog::log+dd     00000001     00d367c0     00d36b28    

    RtCppUtils!RTCppLogPtr::log+1e     00000001     00d367c0     01a3a5d0    

    RtCppUtils!RTCppLog::RTCppLog+1f1     c0000005     00d36c50     e06d7363    

    msvcrt!_CallSETranslator+97     00d36e6c     00d37260     00d36e88    

    msvcrt!__CxxExceptionFilter+1d1     00d36e6c     00d37260     00d36e88    

    msvcrt!__CxxExceptionFilter+58f     00d36e6c     00d37260     00d36e88    

    msvcrt!__InternalCxxFrameHandler+b2     00d36e6c     00d37260     00d36e88    

    msvcrt!__CxxFrameHandler+28     00d36e6c     00d37260     00d36e88    

    ntdll!ExecuteHandler2+26     00d36e6c     00d37260     00d36e88    

    ntdll!ExecuteHandler+24     00d36000     00d36e88     00d36e6c    

    ntdll!KiUserExceptionDispatcher+e     00d36000     00d36e88     00d36e6c    

    ntdll!RtlEnterCriticalSection+19     00000020     00d371a8     77bd1d69    

    msvcrt!_lock_file+33     00000000     01a3a5d0     00d37a4c    

    msvcrt!fprintf+18     00000000     01b010c8     01621329    

    RtCppUtils!RTBaseLog::logMessage+17e     00d373a0     00d3730c     00250168    

    RtCppUtils!RTCppLog::log+dd     00000001     00d373a0     00d37708    

    RtCppUtils!RTCppLogPtr::log+1e     00000001     00d373a0     01a3a5d0    

    RtCppUtils!RTCppLog::RTCppLog+1f1     c0000005     00d37830     e06d7363    

    msvcrt!_CallSETranslator+97     00d37a4c     00d37e40     00d37a68    

    msvcrt!__CxxExceptionFilter+1d1     00d37a4c     00d37e40     00d37a68    

    msvcrt!__CxxExceptionFilter+58f     00d37a4c     00d37e40     00d37a68    

    msvcrt!__InternalCxxFrameHandler+b2     00d37a4c     00d37e40     00d37a68    

    msvcrt!__CxxFrameHandler+28     00d37a4c     00d37e40     00d37a68    

    ntdll!ExecuteHandler2+26     00d37a4c     00d37e40     00d37a68    

    ntdll!ExecuteHandler+24     00d37000     00d37a68     00d37a4c    

    ntdll!KiUserExceptionDispatcher+e     00d37000     00d37a68     00d37a4c    

    ntdll!RtlEnterCriticalSection+19     00000020     00d37d88     77bd1d69    

    msvcrt!_lock_file+33     00000000     01a3a5d0     00d3862c    

    msvcrt!fprintf+18     00000000     01b010c8     01621099    

    RtCppUtils!RTBaseLog::logMessage+17e     00d37f80     00d37eec     00250168    

    RtCppUtils!RTCppLog::log+dd     00000001     00d37f80     00d382e8    

    RtCppUtils!RTCppLogPtr::log+1e     00000001     00d37f80     01a3a5d0    

    RtCppUtils!RTCppLog::RTCppLog+1f1     c0000005     00d38410     e06d7363    

    msvcrt!_CallSETranslator+97     00d3862c     00d38a20     00d38648    

    msvcrt!__CxxExceptionFilter+1d1     00d3862c     00d38a20     00d38648    

    msvcrt!__CxxExceptionFilter+58f     00d3862c     00d38a20     00d38648    

    msvcrt!__InternalCxxFrameHandler+b2     00d3862c     00d38a20     00d38648    

    msvcrt!__CxxFrameHandler+28     00d3862c     00d38a20     00d38648    

    ntdll!ExecuteHandler2+26     00d3862c     00d38a20     00d38648    

    ntdll!ExecuteHandler+24     00d37000     00d38648     00d3862c    

    ntdll!KiUserExceptionDispatcher+e     00d37000     00d38648     00d3862c    

    ntdll!RtlEnterCriticalSection+19     00000020     00d38968     77bd1d69    

    msvcrt!_lock_file+33     00000000     01a3a5d0     00d3920c    

    msvcrt!fprintf+18     00000000     01b010c8     01620e09    

    RtCppUtils!RTBaseLog::logMessage+17e     00d38b60     00d38acc     00250168    

    RtCppUtils!RTCppLog::log+dd     00000001     00d38b60     00d38ec8    

    RtCppUtils!RTCppLogPtr::log+1e     00000001     00d38b60     01a3a5d0    

    RtCppUtils!RTCppLog::RTCppLog+1f1     c0000005     00d38ff0     e06d7363    

    msvcrt!_CallSETranslator+97     00d3920c     00d39600     00d39228    

    msvcrt!__CxxExceptionFilter+1d1     00d3920c     00d39600     00d39228    

    msvcrt!__CxxExceptionFilter+58f     00d3920c     00d39600     00d39228    

    msvcrt!__InternalCxxFrameHandler+b2     00d3920c     00d39600     00d39228    

    msvcrt!__CxxFrameHandler+28     00d3920c     00d39600     00d39228    

    ntdll!ExecuteHandler2+26     00d3920c     00d39600     00d39228    

    ntdll!ExecuteHandler+24

    and so on…………..

  20. David.Wang says:

    Sudeep – looks like something is in infinite recursion. In which case it is less interesting that ntdll.dll crashed (when you see ntdll.dll "crash" you should always think "what did I do wrong to cause it" and not "it’s Microsoft’s problem that ntdll.dll crashed").

    Since you are using VB, you must make sure to get the latest VB Runtime and patches because otherwise VB doesn’t work properly when launched by IIS6 on Windows Server 2003.

    Debugging Principles haven’t changed on Windows Server 2003. Procedures may have changed since process models change, but that’s a fact of life. The Debugging Principles haven’t changed since NT 3.51.

    //David
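The recursion David points at can be spotted mechanically, because the same run of frames repeats all the way down the stack. Below is a minimal Python sketch, assuming the trace has already been reduced to `module!function+offset` strings; the `find_repeating_cycle` helper and the `LAP` list are hypothetical names for illustration and are not part of DebugDiag.

```python
from typing import List, Optional, Tuple


def find_repeating_cycle(frames: List[str], min_repeats: int = 3) -> Optional[Tuple[int, int]]:
    """Return (period, repeats) for the shortest run of frames that repeats
    back-to-back from the top of the stack, or None if nothing repeats."""
    n = len(frames)
    for period in range(1, n // min_repeats + 1):
        repeats = 1
        # Count how many consecutive copies of the first `period` frames follow.
        while ((repeats + 1) * period <= n
               and frames[repeats * period:(repeats + 1) * period] == frames[:period]):
            repeats += 1
        if repeats >= min_repeats:
            return period, repeats
    return None


# One lap of the loop visible in the trace above (offsets kept, arguments dropped).
LAP = [
    "ntdll!ExecuteHandler+24",
    "ntdll!KiUserExceptionDispatcher+e",
    "ntdll!RtlEnterCriticalSection+19",
    "msvcrt!_lock_file+33",
    "msvcrt!fprintf+18",
    "RtCppUtils!RTBaseLog::logMessage+17e",
    "RtCppUtils!RTCppLog::log+dd",
    "RtCppUtils!RTCppLogPtr::log+1e",
    "RtCppUtils!RTCppLog::RTCppLog+1f1",
    "msvcrt!_CallSETranslator+97",
    "msvcrt!__CxxExceptionFilter+1d1",
    "msvcrt!__CxxExceptionFilter+58f",
    "msvcrt!__InternalCxxFrameHandler+b2",
    "msvcrt!__CxxFrameHandler+28",
    "ntdll!ExecuteHandler2+26",
]

if __name__ == "__main__":
    # Six laps stand in for the pasted trace; prints the period and repeat count.
    print(find_repeating_cycle(LAP * 6))
```

A short period that repeats many times is the signature of recursion. Read that way, the trace suggests (as an interpretation, not a diagnosis) that the logging code in RtCppUtils faults inside `fprintf`, the exception translator calls the logger again, and the logger faults again.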

  21. Sudeep says:

    Absolutely, I agree with you David. I bet there is a problem with one of the DLLs we are using… I have run the Debug Diagnostics tool many times, and what I see is a first chance exception occurring in one of our DLLs every time we access the page that uses that DLL…

    Does a first chance exception that is handled properly by the application still get reported to the debugger? Or does it mean there is definitely an unhandled exception in our VB DLL? Also, is there a chance that many first chance exceptions eventually result in a crash, or can we be sure that first chance exceptions never crash IIS?

    Should I treat the exception occurring in the VB DLL (first chance exception) and the other exception that is crashing ntdll (second chance exception) as separate exceptions, or is there a chance they are interlinked?

  22. Sudeep says:

    Also David, I have the latest service pack for Visual Studio 6.0 (SP6) and the latest VB runtime installed on my machine, and I am still unable to debug the VB COM DLL. I also have SP1 for Windows Server 2003, which is the latest.

    Are you referring to any specific patch that will do the job? Do you have the URL for the patch you are referring to?

  23. David.Wang says:

    Sudeep – I would not speculate about any of the exceptions seen in the debugger other than the unhandled one which will cause the debugger to break in. Exceptions are a normal part of computing, and handled ones are usually ok (at least, the code handling it thinks it is ok). For example, when you try/catch an exception you will see it show up as first-chance – nothing wrong with that.

    So, you just need to care about the unhandled exceptions, and if you are paranoid, any handled exceptions that are "swallowing up" otherwise unhandled exceptions and preventing early diagnosis. But, there is little you can do about that one.

    As for the exceptions – you have no way to determine if they are linked or not. You can only debug the crash, determine what criteria caused that crash, and go backwards from there.

    Second-chance exception means that someone already caught the first-chance exception but did not handle it – which eventually leads to second chance.

    I do not know if Visual Studio 6.0 works on Windows Server 2003. It is certainly no longer a supported product.

    I have never debugged Visual Basic. I just know that the VB Runtime had some hacks in it to turn on RIM and UE for badly written VB COM components that would otherwise hang a server like IIS. The hack expectedly breaks on IIS6 because of the new process model (process name is w3wp.exe and not inetinfo.exe or dllhost.exe), so it had to be re-hacked to work with IIS6. I don’t know if you have that version or not, and I don’t know where you’d get it. Sorry – I just don’t work with VB much.

    //David
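The first-chance / second-chance distinction above is easier to follow as an ordering. The following is a toy Python model of that ordering, not the real Windows exception dispatcher; `dispatch`, `Handler` and `debugger_log` are hypothetical names introduced only for this sketch.

```python
from typing import Callable, List

# A handler returns True when it deals with the exception (a try/catch that matches).
Handler = Callable[[str], bool]


def dispatch(exception: str, handlers: List[Handler], debugger_log: List[str]) -> str:
    """Model the order in which an attached debugger and the program's own
    handlers get to see one exception."""
    # 1. The attached debugger is notified first, before any handler runs.
    debugger_log.append(f"first chance: {exception}")
    # 2. The program's own handlers get their turn.
    for handler in handlers:
        if handler(exception):
            return "handled"
    # 3. Nobody handled it, so the debugger is notified a second time.
    debugger_log.append(f"second chance: {exception}")
    # 4. If nothing intervenes at this point, the process goes away.
    return "process terminated"


if __name__ == "__main__":
    log: List[str] = []
    catches_file_errors = [lambda e: e == "file not found"]
    print(dispatch("file not found", catches_file_errors, log))
    print(dispatch("access violation", catches_file_errors, log))
    print(log)
```

In this model a handled exception still leaves a first-chance entry in the debugger log, which is why seeing first-chance exceptions on every page hit does not by itself indicate a crash.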

  24. TJ says:

    Great article. I attached the debugger and was able to generate a dump from the URL that was experiencing the hang – however, I am unable to open the dump:

    IISAnalysis.asp  Could not open specified dump file.

    Any thoughts?

  25. David Wang says:

    Sigh… it seems that the Application Health Monitoring features added in IIS6 are merely used by VARs…

  26. Matthew says:

    What should I do if DebugDiag crashes every time I try to install it???  :>

    No really, that is happening to me.  The only answer at this point is to format c:

  27. David.Wang says:

    Matthew – are you sure that DebugDiag is actually crashing when you install it?

    The installer does not invoke DebugDiag binaries in any way, so I do not see how it can crash, and installer programs themselves rarely crash.

    Since the installer is an MSI, you can always launch it as:

    msiexec.exe /I iisdiag.msi /L*v iisdiagInstall.log

    And view the verbose installation log file iisdiagInstall.log afterwards to see where the install failure happened.

    Sure, you can always format c: . Debugging and Troubleshooting is not for everyone, and if you can rebuild server state faster than you can troubleshoot, debugging is irrelevant.

    //David
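A verbose MSI log is long, so it helps to jump straight to the failing action. The sketch below assumes the usual convention that a failed action is logged with the text `Return value 3`. The helper names `find_failed_actions` and `scan_log_file` are hypothetical, and the UTF-16 encoding is an assumption to adjust if your log differs.

```python
from typing import Iterable, List, Tuple


def find_failed_actions(lines: Iterable[str], context: int = 3) -> List[Tuple[int, List[str]]]:
    """Return (line_number, [preceding lines..., failing line]) for every line
    that reports 'Return value 3'."""
    window: List[str] = []
    hits: List[Tuple[int, List[str]]] = []
    for number, line in enumerate(lines, start=1):
        window.append(line.rstrip("\r\n"))
        window = window[-(context + 1):]  # keep only the most recent lines
        if "Return value 3" in line:
            hits.append((number, list(window)))
    return hits


def scan_log_file(path: str, encoding: str = "utf-16") -> None:
    """Print each failure with a few lines of leading context.
    Assumption: the log is UTF-16; pass another encoding if it is not."""
    with open(path, encoding=encoding, errors="replace") as log:
        for number, lines in find_failed_actions(log):
            print(f"--- failure reported at line {number} ---")
            print("\n".join(lines))
```

Calling `scan_log_file("iisdiagInstall.log")` on the log produced by the command above would print the lines leading up to each failure, which is usually where the actual error text sits.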

  28. gavin milward says:

    I hit this in similar situations, when the network service does not have the necessary permissions. I use regmon from sysinternals to monitor registry access.

    Filter it for the w3wp.exe process and highlight access denied errors.

    When you get access denied for the network service, it usually happens on access to certificates. Open the registry key and add read permissions for the network service account.

    Then, once all the access denied errors are found, schedule an IIS restart to pick up the new permissions.

    This fixed all my similar problems but just recently an update patch reset all my permissions for certificates, so I am yet to decide whether the network service is the best to use.

  29. David.Wang says:

    gavin – I think your problem comes from some application running on IIS that is accessing certificates – because IIS itself (and the process identity, Network Service by default) does not have any need to access certificates.

    //David

  30. tyrven says:

    Unfortunately, IIS Diag requires the Windows 2003 DDK which is no longer available for public download, but now requires a MSDN subscription (or you have to buy the CD and wait for it to arrive).  Annoying.  

  31. David Wang says:

    Question:

    Ok, I read through David Wang’s Troubleshooting crashes thing and got the DebugDiag and I…

  32. Oded says:

    We have experienced the same problem:

    Event Type: Error

    Event Source: W3SVC

    Event Category: None

    Event ID: 1002

    Date: 7/19/2006

    Time: 3:15:55 AM

    User: N/A

    Computer: xxx

    Description:

    Application pool ‘DefaultAppPool’ is being automatically disabled due to a series of failures in the process(es) serving that application pool.

    For more information, see Help and Support Center at http://go.microsoft.com/fwlink/events.asp.

    Event Type: Warning

    Event Source: W3SVC

    Event Category: None

    Event ID: 1011

    Date: 7/19/2006

    Time: 3:15:55 AM

    User: N/A

    Computer: xxx

    Description:

    A process serving application pool ‘DefaultAppPool’ suffered a fatal communication error with the World Wide Web Publishing Service. The process id was ‘1700’. The data field contains the error number.

    For more information, see Help and Support Center at http://go.microsoft.com/fwlink/events.asp.

    Data:

    After a thorough investigation, we have realized that it was the result of DDoS (distributed denial of service) attacks, which were launched on our site every now and then.

    No solution yet…

  33. John Smith says:

    I am very sorry to say this, David. But this is one of the worst technical articles I have ever read in my life.

    You could say these words in one page, but you used six, which makes this article just a waste of time.

  34. Ruth Horvath says:

    I just found this thread extremely interesting. We too have an application running on Windows Server 2003 SP1 using IIS 6.  I’ve been trying to run load on our application. At first, I saw that the w3wp.exe process memory usage escalated; even if we only opened 50 windows running the application and closed them, the memory usage still climbed. Then I found out about a "known" memory leak and "known" null pointer exception. Oh well…

    Now I’m trying to run load again (that I know had worked a few weeks back) and things are worse. I used to be able to login 1000 users to the application and run for 10 hours. Now I don’t even last 20-30 minutes and only get in 200-400 users. In looking at the events, I see the Faulting application w3wp.exe….., The VB Application identified by the event source logged this Application CPU_Mon…. there is some other stuff, but then the whole server reboots. I’ve just downloaded the DebugDiag tool and set up rules for hangs and crashes – but it doesn’t look like I have that right; I do have a w3wp…txt file created, but inside there’s an error about the Symbol file could not be found and the stack unwind information not available…. HELP !

  35. Krish says:

    Sorry David this is not for you.

    Few words for John Smith,

    John, a genius like you can understand things in a few words, but people like us need some explanation. So if you can’t encourage forums like this, please don’t discourage them. I would have appreciated it if you had posted that one page of yours here.

    Also I don’t want to write something like this here, but I don’t want to see another john discouraging like this.

    David please continue your great work.

  36. rightioho says:

    so… i get the same issue with MICROSOFT sharepoint portal services,… running on MICROSOFT windows operating system,… and web service provided by MICROSOFT IIS 6.0,… supplemented with MICROSOFT .NET 1.1,… and MICROSOFT COM+ services… there seems to be a common occurrence here…

  37. Yatish says:

    This info helped me a lot while resolving some sticky issues. I am working as a tech support engineer, and whenever I get a Windows issue I prefer this site and mostly your blogs.

  38. Ayoob says:

    Excellent post David, straight to the point it helped a lot, troubleshooting and debugging waaay more fun than just apply random fixes,  format 😉

  39. Manpreet says:

    Is installing DebugDiag on the server going to cause any performance issues when I use the debugger with the application? How much is the setback going to be?

  40. amit says:

    And what is your suggestion when devs don’t have access to production servers to install debuggers? When production servers are handled by a "production support" team who are clueless about these things and only know how to guard the servers against any modification with their lives, so that they don’t have to do any extra work? With the UAT server in my control, I don’t see the problem, because it’s not loaded properly or because the combination of events that causes a crash in the real world doesn’t occur.

    Is it too much work to add a dump capability so IIS can WRITE A  CRASH LOG when it crashes??

  41. David.Wang says:

    Manpreet – there is a (usually small) performance degradation, but it varies depending on how much debug output the application produces.

    It is customary practice to not run debuggers on production machines, so it should only be set up to help debug your issue and then removed.

    //David

  42. David.Wang says:

    amit – my suggestion would be to reproduce the problem on a controlled test-server. If you cannot reproduce the issue in test, then that illustrates an inadequacy in your testing procedures.

    Since NT4, Windows has always had Dr. Watson, which will write a crash dump for any process, including IIS, when it exits due to an unhandled exception.

    Thus, IIS will never add capability to privately capture and write a crash log when code inside of it crashes. It is a web server, not a development/debugging platform. It simply does not make sense to duplicate any of the existing debugging infrastructure that the user merely needs to configure to use.

    It sounds like your frustration is more with the production support team, not with IIS nor debugging facilities.

    However, relying on this mechanism assumes that the issues of interest are only crashes (unhandled exceptions) and that the issue survives until the JIT debugger activates Dr. Watson.

    If you need to debug something else, then you need to attach a debugger and wait for the right exception.

    //David

  43. Sam says:

    OMG, when will you people learn! Why not get a "real" webserver so you don’t have to go through this agony. For Christs sakes, "Attach a debugger and wait" because that’s all you can do??? Screw IIS!!

  44. Blitzenn says:

    Sam, that’s like saying you should stop being stupid and get a push mower because the tire is flat on your rider.  

    Can your Web server automatically dump me to a standby data server when it recognizes the first one is not available and determine that both are synchronized across 60 + databases all in real time and do all of that out of the box (no code required)?  I think not.  If you want to bash things, please research your argument prior to spouting off.  

  45. Jack says:

    David’s article is not right to the point, he is trying to bypass the issue and not knowing how to fix it, it’s a waste of time reading his article!

  46. David.Wang says:

    Jack – you are absolutely correct. This article is not about how to fix a crash. It is about understanding and diagnosing a crash so that one can assign responsibility to get that crash fixed.

    If you are looking for ways to fix the crash, the answer is simple — identify the cause of the crash, which is usually either user configuration error or software bug, and either get the right configuration setting or get the source code/compiled binary for the bug fix.

    There are many causes for a crash, so it is quite useless to search for how to fix a crash. You need to first identify the cause of the crash to narrow down the search for a fix. That is exactly what this blog entry tries to teach.

    Most users just want their issue fixed but are unwilling to learn how to fix their problems. It is the proverbial "give a man a fish, he avoids hunger that night. Teach a man to fish, he is never hungry again". I’m sorry, but I am only teaching how to fish. You will have to compensate me to give you a fish.

    //David

  47. Rich says:

    Hi David,

    Just wanted to say thanks very much for this article, it was just what I needed to track down and fix a bug with a 3rd party app on a client’s server and I am sure the methodology is something that I will use again and again. Have passed this link on to all our engineers.

    Many thanks

    Rich

  48. Brian Milinazzo says:

    This seems like the logical approach to take.  I am also getting the application pool error, so I have installed the Debug Diag tool and created a generic crash rule, but I am not sure which specific exceptions to list in the advanced settings, if any at all.

    Can someone post a setup that should catch the error and provide the debugging information?

    Background:

    This error is occurring once every two weeks and requires a complete restart of the server.  Restarting IIS and the applications does not work; the website remains unavailable.  After the server comes back up we get the application pool errors on the console and the info messages are generated in the event log.

    The error log before the restart is:

    Faulting application w3wp.exe, version 6.0.3790.1830, faulting module cognosisapi.dll, version 8.2.1640.0, fault address 0x00004cfc.

    The info log entry is basically the same; it just has the "Reporting queued error" message at the beginning:

    Reporting queued error: faulting application w3wp.exe, version 6.0.3790.1830, faulting module cognosisapi.dll, version 8.2.1640.0, fault address 0x00004cfc.

    Thank you,

    Brian
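When several of these "Faulting application" entries pile up, it helps to pull out the fields and tally them by module before deciding where to point the debugger. Here is a minimal Python sketch, assuming the message layout matches the entries quoted in this thread (with or without the `stamp` fields); `FAULT_RE` and `parse_fault` are hypothetical names.

```python
import re
from typing import Dict, Optional

# Matches both the short form (application/module/address) and the longer
# form that also carries "stamp" and "debug?" fields between the commas.
FAULT_RE = re.compile(
    r"faulting application (?P<app>[^,]+), version (?P<app_version>[^,]+),"
    r".*?faulting module (?P<module>[^,]+), version (?P<module_version>[^,]+),"
    r".*?fault address (?P<address>0x[0-9a-fA-F]+)",
    re.IGNORECASE,
)


def parse_fault(message: str) -> Optional[Dict[str, str]]:
    """Extract application, module, versions and fault address from one
    event log description, or return None if the text does not match."""
    match = FAULT_RE.search(message)
    return match.groupdict() if match else None


if __name__ == "__main__":
    entry = ("Faulting application w3wp.exe, version 6.0.3790.1830, "
             "faulting module cognosisapi.dll, version 8.2.1640.0, "
             "fault address 0x00004cfc.")
    print(parse_fault(entry))
```

The faulting module only says where the exception surfaced. As noted earlier in the thread for ntdll.dll, the code that caused the fault may live elsewhere, so treat the tally as a starting point for a crash rule rather than a verdict.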

  49. Hong says:

    Hi David,

    I am also getting the same application pool error on a production server which is not really accessible to dev. However, I have a minidumper running in my application dll loaded by w3wp. Previously when the app dll crashed, we got minidump files for debugging, but this time with this "faulting application w3wp.exe…" error, no minidump files have been generated.

    So I was wondering if there is any separate IIS process running, in addition to the one running my user app dll, that could cause the Application Pool Crash error. If that’s the case, the crashing process is out of the control of my minidumper, as the dumper only sits in my app dll and monitors the "current process" running the app dll.

    How many processes are started for IIS in general? Is there only single one, w3wp.exe?

    Thanks,

    Hong

  50. David.Wang says:

    Hong – IIS6 has two different runtime modes that affect this answer.

    In IIS5 Compatibility Mode, user code can be running in inetinfo.exe if Low Isolation, some dllhost.exe if Medium Isolation, and any number of dllhost.exe if High Isolation.

    In IIS6 Worker Process Isolation mode, all user code runs inside of a w3wp.exe process, and the number of such w3wp.exe processes depends on the number of functional Application Pools as well as each Application Pool’s "Maximum Processes" setting.

    There are no other IIS processes running user code that can crash w3wp.exe like that. I suspect your minidumper was not activated or functional for the w3wp.exe process that actually crashed.

    //David

  51. Pichitchai says:

    Hi David  

    I want to know why these errors occur:

    Event Type: Warning

    Event Source: W3SVC

    Event Category: None

    Event ID: 1011

    and

    Event Type: Warning

    Event Source: W3SVC

    Event Category: None

    Event ID: 1009

    They occur often, but IIS 6.0 does not crash immediately.

  52. Praveen says:

    Hi David,

    We are facing a problem similar to the one posted here, with the following error message.

    A process serving application pool ‘AppPool’ suffered a fatal communication error with the World Wide Web Publishing Service. The process id was ‘5448’. The data field contains the error number.

    I used DebugDiag tool to debug and analyzed the dump file using the same tool. and I am seeing the Memory Heap corruption errors while connecting to some COM DLLs. (SAS 9.1.3)

    In w3wp__PID__4940__Date__12_03_2007__Time_07_23_17AM__422__Second_Chance_Exception_C0000005.dmp the assembly instruction at ntdll!ExpInterlockedPopEntrySListFault in C:\WINNT\system32\ntdll.dll from Microsoft Corporation has caused an access violation exception (0xC0000005) when trying to read from memory location 0x0004000d on thread 52

    When I click on thread 52 it takes me to the following.

    Thread 52 – System ID 488

    Entry point   SASWMan!DllUnregisterServer+b95d

    Create time   12/3/2007 7:23:15 AM

    Time spent in user mode   0 Days 0:0:0.0

    Time spent in kernel mode   0 Days 0:0:0.0

    Function     Arg 1     Arg 2     Arg 3   Source

    ntdll!ExpInterlockedPopEntrySListFault     04ff08f8     072ff4fc     7c82a0b8    

    ntdll!RtlAllocateHeap+14e     04ff08f8     05c8d3e0     0000005c    

    ntdll!RtlAllocateHeap+e2     04ff0000     00000000     0000005c    

    SASComb!DllCanUnloadNow+d705     0000005c     1001139a     0000005c    

    SASComb!DllCanUnloadNow+d727     05c89ab0     066183ec     00000001    

    SASComb+1d8f     05c89ab0     066183ec     065facac    

    SASWMan!DllUnregisterServer+89ef     072fff2c     04e74e60     04e74ce0    

    SASWMan!DllUnregisterServer+7ec0     0607ec44     00000000     00000000    

    SASWMan!DllUnregisterServer+b9bc     04e74d80     00000000     00000000    

    kernel32!GetModuleHandleA+df     04e45388     04e74d80     00000000    

    and the heap corruption is as below

    Heap 58 – 0x04ff0000

    Heap Name   SASComb!DllCanUnloadNow+149924

    Heap Description   This heap is used by SASComb

    Reserved memory   1.06 MBytes

    Committed memory   596.00 KBytes (54.78% of reserved)  

    Uncommitted memory   492.00 KBytes (45.22% of reserved)  

    Number of heap segments   2 segments

    Number of uncommitted ranges   1 range(s)

    Size of largest uncommitted range   492.00 KBytes

    Calculated heap fragmentation   0.00%

    The DLLs listed above, SASComb and SASWMan, are SAS DLLs and I am not sure whether this error is related to SAS or .NET Framework 2.0 or IIS or…

    We already contacted SAS Support and couldn’t get much help from them and all SAS DLLs are latest ones which we are using.

    We are connecting to SAS on Unix server from .Net 2.0 framework using ASP.Net (using VB.Net) with IIS 6.0 on webserver.

    Before migrating to 2.0 (in 1.1) we didn’t face this issue, but since we converted to 2.0 this error has been occurring every 5 minutes.

    Could you please help we have a release this weekend..and Thank you in advance.
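The dump file name quoted in this comment already carries the process ID, the timestamp and the exception code, which is handy when sorting through a folder of dumps. The sketch below assumes the name layout seen in that one example; it is inferred from the example and is not a documented DebugDiag format. The month-before-day order and the meaning of the `422` field are also assumptions, and `describe_dump` is a hypothetical helper.

```python
import re
from datetime import datetime
from typing import Dict, Optional

# Layout inferred from the single file name quoted in the comment above.
DUMP_NAME = re.compile(
    r"^(?P<process>.+?)__PID__(?P<pid>\d+)"
    r"__Date__(?P<date>\d{2}_\d{2}_\d{4})"
    r"__Time_(?P<time>\d{2}_\d{2}_\d{2}[AP]M)"
    r"__(?P<counter>\d+)__(?P<reason>.+)\.dmp$"
)

# Two exception codes that appear in this thread.
KNOWN_CODES = {
    0xC0000005: "access violation",
    0xE06D7363: "Microsoft C++ exception",
}


def describe_dump(file_name: str) -> Optional[Dict[str, object]]:
    """Split a dump file name into process, PID, timestamp and exception code."""
    match = DUMP_NAME.match(file_name)
    if match is None:
        return None
    info: Dict[str, object] = {
        "process": match["process"],
        "pid": int(match["pid"]),
        # Assumption: the date is month_day_year.
        "when": datetime.strptime(match["date"] + " " + match["time"],
                                  "%m_%d_%Y %I_%M_%S%p"),
        "reason": match["reason"].replace("_", " "),
    }
    code = re.search(r"_([0-9A-Fa-f]{8})$", match["reason"])
    if code:
        value = int(code.group(1), 16)
        info["code"] = f"0x{value:08X}"
        info["meaning"] = KNOWN_CODES.get(value, "unknown")
    return info


if __name__ == "__main__":
    print(describe_dump(
        "w3wp__PID__4940__Date__12_03_2007__Time_07_23_17AM__422__"
        "Second_Chance_Exception_C0000005.dmp"))
```

Because the reason field says "Second Chance", this is the dump of the unhandled exception, which is the one worth opening first.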

  53. Praveen says:

    Is this site still active or nobody cares??

  54. David.Wang says:

    Praveen – I believe SAS Support needs to look at this issue because you are seeing memory corruption, which causes the crash.

    Now, the memory corruption happens while the SAS code tries to allocate memory. Since it is *highly* unlikely that another piece of code corrupts the exact piece of memory that SAS is about to use, it is *highly* likely that SAS code itself is using bad memory.

    This means that SAS support is responsible because it is a logical bug in their code. If you cannot get them to diagnose/support their own code, then you need to rethink using their software.

    //David

  55. David.Wang says:

    Praveen – Memory corruption like this can be diagnosed with PageHeap from Application Verifier, supplied by Microsoft.

    http://www.microsoft.com/downloads/details.aspx?familyid=bd02c19c-1250-433c-8c1b-2619bd93b3a2&displaylang=en

    You can use it to try to track down *when* something corrupts a piece of memory in the SASComb heap because that’s the culprit that needs to be fixed.

    //David

  56. Andy B says:

    I am having problems with w3wp.exe on a dedicated Win 2003 server – it randomly shoots up to 99% CPU usage… and, yes, it does have all the latest patches and I can find no way to resolve the issue – restarting IIS sometimes ‘cures’ the problem and on other occasions it doesn’t.

    Our site is running good old ASP (not .NET) connecting with one big MS Access DB and several other smaller DBs.  I have checked all open connections and all my code checks out – there are no w3wp.exe errors in the App. log but just occasional errors from W3Ctrs "taking too long to refresh the W3SVC counters."

    Any thoughts?

    Thanks in anticipation

    //Andy B

  57. Praveen says:

    Thanks David for your reply,

    We’ll try to contact SAS again, with your reply,

    but I have a question.

    Actually we have a wrapper class to connect to SAS, before we migrated to 2.0 framework, there used to be lot of connections we were making using this class and then closing the connection as soon as we are getting the data…

    that is connect..get the data and close the connection..

    Now this was very time consuming, and as part of performance enhancements we made, removed all the connections..restored only one connection through out the session and reusing it.

    Then we started getting the above mentioned application pool crash..

    so now we have a situation here..so is it because..we are using (re-use) the connection through out the session (which sas com bridge dlls might not support) or is it because we converted the code to 2.0..or is it because of the combination..

    Then as the first step..we put back all the connections..that is no reuse…open get the data and close..each and every time almost..

    in .net framework 2.0 code…

    still we saw the error (and the app pool crashed)..

    then we thought..it’s not because..of the connections…

    Then we converted the code back to 1.1 framework..with all the connections (that is no re-use)..

    …then..then..then..

    it looks like..that the error started disappearing now…but still testing..to see..if the app pool crashes..but mostly..it’s not crashing..

    so it looks like the combination of 1.1 framework and putting back the connections..that worked…but I don’t know why it is so..

    David..could you please share your thoughts on this??…because…we need to convert to 2.0 and we need better performance..both (or either) of them..are not possible..with the current scenario.

    Thanks

    Praveen.

  58. David.Wang says:

    Andy B – clearly the problem is not with IIS code, no matter how your code checks out, because IIS does not have sporadic performance issues.

    Perhaps your ASP code has problems with MTA (default change from prior IIS versions) and needs to switch to STA. Several other people resolved their CPU spikes with this change.

    Another culprit is your Access DB because MDBs are notoriously single-threaded and easily cause contention like the 100% CPU spike.

    In other words, I believe these CPU spikes are caused by problems/assumptions within user code, so the problem is not with w3wp.exe but rather user code no matter the process it is in…

    //David

  59. David.Wang says:

    Praveen – the only way to understand "why" it works is to understand why it fails.

    Personally, I do not think it is a problem with .Net Framework. Maybe .Net Framework 1.1 memory layout masked a bug in the SAS DLL so that it is always exposed in .Net 2.0.

    Clearly, that would not be a problem with .Net Framework 2.0, but from your perspective the SAS DLL is not usable under .Net Framework 2.0, thus it seems like an issue with .Net Framework 2.0.

    And the only way to distinguish between the possibilities is to diagnose the crash.

    From my perspective, the crash is caused by SAS DLL, and maybe .Net Framework 1.1 is your only work-around. I would not expend effort to try to figure out how to get a work-around on .Net Framework 2.0 because that does not resolve the actual issue.

    //David

  60. srinivas says:

    We are experiencing an issue with an app pool on one of our servers.

    Event viewer shows the following errors for system log:

    1.A process serving application pool ‘RTIAppPool’ suffered a fatal communication error with the World Wide Web Publishing Service. The process id was ‘5668’. The data field contains the error number.  

    we are getting the same error for different process ids.. then we are getting the following error:

    "Application pool ‘RTIAppPool’ is being automatically disabled due to a series of failures in the process (es) serving that application pool."

    and app log shows the following error:

    Faulting application w3wp.exe, version 6.0.3790.3959, stamp 45d6968e, faulting module kernel32.dll, version 5.2.3790.4062, stamp 46264680, debug? 0, fault address 0x0000bee7.

  61. David.Wang says:

    srinivas – you are running an unstable application in IIS that is repeatedly crashing and has been disabled.

    You should follow the instructions in this blog entry to use the correct tools and methodology to detect and identify the issue to get it fixed.

    This is almost certainly a bug in your code or a 3rd party component used by the application serviced by IIS.

    //David

  62. Roman says:

    Hi David,

    How do I debug "w3wp" if the "application pool crash" occurs at the moment "w3wp" starts?

    I was trying to add key in registry

    [HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows NT\CurrentVersion\Image File Execution Options\W3WP.EXE]

    "Debugger"="C:\Program Files\Debugging Tools for Windows\ntsd.exe -server npipe:pipe=w3wp"

    but I can’t connect to the debugger,

    and I get the following messages in the "system" event log (ordered by date and time):

    "A process serving application pool ‘MyPool’ terminated unexpectedly. The process id was ‘7016’. The process exit code was ‘0x80004005’."

    "Application pool ‘AppPoolASPNet11’ is being automatically disabled due to a series of failures in the process(es) serving that application pool."

    Thanks…

  63. Roman says:

    correction: the name of application pool is ‘AppPoolASPNet11’ everywhere.

    Sorry…

  64. David.Wang says:

    Roman – how did you verify that your settings work?

    Did you try it on something else like "notepad.exe" instead of "w3wp.exe"?

    I am not certain why you need to do this because no application crash happens at the start of w3wp.exe. One can always intervene and attach a debugger prior to any application code execution on IIS.

    However, in all cases correct debugger settings are critical, so I suggest establishing that first. You can probably progress to the solution on your own so I would like to give you the satisfaction of doing so, given only hints.
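
    One way to establish that the settings work is to apply the same kind of entry to a harmless target such as notepad.exe first. The following is a minimal sketch, not a definitive procedure: it only builds the text of a .reg file and does not touch the registry. The helper names `reg_escape` and `ifeo_reg_text` are hypothetical. It relies on the fact that string values in a .reg file need backslashes and double quotes escaped, which is easy to miss when typing a debugger path by hand.

```python
# Sketch: build the text of a .reg file that sets an Image File Execution
# Options "Debugger" value for a given executable. Nothing is written to the
# registry here; the output is meant to be reviewed and imported manually.
IFEO_KEY = (r"HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows NT"
            r"\CurrentVersion\Image File Execution Options")


def reg_escape(value: str) -> str:
    """Escape backslashes and double quotes as .reg string values require."""
    return value.replace("\\", "\\\\").replace('"', '\\"')


def ifeo_reg_text(image_name: str, debugger_cmd: str) -> str:
    """Return .reg file content that associates debugger_cmd with image_name."""
    return (
        "Windows Registry Editor Version 5.00\n\n"
        f"[{IFEO_KEY}\\{image_name}]\n"
        f'"Debugger"="{reg_escape(debugger_cmd)}"\n'
    )


if __name__ == "__main__":
    # Try the setting on a harmless target before pointing it at w3wp.exe.
    cmd = (r"C:\Program Files\Debugging Tools for Windows\ntsd.exe"
           " -server npipe:pipe=w3wp")
    print(ifeo_reg_text("notepad.exe", cmd))
```

    If launching notepad.exe does not bring up the debugger with this entry in place, the problem is in the debugger settings rather than in anything specific to w3wp.exe.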

    //David

  65. Brian says:

    It seems a shame that to troubleshoot IIS errors like this, you have to install a tool to gather more information and hope that the crash happens again. Plus, some of these tools can create disk quota issues with large dumps. It would be great if more information could be gathered at the actual time of the error.

  66. David.Wang says:

    Brian – Sure, it is possible to gather this information at actual error time using built-in tools to Windows — that is exactly how we troubleshoot IIS and Windows software in general, and it works great.

    However,  you would have to endure a 20+% hit in performance. Is that acceptable to you?

    The existing tools have low system impact and work for the vast majority of users because that’s what they want.

    Now, disk quota issues sound like your problem, not a problem with the tools. You certainly cannot suggest that debugging bypasses quotas… because that would be a security vulnerability.

    //David

  67. Nibor says:

    Hi,

    I’m running into a weird problem when I try to run IISState.

    Some background information first:

    A customer has a .NET application connecting to webservices on an IIS 6 webserver.  The last few weeks the application randomly goes very, very slow, and we can see w3wp.exe running on around 30-40% cpu usage. After a while (30-40 mins) cpu usage goes down, and the application is "fast" again.

    However, the last few days the webserver has gone crazy, and cpu usage has been much higher, up to 100% during the night when no one connects to the application at all, and when they do connect, cpu usage actually went down to 80-85%. And all of a sudden it drops to 0% before going all the way up again.

    Running an iisreset solved the problem for a while, before it recurred. The interesting thing is that after a full restart of the server, the users of the application were all happy, whilst I could see the w3wp.exe still running at 99% cpu usage.

    So, I thought I should try out IISState to see what is hogging the cpu. When I ran it, it got to thread 12 or 13, before it stopped, hung, and eventually restarted w3wp.exe (to my amusement this actually "solved" the problem on the server, since cpu load has been at 1-2% since then).

    What is logged is the usual "A process serving application pool ‘DefaultAppPool’ failed to respond to a ping. The process id was ‘3356’."

    I’ve tried to download both IISState 3.0 and 3.3.1, and run into the same problem with both.

    What the IISState log says is:

    Thread ID: 11

    System Thread ID: ba0

    Kernel Time: 0:0:0.0

    User Time: 0:0:0.0

    Thread Type: Managed Thread. Possible ASP.Net page or other .Net worker

    SaveModuleToMemory failed

    Failed to load SOS data.

    No valid SOS data table found.

    Begin System Thread Information

    # ChildEBP RetAddr  

    00 019dfea8 00000000 ntdll!KiFastSystemCallRet

    GetContextState failed, 0x8007001F

    GetContextState failed, 0x8007001F

    GetContextState failed, 0x8007001F

    Thread ID: 12

    System Thread ID: 0

    Kernel Time: 0:0:0.0

    User Time: 0:0:0.0

    GetContextState failed, 0x8007001F

    I’m not really sure what to think, since we never had any actual crashes or timeouts on the server until I ran IISState, so it seems that somehow IISState initiates it.

    Any ideas?

    -Nibor

  68. David.Wang says:

    Nibor – There is no issue with IIS State here.

    What is happening is that you used IIS State to break into a live w3wp.exe to take a trace. Meanwhile, IIS is still configured to periodically "ping" the w3wp.exe of its Application Pools. Since IIS State is hogging this w3wp.exe trying to take a trace, the w3wp.exe cannot respond to the ping, thus IIS deemed the w3wp.exe hung, initiated a recycle of it, and logged the event log entry.
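
    The interaction described above can be sketched as a small simulation. This is a toy model under stated assumptions, not IIS code: the `Worker` class, the `monitor` function, and the interval and timeout values are all hypothetical and are not claimed to be the IIS defaults. A debugger break-in is modelled as a window during which the worker cannot answer pings.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Worker:
    """Stand-in for a w3wp.exe. The frozen window models a debugger break-in."""
    pid: int
    frozen_from: int = 0   # seconds; an empty window means never frozen
    frozen_until: int = 0

    def reply_time(self, ping_sent: int) -> int:
        """Time at which the worker can answer a ping sent at ping_sent."""
        if self.frozen_from <= ping_sent < self.frozen_until:
            return self.frozen_until  # it can only answer once it is resumed
        return ping_sent


def monitor(worker: Worker, ping_interval: int, ping_timeout: int,
            duration: int) -> List[str]:
    """Ping periodically; report the worker as hung when a reply is too late."""
    events = []
    for sent in range(0, duration + 1, ping_interval):
        delay = worker.reply_time(sent) - sent
        if delay > ping_timeout:
            events.append(
                f"t={sent}s: pid {worker.pid} failed to respond to a ping")
            break  # at this point the worker would be recycled
    return events


if __name__ == "__main__":
    # A long break-in (100s to 400s) outlasts the 90s allowance.
    print(monitor(Worker(3356, 100, 400), 30, 90, 600))
    # A short break-in (100s to 150s) goes unnoticed.
    print(monitor(Worker(3356, 100, 150), 30, 90, 600))
```

    In this model the tool taking the trace is not at fault; the worker is simply held longer than the health check allows, which matches the "failed to respond to a ping" entry in the event log.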

    My guess is that the .NET application is badly written and causes a lot of Garbage Collection, which would cause CPU to spike sporadically for long periods of time. Most people schedule periodic restarts of the .NET Application pools to work around their performance problem, if their design/environment allows it.

    //David

  69. Johnathan says:

    It is good to see you are still following this post.  I found it a good article.  However, I would hope that these tools would come with IIS out of box along with best practices documentation.  

    I am seeing w3wp.exe crashes under load tests and I am starting to debug them. Is there a ‘common mistakes’ article out there to help with common web services?

    Since Server 2008 is about out I wanted to know if this process gets easier with the next version of IIS.  

    Thanks.

  70. David.Wang says:

    Johnathan – I believe that tools and best practices documentation are living documents that should not be tied to what ships "out of the box".

    Frankly, people use web servers in such diverse ways and environments that any tool or comprehensive documentation quickly goes out of date or will be so large that it is not worthwhile/efficient to maintain.

    Most of the tools and techniques are general problem-solving techniques that apply to everything, not just IIS. I highly recommend that you learn the techniques because it helps out everywhere.

    By learning how to troubleshoot, not just pattern match against "common mistakes" articles or "best practices" documentation, you will be able to help yourself, and not constantly rely on someone else to provide you the information.

    Personally, I think the process gets both easier and harder with the next version of IIS in Windows Server 2008, IIS 7. On the one hand, you have Failed Request Tracing, which combined with IIS logs, Event Logs, and HTTP.SYS logs, is all one needs to troubleshoot most issues on IIS7 without needing a debugger or turn on ETW tracing. On the other hand, IIS7 is so generic and easy to misconfigure that you have tons more ways to accidentally shoot yourself.

    I suspect that most people will initially find IIS 7 harder to use/configure because it will be easy to break and misconfigure, but after people get comfortable with IIS7 basics, they will learn to appreciate the improvements and benefits and see how it gets easier.

    //David

  71. Allen says:

    Hi David, and thanx for posting such a wonderful article.

    Well, my problem is somewhat similar, but I couldn’t understand or diagnose it because the site is hosted by discountasp.net. They won’t give me event logs or memory dumps even though I have asked multiple times; the reason given is that it is shared hosting.

    Your response would be much appreciated.

    Thanks.

  72. David.Wang says:

    Allen – unfortunately, without cooperation from the hoster, it is impossible to troubleshoot.

    For all you know, your problems may be caused by another customer since it is shared hosting. But, you should not be able to debug someone else’s application to figure it out (security issue), so debugging on a shared hosting environment by the customer for systemic issues is practically impossible.

    //David

  73. Carl Cook says:

    Hi David,

    Thanks for your replies to this post… you have given a great starting point to help me fix a problem I have been recently assigned.

    I am troubleshooting some WCF services run within IIS 6 on a Windows 2003 R2 server. We are getting the app pool crashing out, with the ‘typical’ messages in the event log ("faulting application w3wp in mscorwks.dll", "communications failure", "process unexpectedly terminated", etc). I ran IISState/WinDbg to produce a stack trace, but this gave me nothing out of the ordinary (all the stacks looked fairly normal, no stack overflows, access exceptions, deadlocks, etc). .Net logging/tracing shows nothing out of the ordinary. Performance monitoring showed thread usage was fine, no memory leaks, etc. Http logs gave nothing out of the ordinary either.

    Today, I ended up stepping through one of the WCF services with a debugger (attaching from my dev machine to the live server as a remote process), and the debugger returned a System.ExecutionEngineException from the DataContractSerializer.WriteObject method. This exception is uncatchable… all I can do is continue execution in the debugger… at which point the w3wp.exe process gets terminated. [According to a few recent posts, this might be a CLR issue (i.e. the dynamically generated serializer is not handling nullable fields correctly, then crashing out).]

    So, one lesson I have learnt is that stepping through the worker process with a debugger is one of the easiest ways that I have found to find the real fault within IIS.

    However, what would you now recommend as an approach to fix this problem? Approach Microsoft for a potential hotfix? Code around the problematic method? Install .Net 3.5 (in the hope that the problem has been addressed)? Try to disassemble the System.* libraries to find the root cause? Move to Windows 2008 Server with either IIS 7 or WAS, and just hope that this fixes the issue? Or something else?

    Many thanks!

    -Carl.

  74. Kalpesh Joshi says:

    I have Windows Server 2008 64-bit with IIS 7. I have developed a 32-bit application in .NET and created an additional application pool named online.

    I have set "Enable 32-Bit Applications" to TRUE.

    Now whenever I browse my application, this application pool is stopped after a series of failures.

    If I set "Enable 32-Bit Applications" to FALSE, then I get JET 4.0 errors.

    The event log contains this data

    A listener channel for protocol ‘http’ in worker process ‘5780’ serving application pool ‘OnlinePool’ reported a listener channel failure.  The data field contains the error number.

    This is Event ID 5139, and then the application pool is stopped with Event ID 5002.

  75. David.Wang says:

    Carl – requesting a hotfix is the best approach if you do not want to change the .Net version. In the meantime, temporarily code around the problem.

    I would not upgrade .Net version nor OS on the "hope" that it contains the fixes — if it is fixed in the later version, a hotfix usually exists for the prior version. However, if you do not report it, it has no chance of getting fixed.

    //David

  76. Harry McHaffie says:

    David,

    Please Help!  We opened a case with Microsoft in an effort to resolve our crash problem.  WinDbg was used to trace the crash.  The specific .dll and function that was causing a stack overflow was discovered.  Microsoft asked for the debug version of that .dll.  We went to the company that supplies a C API for their product that contains the problem .dll.  This company will not provide the debug version of the .dll.  Are we at a dead end?

    We have Server 2003sp2, SQL Server 2005sp2, ESRI SDE 9.1sp2, VS2008 with a Solution containing a C++ project that makes calls to the "SDE C API" (SDE C API contains the problem DLL), a CSharp Wrapper project that wraps the C++ code and a VB Web Service project that calls the CSharp Wrapper code.

    Running the Web Service from within VS2008 everything works great!  We call a method which returns the expected data.  

    We publish the web service to the same machine VS2008 runs the solution successfully on.  Run IE, pull up the page to test the Web Service, select the method and provide one parameter, then Invoke the method.  It throws "An unhandled win32 exception occured in W3WP.exe".  We know from the debug/trace that the offending function is in the sdesqlsrvr91.dll, the name of the function that causes a Stack Overflow is SE_LowerCase.  Our code makes no direct calls to this function.  The SDE C API functions that we are making calls to within our C++ code are probably making a call to the SE_LowerCase function.

    Our event log reflects the following:

    A process serving application pool ‘DefaultAppPool’ suffered a fatal communication error with the World Wide Web Publishing Service. The process id was ‘3212’. The data field contains the error number.

    A process serving application pool ‘DefaultAppPool’ terminated unexpectedly. The process id was ‘3440’. The process exit code was ‘0x0’.

    Is there anything else we can do?      

  77. David.Wang says:

    Harry – unfortunately, your issue is with the 3rd party software. There is little Microsoft or anyone else can do about it.

    It sounds like that software does not work on Windows Server 2003 and IIS6 because applications are launched in different processes with different permissions and filepaths for debug by VS2008 and for-production by IIS6.

    You will need to verify that this 3rd party software is supported on Windows Server 2003 and if so, what its required configuration is. For example, this 3rd party product could have code searching up its stack for the process that launched it, and it is confused by finding w3wp.exe or "\?" in the filepath.

    In short, these would all be bugs in the 3rd party software if they say they support Windows Server 2003.

    //David

  78. Dave says:

    Hi David,

    This is a great article that really helped me understand how to go about diagnosing an application crash.

    I am experiencing an issue where DEP stops one of my worker processes.  As a result, our site goes down and our web app along with it.  I have this generic entry in the application event log:

    Faulting application w3wp.exe, version 6.0.3790.1830, faulting module unknown, version 0.0.0.0, fault address 0x00ef0184.

    I have downloaded and installed Debug Diagnostics on the server this issue occurs on and setup a crash rule to perform a full dump when the next crash occurs.  My problem is, when I check on the utility the next day, the rule says "complete" and is no longer actively monitoring.  I usually have to keep re-activating the rule and checking on it to make sure it is still active.  The next time the crash occurred, I checked for a memory dump but the rule was "complete" and not active.  Therefore, I did not have a dump to analyze.  I do not have extensive knowledge of capturing application crashes, but I do want to learn how to so that I can try to fix this crash.

    In the advanced settings section of Debug Diagnostics, I went to the exceptions area and raised the action limit for some action types I added.  This was in hopes of being able to keep the crash rule active and capture the crash.  However, now I see these errors in my application event log:

    Event Type: Error

    Event Source: W3CTRS

    Event Category: None

    Event ID: 2003

    Date:  7/9/2008 Time:  1:01:17 PM

    User:  N/A

    Computer: CAICORP-H01-16

    Description: It has taken too long to refresh the W3SVC counters, the stale counters are being used instead.

    If you can help me understand how to keep a crash rule active in Debug Diagnostics, and also if the above error is something I should worry about, I would greatly appreciate it.  I really want to figure out what is causing this application to crash with Debug Diagnostics, but I just need some help with the advanced features.

    Thanks!

  79. Matthew Evans says:

    Thanks for the article David.

    We are experiencing exactly the (variety) of symptoms described. The system and event log contain a subset of the w3wp, App Pool communication errors (in various combinations) – as you describe.

    I suspect some kind of subtle recursion in our latest build.

    However the site is essentially unusable, we get this little beauty every 2-3 minutes:

    The description for Event ID ( 5000 ) in Source ( .NET Runtime 2.0 Error Reporting ) cannot be found. The local computer may not have the necessary registry information or message DLL files to display messages from a remote computer. You may be able to use the /AUXSOURCE= flag to retrieve this description; see Help and Support for details. The following information is part of the event: clr20r3, w3wp.exe, 6.0.3790.3959, 45d691cc, mscorlib, 2.0.0.0, 461f060e, 40cf, 2d, 5s11pi0dzbt3ygdcdkat1jwohxctyitc, NIL.

    Followed by a period where IIS doesn’t respond to requests, after which the worker process appears to crash / recycle.

    I’ve rolled back the build as a result.

    My question is: since this is only occurring live, with realistic load, how can I obtain a debug dump outside the production environment?

    Any ideas appreciated.

  80. Please Sir, The URL supplied is of the particular app im complaining about. Its most likely you will get the <h1>Server Unavailable</h1> error when u view it. I need help i dont know how to use the described debugger, besides My App runs with asp.net 2.0 so does this apply to me? can i resolve to isolation mode? The App has been running for 6mths prior to  this problem!. Pls Any Advice will be appreciated…

    Pls Kindly mail me ur reply or send me back a link to this site cos im consulting many sites for help… idmode07@gmail.com

  82. David.Wang says:

    Matthew – if you want to obtain a debug dump outside of production, then you will have to find a way to make it reproduce outside of production… otherwise, you will have to risk taking the debug dump from the production environment to get real information.

    Now, I don’t have a bullet-proof decision tree for you to follow. If that was possible, then it would be automatable and we wouldn’t be having this discussion. What I would suggest is that you need to determine if the issue requires realistic load (i.e. repetition), a certain timing-oriented bug, or is it some logical resource leak which load can easily trip over.

    Frankly, without the memory dump, one can only guess at the cause and then try to gather data to prove/disprove the hypothesis.

    For example, if you think it is recursion, then you are looking for the culprit request out of all possible requests made in production. You won’t need load to reproduce a recursion issue.

    If the issue has to do with subtle timing issues, then it can be related to the request load quantity, sequence, etc.

    If the issue has to do with resource leakage, then with a request mix you should be able to reproduce it outside of production.

    //David

  83. Ian Staines says:

    David,

    I am confused about the "pageheap" option of DebugDiag.  The help dialogs for DebugDiag suggest that this is a stand-alone feature of DebugDiag, but when I try to enable the option I cannot figure out how to capture exceptions generated by heap problems.

    I note that most discussions of heap corruption suggest that you use "Application Verifier" in conjunction with DebugDiag to capture heap problems.  If so, what is the purpose of the "pageheap" option of DebugDiag?

  84. clinks63 says:

    Hi David,

    I have read the blog and some of the comments. Based on your article, the crash might come from user code. I was just wondering why this crash happens: I run the same code 2-3 times without trouble, but on the 4th or 5th run the page reports that the session has expired, and the event viewer log displays "www crashed". We also did some investigation, and the code stops at a random line.

  85. David.Wang says:

    clinks63 – Crashes do not happen on random lines of code, unless the bug is in something else running inside the same process that is corrupting random memory locations (and hence random lines of code seem to crash).

    The most likely cause of random memory corruption? More User code. Why? Well, if IIS itself was randomly corrupting memory, that same bug would affect thousands of servers world-wide and be rapidly detected, reported, and fixed.

    //David

  86. If anyone has been receiving this error with Cognos 8 application server, Cognos has provided a patch (hot site):

    # 604680 Available report processes flipping between peak number and non peak number every 20 minutes

    The detailed document for this hotfix also included a fix for a memory leak happening with AD authentication. They have said this should fix our particular issue and it appears to be resolved as we have applied this hotsite in addition to upgrading to Cognos 8.3 SP2.

    Brian

  87. David.Wang says:

    Brian – Thanks for the notice. However, your patch only fixes one specific issue for one specific product. It is possible for another product to crash, and it is possible for the same product to crash in a different way, and while both would look the same from the Event Log, neither would be fixed by the patch you mention.

    Thus, I would not recommend telling people to go get patches when they see crashes. I recommend following this blog entry’s advice to first diagnose the crash using automated tools, then query the support personnel of the responsible module for a resolution to the diagnosed crash.

    //David

  88. David says:

    IIS Pools tool from http://www.hoststools.com/ is helpful when investigating app. pools crashes and hangs.

  89. Wing-Leung says:

    I have been troubleshooting the "The identity of application pool, ‘___’ is invalid", event ID 1021, data field 0x8007052e for a couple of hours. I also saw the security failure audit, event ID 529, reason "Unknown user name or bad password". So, this led me to change the password of the application pool identity. I also changed the Windows user’s password to match (We use a local, restricted, account for the application pool). I also stopped and restarted the application pool. However, this did not resolve the issue. However, I eventually figured out that I had to do a full IIS restart and that did the trick.

    What I don’t understand is why I need to do the IIS restart. When I restarted the application pool, I saw the w3wp.exe process stop so I figured a new w3wp.exe process should pick up the new credentials. For some reason it appears that IIS keeps some kind of credentials cached???

    I even tested this further by getting my site working (ie: password is now correct). Then I deliberately changed the user’s password to something else. However, my site still loaded and application pool was still okay even after a stop and restart of the application pool. It is not until after I did a full IIS restart that I got the identity is invalid error as expected.

    I am missing something here regarding user accounts and password changes. Can one of the gurus here explain? Also, is there a way to do this without a full IIS restart (since it would cause other sites to be temporarily unavailable).

    Thank you

  90. ArntK says:

    Hi David,

    We have an ongoing case with Microsoft support about this issue. We have had the System events 1011/1009 several times over the last 6 weeks. MS has not been able to get any dump files. How is this possible? Please share any insight you have about this.

    Thank you.

  91. David.Wang says:

    Wing-Leung – Full IIS restart is required because WAS has the application pool identities "cached" for use to [re]start an Application Pool.

    There is no change-notification within Windows for IIS to know whenever a user property has changed, and it is undesirable for IIS to constantly re-login the user account to start up an Application Pool (imagine if the Application Pool Identity is a domain user — this behavior would potentially allow anonymous user to use IIS to DoS the Domain Controller under certain user configurations).

    Thus, the application pool identity is cached, and it can cause an issue precisely when you alter user properties. However, that is the exceptional case, not the common case – a user password probably changes anywhere from every 90 days to never, while recycling an application pool likely happens many times during 90 days. Given the lack of change-notification from Windows for IIS to know whenever a user property has changed, the occasional issue with the exceptional case is worth the security and performance benefit of the common case.

    //David

  92. Pratik says:

    I got error in our asp.net application please give me a solution.

    EventType clr20r3, P1 w3wp.exe, P2 6.0.3790.3959, P3 45d6968e, P4 mscorlib, P5 2.0.0.0, P6 471ebc5b, P7 11d9, P8 18, P9 system.outofmemoryexception, P10 NIL

  93. David.Wang says:

    Pratik – Your application is either using too much memory, or your web server has insufficient memory for your application.

    Determine how much memory your application needs. If you see the OutOfMemoryException without using that much memory, then the problem is with your ASP.Net application. Get a memory dump and walk through it using SOS.
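
    Before getting the dump, the event text itself can be read more easily once its fields are labelled. The sketch below splits a "clr20r3" event like the one posted above into named parts. The field meanings in `CLR20R3_FIELDS` are an assumption based on common usage of this bucket type and are not stated in the event itself; `parse_clr20r3` is a hypothetical helper name.

```python
# Assumed meanings of the clr20r3 parameters (not given in the event itself).
CLR20R3_FIELDS = {
    "P1": "application",
    "P2": "application version",
    "P3": "application timestamp",
    "P4": "faulting assembly",
    "P5": "assembly version",
    "P6": "assembly timestamp",
    "P7": "method token",
    "P8": "IL offset",
    "P9": "exception type",
}


def parse_clr20r3(text: str) -> dict:
    """Turn 'EventType clr20r3, P1 w3wp.exe, ...' into a labelled dict."""
    result = {}
    for part in text.split(","):
        key, _, value = part.strip().partition(" ")
        if key in CLR20R3_FIELDS:
            result[CLR20R3_FIELDS[key]] = value.strip()
    return result


if __name__ == "__main__":
    event = ("EventType clr20r3, P1 w3wp.exe, P2 6.0.3790.3959, P3 45d6968e, "
             "P4 mscorlib, P5 2.0.0.0, P6 471ebc5b, P7 11d9, P8 18, "
             "P9 system.outofmemoryexception, P10 NIL")
    for label, value in parse_clr20r3(event).items():
        print(f"{label}: {value}")
```

    The labelled output only says which exception type was reported and in which assembly it surfaced; it does not say which part of the application consumed the memory, which is why the memory dump is still needed.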

    Basically, the problem is caused by your application, and you have to fix it.

    //David

  94. Henri Visser says:

    I’m getting the same error in an application I wrote: Faulting application w3wp.exe, version 6.0.3790.3959, stamp 45d6968e, faulting module kernel32.dll, version 5.2.3790.4062, stamp 46264680, debug? 0, fault address 0x0000bee7.

    Only on Windows 2003, not XP or 2000. It happens when using the transform method to transform a XML document to an HTML document:

    Dim xmlout As New StringWriter()

    Dim transform As New Xsl.XslCompiledTransform

    transform.Load(XSLTFileName)

    transform.Transform(doc, args, xmlout)

    I’ve tried running in different app pools and under an admin user account, but still no luck.

    Any Ideas?

  95. Henri Visser says:

    Found the issue after using DebugDiag and realizing it’s a memory leak:

    http://social.msdn.microsoft.com/Forums/en-US/xmlandnetfx/thread/7fa6bbc4-c115-454f-af8e-0740b09d3b3a/

  96. This is a great post, but I’m having trouble figuring out how to debug IIS 7.0.  Many tools have been recommended for prior versions of IIS on google, but IIS 7.0 doesn’t seem to be well described.  Any recommendation on the best way to debug a crashing IIS 7.0 application pool?

  97. ekreger says:

    Hello. I have run debug diagnostics and am a little confused by its output.

    [2/25/2009 2:02:23 PM] Thread created. New thread system id – 4728

    [2/25/2009 2:02:23 PM] Thread created. New thread system id – 4016

    [2/25/2009 2:02:23 PM] First chance exception – 0xc0000005 caused by thread with system id 2448

    [2/25/2009 2:02:23 PM] First chance exception – 0xc0000005 caused by thread with system id 2448

    [2/25/2009 2:02:23 PM] First chance exception – 0xc0000005 caused by thread with system id 2448

    [2/25/2009 2:02:23 PM] First chance exception – 0xc0000005 caused by thread with system id 2448

    [2/25/2009 2:02:23 PM] First chance exception – 0xc0000005 caused by thread with system id 2448

    [2/25/2009 2:02:23 PM] First chance exception – 0xc0000005 caused by thread with system id 2448

    [2/25/2009 2:02:23 PM] First chance exception – 0xc0000005 caused by thread with system id 2448

    [2/25/2009 2:02:23 PM] First chance exception – 0xc0000005 caused by thread with system id 2448

    [2/25/2009 2:02:23 PM] First chance exception – 0xc0000005 caused by thread with system id 2448

    [2/25/2009 2:02:23 PM] First chance exception – 0xc0000005 caused by thread with system id 2448

    [2/25/2009 2:02:23 PM] First chance exception – 0xc0000005 caused by thread with system id 2448

    [2/25/2009 2:02:23 PM] First chance exception – 0xc0000005 caused by thread with system id 2448

    [2/25/2009 2:02:23 PM] First chance exception – 0xc0000005 caused by thread with system id 2448

    [2/25/2009 2:02:23 PM] First chance exception – 0xc0000005 caused by thread with system id 2448

    [2/25/2009 2:02:23 PM] First chance exception – 0xc0000005 caused by thread with system id 2448

    [2/25/2009 2:02:23 PM] First chance exception – 0xc0000005 caused by thread with system id 2448

    [2/25/2009 2:02:23 PM] First chance exception – 0xc0000005 caused by thread with system id 2448

    This is just a sample but from what I understand, First Chance Exceptions are not necessarily a problem. However, should I be seeing this many, this often, on a site that is hardly being used at all?

  98. Mauro says:

    Hi David,

    I sometimes receive this error in my application.

    Could you please have a look on it?

    Faulting application w3wp.exe, version 6.0.3790.3959, stamp 45d6968e, faulting module kernel32.dll, version 5.2.3790.4062, stamp 462646fb, debug? 0, fault address 0x0000bee7.

  99. David.Wang says:

    Richard Mathis – the same approach works for IIS7. In fact, the same troubleshooting approach works for all versions of IIS for the past 15 years.

    IIS7 just comes with new features like Failed Request Tracing which allows you to troubleshoot non-crashes, but for crashes/hangs, you still apply the same described approach because it is how you debug/troubleshoot issues in general on Windows.

    //David

  100. David.Wang says:

    Mauro – You will need to use a tool like IIS State or DebugDiag to capture a stack backtrace of the crash and diagnose your issue.

    It is unlikely for w3wp.exe (IIS code) or kernel32.dll (Windows code) to be the issue causing the crash, so you need to use the stack backtrace to see what other code is using the IIS/Windows code to better determine the real issue.

    //David

  101. David.Wang says:

    ekreger – It is not possible to determine correctness of exceptions without knowing the code that is being executed.

    For example, First Chance Exception can result from managed Exceptions being thrown and caught. Which may be expected. Or not expected. Totally depends on the application and its architecture.

    //David
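
    As a rough analogy in Python (not the Windows debugger API), the sketch below uses `sys.settrace` as a stand-in for an attached debugger: it is notified of every exception at the moment it is raised, whether or not the code goes on to handle it. `parse_port` and `run_with_first_chance_log` are hypothetical names made up for this illustration.

```python
import sys


def run_with_first_chance_log(func, *args):
    """Run func(*args) and record every exception at the moment it is raised.

    Python analogy only: sys.settrace plays the role of the attached debugger,
    which is told about an exception before any handler gets to run.
    """
    seen = []

    def tracer(frame, event, arg):
        if event == "exception":
            exc_type, _value, _tb = arg
            seen.append(exc_type.__name__)
        return tracer  # keep tracing this frame and any frames it calls

    previous = sys.gettrace()
    sys.settrace(tracer)
    try:
        result = func(*args)
    finally:
        sys.settrace(previous)
    return result, seen


def parse_port(text, default=80):
    """Hypothetical application code that throws and catches by design."""
    try:
        return int(text)
    except ValueError:
        return default  # handled: the caller never sees the exception


if __name__ == "__main__":
    print(run_with_first_chance_log(parse_port, "8080"))
    print(run_with_first_chance_log(parse_port, "not-a-number"))
```

    The second call still returns a normal result even though an exception was logged, which is why a count of first-chance exceptions says little without knowing whether the application expects them.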

  102. Marc Novak says:

    Excellent article David.

    IIS State helped me track down (precisely) what was causing our SMTP / Inetinfo service to fail.

    In my case, an application we had been using for many years called Policy Patrol (which hooks into the SMTP service) had issues, as detailed below. We have now removed Policy Patrol (with a view to re-installing it at some point in the future).

    One thing worth noting: when you terminate IIS State, it also terminates the InetInfo.exe process it is monitoring. IIS State should only be used with that in mind.

    Thanks

    Marc

    IIS State output……….

    Opened log file ‘D:\iisstate\output\IISState-5628.log’

    ***********************

    Starting new log output

    IISState version 3.3.1

    Wed Apr 15 10:34:41 2009

    OS = Windows 2003 Server

    Executable: inetinfo.exe

    PID = 5628

    Note: Thread times are formatted as HH:MM:SS.ms

    ***********************

    IIS has crashed…

    Beginning Analysis

    DLL (!FunctionName) that failed: ntdll!RtlAllocateHeap

    Thread ID: 21

    System Thread ID: 1b88

    Kernel Time: 0:0:0.78

    User Time: 0:0:0.62

    *** WARNING: Unable to verify checksum for D:\Program Files\Red Earth Software\Policy Patrol Email 4\MailProcessor\PP4_SMTPSink.DLL

    *** ERROR: Symbol file could not be found. Defaulted to export symbols for D:\Program Files\Red Earth Software\Policy Patrol Email 4\MailProcessor\PP4_SMTPSink.DLL –

  103. David.Wang says:

    Marc – failures during memory allocation (like ntdll!RtlAllocateHeap) indicate something in the process is corrupting memory that it does not own.

    It *may* be possible that PP4_SMTPSink.dll is a victim, but if you remove it and the crashes go away, it is a good bet that it was the one corrupting memory, in which case you will not be able to re-install it in the future without getting a new build which fixes the memory corruption.

    To help detect memory corruption, you want to use Application Verifier with full pageheap checking enabled.

    http://www.microsoft.com/downloads/details.aspx?FamilyID=C4A25AB9-649D-4A1B-B4A7-C9D8B095DF18&displaylang=en

    Basically, heap checks work by allocating memory on page-boundaries, so if you overstep your memory boundary, you end up touching unallocated memory and Windows immediately halts the process with an access violation — instead of the usual packed memory allocation scheme where if you overstep your memory boundary, you end up stomping on some other memory allocation nearby and corrupt that memory.

    Of course, this memory-checking technique reduces available memory (allocating on page boundaries wastes a lot of space in each memory page) and also hooks Application Verifier into the process for every memory allocation and deallocation, but it is also the surest and easiest way to track down memory corruption.

    //David
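
    The difference between the two allocation schemes can be sketched with a toy model. This is a simulation in Python, not how Application Verifier is implemented: `PackedHeap`, `PageHeap`, `GuardPageHit`, and the 16-byte `PAGE` are all assumptions chosen to keep the example small.

```python
PAGE = 16  # toy page size; real x86 pages are 4096 bytes


class GuardPageHit(Exception):
    """Stands in for the access violation raised at the first bad byte."""


class PackedHeap:
    """Blocks sit back to back, so an overrun lands in the next block."""

    def __init__(self, size):
        self.mem = bytearray(size)
        self.next_free = 0

    def alloc(self, n):
        start = self.next_free
        self.next_free += n
        return start

    def write(self, start, data):
        # Nothing checks the block size, just like a raw pointer write.
        self.mem[start:start + len(data)] = data


class PageHeap:
    """Each block ends exactly at a page boundary; past it is a guard page."""

    def __init__(self):
        self.blocks = []

    def alloc(self, n):
        pages = -(-n // PAGE)  # round up to whole pages
        region = bytearray(pages * PAGE)
        start = len(region) - n  # push the block against the end of its pages
        self.blocks.append((region, start))
        return len(self.blocks) - 1

    def write(self, handle, data):
        region, start = self.blocks[handle]
        for offset, byte in enumerate(data):
            if start + offset >= len(region):
                raise GuardPageHit(f"overrun at byte {offset} of the write")
            region[start + offset] = byte

    def wasted(self, handle):
        _region, start = self.blocks[handle]
        return start  # padding bytes in front of the block


if __name__ == "__main__":
    packed = PackedHeap(32)
    a, b = packed.alloc(8), packed.alloc(8)
    packed.write(b, b"B" * 8)
    packed.write(a, b"A" * 12)  # 4 bytes too many, and nobody notices
    print("neighbour now holds:", bytes(packed.mem[b:b + 8]))

    paged = PageHeap()
    h = paged.alloc(8)
    try:
        paged.write(h, b"A" * 12)
    except GuardPageHit as hit:
        print("stopped immediately:", hit, "| wasted bytes:", paged.wasted(h))
```

    In the packed case the bad write succeeds and the damage only shows up later, in whatever code next uses the neighbouring block. In the page-aligned case the same write fails at the offending byte, at the cost of the padding reported by `wasted`.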

  104. Rakesh Jain says:

    Hi David

    Very useful article. I appreciate your in-depth knowledge of the subject.

    I have a similar problem. My application crashes, and when I look at the event log I see a warning message:

    "W3SVC – A process serving application pool ‘DefaultAppPool’ suffered a fatal communication error with the World Wide Web Publishing Service. The process id was ‘5236’. The data field contains the error number. "

    and my httperr.log says

    2009-05-15 09:38:13 192.168.15.104 1339 192.168.1.49 80 HTTP/1.1 GET /MYCUBE-/Web/COOL/ENG/Online/FCISVSubscription.aspx?TXNNO=0220091350001209&MODE=VIEW&AUTHSTAT=U&LimitOrder=&LimitOrderType=&hdnTxnModified=0&MODID=OCMCFOCPT&AUTH=Y&rnd=0.5925906497712439&TITLE=Subscription – 1 Connection_Abandoned_By_AppPool DefaultAppPool

    I realised from your articles that the problem could be with my application, and I tried to trace the error using the Debug Diagnostic Tool v1.1.

    It generated the dump files on crash, but when I ran the analysis it found nothing; it reads "there is no problem detected in dump".

    I am really not sure how to approach this problem further, as the analysis did not detect any problem but my application is crashing every day.

    Application: built on the ASP.NET 1.1 platform (it also contains migrated ASP code)

    DB: Oracle

    Server: Windows 2003 SP1

    Please help me out.

  105. Rakesh Jain says:

    Hi David

    Debug Diagnostic Tool v1.1 seems to have found an answer. It generated the dump, and when I ran the analysis it showed the following error:

    In w3wp__PID__3112__Date__05_20_2009__Time_10_12_29AM__750__First chance exception 0XC0000005.dmp the assembly instruction at kernel32!RaiseException+53 in C:\WINDOWS\system32\kernel32.dll from Microsoft Corporation has caused an access violation exception (0xC0000005) when trying to read from memory location 0x00000000 on thread 15

    I am not sure how to approach this problem.

    Any suggestions?

  107. KrisRaam says:

    Hi David, I have read your post which is simple and good.

    I have a persistent problem with IIS 6.0 crashing. I found that a web service is causing higher CPU usage, but the same copy of the web service runs on another W2k3 server without any problem.

    Event Type: Warning

    Event Source: W3SVC

    Event Category: None

    Event ID: 1011

    Date: 11/09/2009

    Time: 13:28:54

    User: N/A

    Computer: KRIS

    Description:

    A process serving application pool ‘NewAppPool’ suffered a fatal communication error with the World Wide Web Publishing Service. The process id was ‘11280’. The data field contains the error number.

    For more information, see Help and Support Center at http://go.microsoft.com/fwlink/events.asp.

    Pls help me out!!!

  108. Vikram says:

    Hi David,

    My web application (developed in ASP.NET 3.5) is not running on IIS 6.0 (OS: Windows Server 2003 SP2). It displays "Page cannot be displayed", and when I checked the httperr.log file I found "Connection_Abandoned_By_AppPool DefaultAppPool". The worker process is crashing…

    Everything was working fine until I installed some Windows security updates. I guess the problem is with the updates.

    But how should I fix it?

  113. SachinC says:

    Hello, I am getting the following error.

    I always get the "IIS Worker Process Failed" popup.

    How should I fix it?

    Please help.

    Thanks in advance

    Event Type: Error

    Event Source: .NET Runtime 2.0 Error Reporting

    Event Category: None

    Event ID: 1000

    Date: 1/4/2010

    Time: 4:07:12 AM

    User: N/A

    Computer: BESUTIL

    Description:

    Faulting application w3wp.exe, version 6.0.3790.3959, stamp 45d6968e, faulting module unknown, version 0.0.0.0, stamp 00000000, debug? 0, fault address 0x01dc5bb0.

    For more information, see Help and Support Center at http://go.microsoft.com/fwlink/events.asp.

    Data:

    0000: 41 00 70 00 70 00 6c 00   A.p.p.l.

    0008: 69 00 63 00 61 00 74 00   i.c.a.t.

    0010: 69 00 6f 00 6e 00 20 00   i.o.n. .

    0018: 46 00 61 00 69 00 6c 00   F.a.i.l.

    0020: 75 00 72 00 65 00 20 00   u.r.e. .

    0028: 20 00 77 00 33 00 77 00    .w.3.w.

    0030: 70 00 2e 00 65 00 78 00   p…e.x.

    0038: 65 00 20 00 36 00 2e 00   e. .6…

    0040: 30 00 2e 00 33 00 37 00   0…3.7.

    0048: 39 00 30 00 2e 00 33 00   9.0…3.

    0050: 39 00 35 00 39 00 20 00   9.5.9. .

    0058: 34 00 35 00 64 00 36 00   4.5.d.6.

    0060: 39 00 36 00 38 00 65 00   9.6.8.e.

    0068: 20 00 69 00 6e 00 20 00    .i.n. .

    0070: 75 00 6e 00 6b 00 6e 00   u.n.k.n.

    0078: 6f 00 77 00 6e 00 20 00   o.w.n. .

    0080: 30 00 2e 00 30 00 2e 00   0…0…

    0088: 30 00 2e 00 30 00 20 00   0…0. .

    0090: 30 00 30 00 30 00 30 00   0.0.0.0.

    0098: 30 00 30 00 30 00 30 00   0.0.0.0.

    00a0: 20 00 66 00 44 00 65 00    .f.D.e.

    00a8: 62 00 75 00 67 00 20 00   b.u.g. .

    00b0: 30 00 20 00 61 00 74 00   0. .a.t.

    00b8: 20 00 6f 00 66 00 66 00    .o.f.f.

    00c0: 73 00 65 00 74 00 20 00   s.e.t. .

    00c8: 30 00 31 00 64 00 63 00   0.1.d.c.

    00d0: 35 00 62 00 62 00 30 00   5.b.b.0.

    00d8: 0d 00 0a 00               ….    

  114. robert says:

    Any thoughts on this one?

    Event Type:        Error

    Event Source:    COM+

    Event Category:                Unknown

    Event ID:              4786

    Date:                     2/10/2010

    Time:                     10:32:34 AM

    User:                     N/A

    Computer:          COSCSAPP2P

    Description:

    The system has called a custom component and that component has failed and generated an exception. This indicates a problem with the custom component. Notify the developer of this component that a failure has occurred and provide them with the information below.

    Component Prog ID: 6[ODBC][Env 1dcd34d8]

    Method Name: IDispenserDriver::DestroyResource

    Process Name: w3wp.exe

    Exception: C0000005

    Address: 0x1E20F548

    Call Stack:

    isqlt09a!_sqli_connect_close + 0x569

    ivif7912!InfConnection::allocStatement(class BaseStatement * *) + 0x266

    comsvcs!DispManGetContext + 0xcb6

    comsvcs!DispManGetContext + 0xf73

    comsvcs!DispManGetContext + 0x206a

    odbc32!SQLConnectA + 0x1fa0

    odbc32!SQLConnectA + 0x28d1

    odbc32!SQLSetEnvAttr + 0x23b5

    + 0x1b372416

    System.Data.ni! + 0x47a565

    mscorwks! + 0x1b4c

    mscorwks!DllUnregisterServerInternal + 0x619d

    mscorwks!CoUninitializeEE + 0x2ead

    mscorwks!CoUninitializeEE + 0x2ee0

    mscorwks!CoUninitializeEE + 0x35c8

    mscorwks!InstallCustomModule + 0x65ba

    mscorwks!InstallCustomModule + 0x65ee

    mscorwks!InstallCustomModule + 0x6484

    mscorwks!CorExitProcess + 0x301c3

    mscorlib.ni! + 0x225e4f

    mscorlib.ni! + 0x225d6b

    mscorwks!CorExitProcess + 0x1dec4

    mscorwks!CorExitProcess + 0x1df85

    mscorwks!CorExitProcess + 0x1de33

    mscorwks!StrongNameErrorInfo + 0x17b02

    mscorwks!StrongNameErrorInfo + 0x17a33

    mscorwks!CorExitProcess + 0x4b59

    mscorwks!CoUninitializeEE + 0x4e0b

    mscorwks!CoUninitializeEE + 0x4da7

    mscorwks!CoUninitializeEE + 0x4ccd

    mscorwks!CreateAssemblyNameObject + 0x23317

    mscorwks!CorExitProcess + 0x4a1d

    mscorwks!CorExitProcess + 0x4aa3

    mscorwks!CorExitProcess + 0x4a8f

    mscorwks!CorExitProcess + 0x4b0e

    mscorwks!StrongNameErrorInfo + 0x17a33

    mscorwks!StrongNameErrorInfo + 0x17d04

    mscorwks!CoUninitializeEE + 0x4e0b

    mscorwks!CoUninitializeEE + 0x4da7

    mscorwks!CoUninitializeEE + 0x4ccd

    mscorwks!GetPrivateContextsPerfCounters + 0xed35

    mscorwks!GetPrivateContextsPerfCounters + 0xed46

    mscorwks!_CorExeMain + 0x1374

    mscorwks!CorExitProcess + 0x21f3f

    kernel32!GetModuleHandleA + 0xdf

    For more information, see Help and Support Center at http://go.microsoft.com/fwlink/events.asp.

    ——————————————————————————–

    Event Type:        Error

    Event Source:    Application Error

    Event Category:                (100)

    Event ID:              1000

    Date:                     2/10/2010

    Time:                     10:32:34 AM

    User:                     N/A

    Computer:          COSCSAPP2P

    Description:

    Faulting application w3wp.exe, version 6.0.3790.3959, faulting module isqlt09a.dll, version 0.0.0.0, fault address 0x0000f548.

    For more information, see Help and Support Center at http://go.microsoft.com/fwlink/events.asp.

    Data:

    0000: 41 70 70 6c 69 63 61 74   Applicat

    0008: 69 6f 6e 20 46 61 69 6c   ion Fail

    0010: 75 72 65 20 20 77 33 77   ure  w3w

    0018: 70 2e 65 78 65 20 36 2e   p.exe 6.

    0020: 30 2e 33 37 39 30 2e 33   0.3790.3

    0028: 39 35 39 20 69 6e 20 69   959 in i

    0030: 73 71 6c 74 30 39 61 2e   sqlt09a.

    0038: 64 6c 6c 20 30 2e 30 2e   dll 0.0.

    0040: 30 2e 30 20 61 74 20 6f   0.0 at o

    0048: 66 66 73 65 74 20 30 30   ffset 00

    0050: 30 30 66 35 34 38         00f548  

  115. Me says:

    So really, what’s the fix??   🙂

  116. YY says:

    I agree with this! The article does not get to the point; you are trying to bypass the issue without knowing how to fix it. It's a waste of time reading this article!

    Here is a solution:

    http://www.windowsadminscripts.com/…/apppool

  117. Johann says:

    Just add a line of code to various modules that writes some diagnostic information to a log file, such as the name of the current module and the values of some variables. Whenever a crash occurs, inspect your log file to find out where the code last executed. Keep moving this log call around within the module where the bug seems to appear until you get to the line where the crash is happening. If a log call placed before a suspect line gets recorded but one placed after it does not, then you have probably found where the problem is occurring.
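
    A minimal sketch of that approach in Python, under a few assumptions: `checkpoint`, `last_checkpoint`, and the deliberately buggy `average_price` are hypothetical names, and a `ZeroDivisionError` stands in for the crash. Each log line is flushed to disk immediately, on the assumption that a buffered line would be lost if the process died.

```python
import os
import tempfile


def checkpoint(log_path, module, **values):
    """Append one line and force it to disk before returning."""
    details = " ".join(f"{name}={value!r}" for name, value in values.items())
    with open(log_path, "a", encoding="utf-8") as log:
        log.write(f"{module}: {details}\n")
        log.flush()
        os.fsync(log.fileno())  # a buffered line would be lost in a hard crash


def last_checkpoint(log_path):
    """Return the last line that made it to the log, or None if empty."""
    with open(log_path, encoding="utf-8") as log:
        lines = log.read().splitlines()
    return lines[-1] if lines else None


def average_price(log_path, total, quantity):
    """Hypothetical buggy routine bracketed by two checkpoints."""
    checkpoint(log_path, "average_price", step="before divide", quantity=quantity)
    result = total / quantity  # suspect line
    checkpoint(log_path, "average_price", step="after divide", result=result)
    return result


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "trace.log")
    try:
        average_price(path, 10, 0)
    except ZeroDivisionError:
        pass  # stands in for the crash
    print(last_checkpoint(path))
```

    When the "before" line is the last entry and the "after" line is missing, the failure lies between the two calls, and the logged variable values (here `quantity`) hint at why.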

  118. SVenk says:

    Hi David,

    I have read your post which is simple and good.

    I have a persistent problem with IIS 6.0 crashing.

    The following message was captured in the log:

    Event Type: Warning

    Event Source: W3SVC

    Event Category: None

    Event ID: 1011

    Date: 12/12/2011

    Time: 13:28:54

    User: N/A

    Description:

    A process serving application pool 'PRD' suffered a fatal communication error with the World Wide Web Publishing Service. The process id was '11280'. The data field contains the error number.

    Error number: 8007006d .

    Server 2003 – SP1 and legacy dot net 1.1

    Pls help me out!!!

  119. W3SVC.EXE Hanging while installing website msi file in Win2003 says:

    Team,

    Please help me with the case below. We have two test servers. When we install a website using an MSI file, the W3SVC service goes into a hung state after the installation; after a system restart, the W3SVC service works normally. I do not understand what the exact problem is.

  120. Manley de Kocks says:

    Appreciate the detailed explanation. Q: Can the Debug Diagnostic tool hinder the recycling of an application pool?

  121. Fred Johnson says:

    My app pool crashed when I changed an Html.ActionLink to an A tag with an Html.Action as the href.  WTF?  IIS is such a piece of garbage.  How is that even possible?