HOWTO: Microsoft.com OPS on Debugging IIS


If you are looking for information on how to troubleshoot a variety of IIS-related issues, the following week of Webcasts from the Microsoft.com OPS team is not to be missed. Mark your calendars!


Of course, these sessions focus on issues AFTER you have determined that your problem is NOT with invalid configuration but with something running awry on the server. I know, I know, distinguishing misconfiguration from buggy code is not easy, but there are tools like DebugDiag and general troubleshooting steps/guidelines to help you. And if you need help with diagnosis, post the IIS State or DebugDiag log file with a clear subject line to the microsoft.public.inetserver.iis newsgroup on msnews.microsoft.com for assistance.


I consider Microsoft.com one of the closest and best customers of IIS… because they have an incredibly seasoned and talented OPS team (having former IIS team members can’t hurt…), which allows them to be on the bleeding edge and be the first and largest consumer of beta IIS builds. As a result, they have unbelievably intimate access to the IIS product developers when it comes to debugging or feature suggestions, and they are privy to a variety of internal details and conversations you simply won’t find anywhere else.


Let’s just say that they definitely do their homework on issues and bring a wealth of experience to the table. I am hard pressed to name another group of people of comparable skill and experience when it comes to IIS and operations.


So… I suggest listening to what they have to say about debugging IIS. But, don’t just take my word for it; see it for yourself. 🙂


//David

Comments (4)

  1. Boro Marinkovich says:

    I read your recommendations on debugging IIS apps.  I am taking your advice but I’m having difficulty getting a stack trace dump.

    We have an ASP.NET web service application (3 actually) set up to run in separate application pools.  We are finding that one of the applications hangs intermittently but does not crash.  During this time, netstat shows a large number of connections to the server in the CLOSE_WAIT state.  To fix the problem, we have found that we need to delete and recreate the application pool, as it seems to become corrupt.

    We are attempting to use the IIS Toolkit Debug Diagnostics tool to get a stack trace dump when the hang occurs.  However, this tool appears to create a dump only when the application crashes, rather than when it hangs.

    Is there a way to force a stack trace dump so we can analyze where the application hangs and attempt to resolve the bug?

    Thanks.

    Boro

  2. David.Wang says:

    Boro – Can you define metrics to detect how your "application hangs"?

    In general, code cannot detect hangs because a hang is simply an operation that never returns – so how do you distinguish between a long-running operation that eventually returns and one that doesn’t? This is essentially the famous Halting Problem in Computer Science, which has been proven unsolvable in general – if you could solve it, you would be famous. 🙂

    So instead, we have to be content with using other approximations to detect "hangs", and these are often different for each app. Thus, you don’t see DebugDiag offer an option to get a stack trace on a hang because it has no idea what metrics mean "hang" for any given application.
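
    To illustrate what such an approximation looks like, here is a minimal Python sketch of a deadline-based check. It is not how DebugDiag or IIS works; the names run_with_deadline and slow_but_healthy and the thresholds are made up for this example. It shows why any fixed deadline can mislabel a slow but healthy operation as a hang.

```python
import concurrent.futures
import time


def run_with_deadline(operation, deadline_seconds):
    """Run operation() and report whether it finished within the deadline.

    Returns ("completed", result) or ("suspected hang", None).
    "Suspected" is deliberate: the check cannot know whether the
    operation would have returned a moment later.
    """
    # No with-block here: leaving it would wait for the worker thread.
    executor = concurrent.futures.ThreadPoolExecutor(max_workers=1)
    future = executor.submit(operation)
    try:
        return ("completed", future.result(timeout=deadline_seconds))
    except concurrent.futures.TimeoutError:
        # The worker thread keeps running; Python cannot forcibly stop it.
        return ("suspected hang", None)
    finally:
        executor.shutdown(wait=False)


def slow_but_healthy():
    """Hypothetical operation that is slow but always returns."""
    time.sleep(0.5)
    return "done"


if __name__ == "__main__":
    # A generous deadline lets the operation complete.
    print(run_with_deadline(slow_but_healthy, deadline_seconds=2.0))
    # A tight deadline flags the same healthy operation as a hang.
    print(run_with_deadline(slow_but_healthy, deadline_seconds=0.1))
```

    The same operation gets two different verdicts depending only on the deadline, which is why the right threshold has to come from knowledge of the specific application.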

    Knowing that hang detection is an unsolved problem – would you rather DebugDiag implement something that works for 50% of the cases but cannot tell whether it works for your case? Would you think negatively of DebugDiag if it offered to take a stack trace on "Hangs" but failed when you needed it? Or would you rather DebugDiag not make false claims and not have "Hangs" as an option? Hopefully, this explains the dilemma we are in…

    Philosophical debate aside, let’s get back to the issue at hand… It is very important for you to troubleshoot what exactly is "hanging". Is the server not accepting requests? Are the requests accepted but taking a long time to process? Or is it something else?
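
    One rough way to tell these cases apart is to probe the application from a client and see where the request stalls. The following Python sketch is only an illustration under assumptions: the function names, the category labels, and the default thresholds are invented here, and the URL would be whatever endpoint your web service exposes.

```python
import socket
import time
import urllib.error
import urllib.request


def classify(connected, elapsed_seconds, slow_threshold_seconds):
    """Map a probe outcome to one of the cases asked about above."""
    if not connected:
        return "not accepting requests"
    if elapsed_seconds is None:
        return "accepted but no response before timeout"
    if elapsed_seconds > slow_threshold_seconds:
        return "accepted but slow"
    return "responsive"


def probe(url, timeout_seconds=30.0, slow_threshold_seconds=5.0):
    """Send one GET request and classify what happened (thresholds are guesses)."""
    start = time.monotonic()
    try:
        with urllib.request.urlopen(url, timeout=timeout_seconds) as response:
            response.read()
    except urllib.error.HTTPError:
        # An HTTP error status still means the server accepted and answered.
        pass
    except urllib.error.URLError:
        # Raised when the request could not be sent (refused, connect timeout, DNS).
        return classify(False, None, slow_threshold_seconds)
    except socket.timeout:
        # The request was sent, but no complete response arrived in time.
        return classify(True, None, slow_threshold_seconds)
    return classify(True, time.monotonic() - start, slow_threshold_seconds)
```

    Running probe() on a schedule and recording the result with a timestamp gives you a concrete record of when the "hang" happens and which kind it is.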

    While getting a stack trace of the "smoking gun" helps in resolving the issue, it is not always easy, or even necessary, to get that stack trace. Other troubleshooting techniques using log files, netstat, etc. can narrow the possibilities enough that you can reason your way to the cause.
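
    As an example of narrowing things down with netstat, this small Python sketch tallies TCP connection states from saved netstat output. It assumes the Windows "netstat -an" column layout (Proto, Local Address, Foreign Address, State); the sample text is made up for illustration and is not real data.

```python
from collections import Counter

# Hypothetical sample in the Windows "netstat -an" layout.
SAMPLE_OUTPUT = """
Active Connections

  Proto  Local Address          Foreign Address        State
  TCP    0.0.0.0:80             0.0.0.0:0              LISTENING
  TCP    10.0.0.5:80            10.0.0.21:50112        ESTABLISHED
  TCP    10.0.0.5:80            10.0.0.22:50388        CLOSE_WAIT
  TCP    10.0.0.5:80            10.0.0.23:50401        CLOSE_WAIT
  UDP    0.0.0.0:500            *:*
"""


def count_tcp_states(netstat_output):
    """Count TCP connections per state; the state is assumed to be column 4."""
    counts = Counter()
    for line in netstat_output.splitlines():
        fields = line.split()
        # Skip headers, blank lines, and UDP rows (which have no state).
        if len(fields) >= 4 and fields[0].upper() == "TCP":
            counts[fields[3]] += 1
    return counts


if __name__ == "__main__":
    for state, count in count_tcp_states(SAMPLE_OUTPUT).most_common():
        print(state, count)
```

    Capturing this tally at intervals shows whether one state keeps growing. A rising CLOSE_WAIT count generally means the remote side closed the connection while the local process has not yet closed its end, which points at the application rather than the network.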

    In other words, to solve an issue, you need to first concretely define it ("hanging" is not concrete enough), and then go through various ways of "proving" the cause, either through direct proof (a stack trace), inductive proof (assume the reason and show it completes the proof), etc.

    For example, some insidious product issues take us months to track down, and we go through the whole problem-solving process – gather data from log files, instrumentation, etc., make and eliminate hypotheses, reformulate next steps to gather data, re-run scenarios to reproduce the issue, etc.

    We all wish there were an "Easy" button that just finds us the cause of an issue, but life is unfortunately not that easy. We just need to be persistent enough to troubleshoot and figure it out. 🙂

    Good Luck.

    //David