Normally I prefer for the server admin to manually trigger a hang dump (see http://blogs.msdn.com/b/chaun/archive/2013/11/15/several-good-ways-to-trigger-a-hang-dump-of-an-unresponsive-process.aspx) when the process is clearly unresponsive and end-users are complaining. That way we have human intelligence working to ensure that the hang dump is made while the process is obviously unresponsive. Timing is important. There’s no point in making a hang dump of a process that isn’t hung up.
But there are times when it’s just not feasible for the server admin to manually trigger a hang dump at a good time. For example, the process may become unresponsive in the middle of the night when there is no one around to report the problem to the help desk. Or maybe the help desk just creates a ticket and the admin finds the ticket on the hang issue in the morning. Perhaps the process is still hanging around and still hung (unlikely but possible) or orphaned, and the admin makes a memory dump of the process, but there’s just no activity in the dump. Sometimes it is good to trigger the hang dump soon after the hang occurs rather than hours after the fact.
One of the cool things about DebugDiag 2.0 (http://debugdiag.com) is that there are no fewer than three ways to automate the triggering of a “hang dump” of a process. I don’t know of any other tool that can do anything quite like it.
I’m not trying to give an in-depth guide to the process here. Just an introduction.
The Perfmon Counter Option
You start with the performance rule. . .
You immediately have a choice between Perfmon counters or HTTP response times.
If you’ve already been troubleshooting a performance problem with Performance Monitor and know which Perfmon objects and counters to focus on, that might be a really good way to go. If not, you’ll probably want to go with HTTP response times.
I’m going to give a quick glimpse of the Perfmon counter option first.
Sorry to be Captain Obvious here, but you select the Add Perf Triggers button.
Of course you’re going to want to focus on the local computer. Don’t bother trying to monitor a different server, because DebugDiag is only going to trip a dump on the local server.
You’ve got many choices here.
You could focus on high CPU. (When CPU reaches a high level, trip a memory dump.)
You could focus on memory. (When virtual bytes or private bytes reach a high level, trip a dump.)
You could do something like focus on 500 (internal server error) responses being sent, as seen below.
Or you could focus on ASP.NET requests queuing, for example. When the queuing begins, trip a dump. Not a bad idea. But I’d only do it if you know for a fact that ASP.NET requests have been queuing when the hang (or slow performance) issue occurs. This assumes prior knowledge of the symptoms.
One last observation about this Perfmon Rule is that it can be used for any process. You don’t have to use it for an IIS process.
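To make the idea of a counter-based trigger a little more concrete, here is a rough Python sketch of the logic. This is not how DebugDiag is implemented, and the names (should_trigger, threshold, consecutive) are made up for illustration. It assumes you already have a series of counter samples, and it only fires when the counter stays above the threshold for several samples in a row, so one brief spike doesn’t trip a dump.

```python
from typing import Iterable


def should_trigger(samples: Iterable[float], threshold: float, consecutive: int) -> bool:
    """Return True once `consecutive` samples in a row exceed `threshold`.

    Hypothetical helper: `samples` stands in for periodic readings of a
    Perfmon counter such as % Processor Time or Private Bytes.
    """
    run = 0  # length of the current streak of samples above the threshold
    for value in samples:
        run = run + 1 if value > threshold else 0
        if run >= consecutive:
            return True
    return False


# CPU stays above 90% for three samples in a row: trip the dump.
print(should_trigger([10, 95, 96, 97], threshold=90, consecutive=3))
# A dip resets the streak, so this series never reaches three in a row.
print(should_trigger([95, 10, 95, 96], threshold=90, consecutive=3))
```

The point is the same one made above: pick a counter and a threshold that you already know line up with the symptoms, or the trigger will fire at the wrong time.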
There are many options for brushing up on Perfmon to know what is normal and what is abnormal. Here are some references to consider.
Monitoring and maintaining SharePoint Server 2013
Monitor Activity on a Web Server (IIS 7)
ASP.NET Performance Monitoring, and When to Alert Administrators
Monitoring ASP.NET Performance
A second way to go…
… is to stick with http response times instead of Perfmon counters.
This method only applies to websites (or web applications) running in Microsoft IIS.
We start the rule the same way. . .
But this time we’re selecting the second option.
Click Add URL. . .
Now here you have a choice between ETW tracing or URL Pinging.
The ETW tracing option is new to DebugDiag 2.0. It wasn’t in DD 1.2.
If you go with ETW tracing, you do not have to specify a particular page to monitor. You can leave it blank to monitor everything that is sent to IIS, or you can specify a particular website on that IIS server (yes, this is only useful for IIS web servers) or a particular virtual directory.
Those are some cool options!
A third way…
Or you can instruct DebugDiag to send some “pings” (not actual ICMP pings) against a specific web page. Here you probably need to specify a page or else just use the default page. Either way, it is only going to focus its “pings” against one specific page.
One important thing to keep in mind here is that the URL to ping needs to go directly to the one web server that DebugDiag is installed on. Don’t specify, for example, a URL that goes to a load balancer, which then routes the request to any of five different web servers / WFEs. If you want to use this option in a SharePoint farm, and you only have one URL that works, you might consider the possibility (use at your own risk) of creating an Alternate Access Mapping. And if you do create an AAM to go to one server, you will also probably have to add the appropriate binding in IIS Manager for the web site. (Adding an AAM won’t add a binding. Extending a web site in Central Admin will add the binding, but adding an AAM won’t.)
You can definitely tweak the frequency of the ping and the timeout value too.
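If it helps to picture what a URL “ping” with a frequency and a timeout amounts to, here is a minimal Python sketch. It is only an illustration of the concept, not DebugDiag’s actual behavior. The function names (ping_once, monitor) and the example URL are hypothetical. The sketch treats any error or timeout as “unresponsive”, which is an assumption on my part.

```python
import http.client
import time
import urllib.request
from typing import Callable


def ping_once(url: str, timeout_s: float) -> bool:
    """Return True if the URL answers successfully within timeout_s seconds."""
    try:
        with urllib.request.urlopen(url, timeout=timeout_s) as response:
            return 200 <= response.status < 400
    except (OSError, http.client.HTTPException):
        # URLError, HTTPError (4xx/5xx) and socket timeouts are OSError subclasses.
        return False


def monitor(check: Callable[[], bool], interval_s: float, max_checks: int,
            sleep: Callable[[float], None] = time.sleep) -> bool:
    """Run `check` up to max_checks times, interval_s apart.

    Returns True as soon as one check fails (the point at which a dump
    would be triggered), or False if every check passed.
    """
    for attempt in range(max_checks):
        if not check():
            return True
        if attempt < max_checks - 1:
            sleep(interval_s)
    return False


if __name__ == "__main__":
    # Hypothetical URL: it should reach this one server directly, not a load balancer.
    url = "http://localhost/default.aspx"
    if monitor(lambda: ping_once(url, timeout_s=30), interval_s=60, max_checks=10):
        print("URL unresponsive: this is where a dump would be triggered")
```

A shorter timeout catches slowness sooner but risks dumping a process that is merely busy. A longer one is safer but delays the dump.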
Select your application pool that you want to dump out when the url proves unresponsive.
Also you have some additional settings that you can use if you like.
I’d probably use the “generate a userdump every 45 seconds” option if I were troubleshooting a high-cpu issue. And I’d probably tell it to stop generating after three (or possibly five) userdumps.
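Just to spell out the arithmetic, here is a tiny Python sketch of what that setting works out to. The function name is made up, and it assumes the first dump is written the moment the rule fires, which may not match DebugDiag exactly.

```python
from typing import List


def dump_offsets(interval_s: int, max_dumps: int) -> List[int]:
    """Seconds after the trigger at which each userdump would be written.

    Assumption: the first dump is taken immediately when the rule fires.
    """
    return [i * interval_s for i in range(max_dumps)]


print(dump_offsets(45, 3))  # three dumps, 45 seconds apart
print(dump_offsets(45, 5))  # five dumps, 45 seconds apart
```

With three dumps the series covers a minute and a half, and with five it covers three minutes, so the high-CPU condition needs to last at least that long for every dump in the series to be useful.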
I’d only select full userdumps, not mini userdumps. But that’s just me. Others may differ. Minidumps rarely help in my experience.