I was speaking with some folks yesterday about tuning a system, and they asked me about my methodology. As time goes on, I find myself doing things differently based on the situation, but there is a general approach I use that I’ll explain here. This isn’t a hard-and-fast rule; it’s just a guideline I use to narrow in on the problem.
I start with three general areas:
1. Component Pressure
2. Waits and Queues
3. Top Processes
What I do first is to set up Performance Monitor (AKA System Monitor) and evaluate the general counters around CPU, Memory, I/O and Network. I watch which ones are showing the most activity (pressure) and then fine-tune the collection to know more about what part of that component is showing the issue.
Next I examine the Wait States and the Queues (using various DMV’s and so on) that tell me what the system is waiting to do.
Finally, I watch the top five or ten processes or queries that are running on the system, again using Perfmon and DMVs.
When I’m done with the collection, which might take a few minutes to a few days, I create a list of each of those items. There will be some that are in common, such as the I/O is causing the system to wait, which is caused by a long-running INSERT operation, and then I deal with those. Then I go after the things that were in one list (like the CPU that was running a little high for too long, for instance) but weren’t in another (no single query waiting on this). At some point you’re just re-arranging the deck chairs, so you tune until the system is behaving and then examine the architecture to see if there are any longer-term changes that need to be made.
Of course, none of this is optimal – it’s far better to extablish a baseline of information on your system and then when you experience an issue you can just compare to that. But most shops I visit don’t have that, so this is the next best thing. Comments welcome.