Diagnose your SQL Server

I assume that anyone reading this is already familiar with collecting traces with SQL Server Profiler, and equally familiar with its two main limitations:

  1. It adds a big overhead to SQL Server, which is the last thing you need when troubleshooting performance problems, and
  2. It can produce a vast amount of data, which is hard to make sense of.

This blog shows how to make each of these limitations much less of a problem. It is in two sections: Data collection and Data analysis. The data collection part is very different depending on whether you run on SQL 2000 or SQL 2005. This blog focuses on SQL 2005, but there is a small section at the end about how to achieve the same on SQL 2000.

Data collection
There used to be (and still is) a tool called PSSDiag. PSS being "Product Support Services", the tool was designed for support engineers to collect trace files, logs, etc. from customers who may not know much about SQL Server. The support engineer decides what data to collect, and then sends the customer a data collection tool that is easy to start.

In SQL 2005, this tool is now part of SQL Server, and is called SQLDiag. This is how to use it:

From a command prompt on the SQL Server machine, go into this folder:
C:\Program Files\Microsoft SQL Server\90\Tools\Binn\
then type:
sqldiag

The tool will take a little while to start up. You can ignore any warnings it shows along the way, as long as it does start. Once it is ready, it will display the following in green text:
SQLDIAG Collection started.  Press Ctrl+C to stop.

SQLDiag will now run in the background, collecting SQL Profiler traces and other information. By default, it collects into a subfolder called SQLDIAG; if this folder doesn't exist, the tool will create it. Once the customer has reproduced enough of the problem(s) that you are troubleshooting, they just have to stop the tool (Ctrl+C), zip the contents of the SQLDIAG subfolder, and send it to the support engineer for analysis.

There is one optional parameter that should be used for the data collection. SQLDiag uses the file SQLDiag.XML to configure which SQL Profiler events and other information to capture. By default, this file does not contain much, so you should use another configuration file. Note: you should not modify SQLDiag.XML directly; make a copy of the file and modify the copy instead. At the moment there are no simple tools to help you with this, so changing the configuration is unfortunately not very simple. But attached is a configuration file (PSSDiagCustom.XML) which configures SQLDiag to collect trace files that are closer to a standard SQL Profiler trace.

To tell SQLDiag to use this file, use the /I parameter. For example, copy the attached file into a folder called C:\SQLDiagConfig\. Then run SQLDiag like this:
SQLDiag /I C:\SQLDiagConfig\PSSDiagCustom.XML

Now you should be able to use SQLDiag to collect information from the customer's system. SQLDiag creates a number of files in the output folder, but here we will only look at the .trc file that it creates. This is a normal SQL Profiler trace file which you can open and analyse with SQL Server Profiler.

You can of course make .bat files with these commands, instead of the customer having to go to the command prompt - see the sketch below. You can also set SQLDiag up to run as a service, so the customer just has to start this service, or you can schedule the service to start and stop at certain times like any other service.
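
A minimal .bat file could look like this. The folder names are just examples - adjust them to wherever you placed the configuration file and want the output to go:

@echo off
rem Start SQLDiag with a custom configuration file (/I) and output folder (/O).
rem The paths below are examples only - change them to match your setup.
cd /d "C:\Program Files\Microsoft SQL Server\90\Tools\Binn"
sqldiag /I C:\SQLDiagConfig\PSSDiagCustom.XML /O C:\SQLDiagOutput

For the service scenario, SQLDiag has a /R parameter that registers it as a service, and /U to unregister it again.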

Other things to be aware of are:
  - As mentioned, the default output folder for SQLDiag is C:\Program Files\Microsoft SQL Server\90\Tools\Binn\SQLDIAG\, but you can change it with the /O parameter, as in the .bat example above. Always use a folder on the SQL Server machine itself; if you point it to a network location, you will add overhead to the tool.

  - SQLDiag can collect a lot of information - expect hundreds of megabytes, or even more on a busy system. So make sure that plenty of free disk space is available. Also, the customer should run it only long enough to reproduce the problems they are having; I would recommend running it for no more than an hour or so.

  - SQLDiag creates much less of an overhead on SQL Server than running a normal SQL Profiler trace because it does not have a user interface. Much of the overhead caused by SQL Profiler goes into displaying the collected information.

Run "SQLDiag /?" from the command prompt to see the additional parameters that are available, or look up "SQLDiag Utility" in SQL Server online help for further information about the tool.


Data analysis
Once you have collected SQL Profiler trace(s) (whether you used SQLDiag or started a SQL Profiler trace manually), the next challenge is to analyse them. You can of course just open the trace file(s) in SQL Profiler and take a look, but this is not an easy way to spot the worst queries.

Luckily, there is an easy syntax to load trace files into a SQL table, which will allow you to query the events ordered by Duration, Reads, Writes or anything else. This is the syntax:

--Load trace files into a SQL table so it can be queried:
SELECT * INTO temp_trc
FROM ::fn_trace_gettable('c:\MyTrace.trc', default)

SQLDiag will make trace files up to 350MB in size, and then create a new trace file. So you will often end up with files like this in the output folder:
Server__sp_trace.trc - 350MB
Server__sp_trace.trc_1 - 350MB
Server__sp_trace.trc_2 - 150MB

The "default" parameter in the syntax above means that it will automatically continue with the next file in the sequence. So in this example, if you run this from SQL Server Management Studio:

SELECT * INTO temp_trc
FROM ::fn_trace_gettable('c:\Server__sp_trace.trc', default)

then it will automatically read all three files into the temp_trc table.

If you want to limit how much is read into the temp_trc table, then instead of specifying "default", specify a number, which tells it the maximum number of files to read (the count includes the initial file).

This is useful because the command can take a long time to run and can take a lot of database space. So if, for example, you have collected 20 trace files, it can be necessary to read 5-10 of them at a time, as in the sketch below.
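
For example, using the file names from above (a sketch - the second statement assumes that SQLDiag's rollover naming continues the pattern shown earlier):

--Read at most 5 files: the initial file plus up to 4 rollover files
SELECT * INTO temp_trc
FROM ::fn_trace_gettable('c:\Server__sp_trace.trc', 5)

--For the next batch, append to the existing table with INSERT instead
--of SELECT INTO, starting from the first file of that batch:
INSERT INTO temp_trc
SELECT * FROM ::fn_trace_gettable('c:\Server__sp_trace.trc_5', 5)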

The syntax automatically creates a new table in the current database. Once you have this table, you can add indexes to it. Useful columns to index are Reads, Duration and Writes.
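
For example (just a sketch - index whichever columns you plan to sort by):

--Index the columns that the ORDER BY queries below will sort on
CREATE NONCLUSTERED INDEX IX_temp_trc_Reads ON temp_trc (Reads)
CREATE NONCLUSTERED INDEX IX_temp_trc_Duration ON temp_trc (Duration)
CREATE NONCLUSTERED INDEX IX_temp_trc_Writes ON temp_trc (Writes)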

And then you have an easy way to identify the "worst" queries, ordered by any of these, for example:

SELECT Reads, * FROM temp_trc ORDER BY 1 DESC --or:
SELECT Duration, * FROM temp_trc ORDER BY 1 DESC

"ORDER BY 1" just means: order by the first column in the select list - in the examples above, Reads or Duration. This avoids "Ambiguous column name 'Reads'." errors, which you would otherwise get because the column appears twice in the result set (once explicitly, and once through the *).

So in this way you can easily sort your events by anything you need. The top events may look like this:
exec sp_execute 380,1,'20000'

I would advise that you ignore these events, since they only show a handle to a previously prepared statement and don't really tell you anything on their own. Focus on events that you can relate to the activities that you are troubleshooting.
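
If no single events stand out, one more option is to aggregate the trace by statement text. This is just a sketch: the CONVERT is needed because TextData is an ntext column, which cannot be grouped on directly, and note that SQL 2005 reports Duration in microseconds in trace files:

--Total impact per distinct statement text
SELECT CONVERT(nvarchar(4000), TextData) AS QueryText,
       COUNT(*) AS Executions,
       SUM(Reads) AS TotalReads,
       SUM(Duration) AS TotalDuration --microseconds in SQL 2005 traces
FROM temp_trc
GROUP BY CONVERT(nvarchar(4000), TextData)
ORDER BY TotalReads DESC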


Data collection on SQL 2000

As mentioned, SQLDiag is new in SQL 2005. For SQL 2000 you need a tool called PSSDiag. To download it, go to support.microsoft.com and search for "PSSDIAG data collection utility". After installing it, run the file DiagConfig.exe, which lets you decide which events to collect. This part is actually a lot simpler than with SQLDiag, because here you have a graphical interface.

Once you have decided what to collect, click the "Save" button. This creates a file called \Customer\pssd.exe, which you can send to the customer. There are also simple directions in the file \docs\PSSDIAG Instructions.doc that you can send to the customer as well. From here, the steps to start collecting are the same as with SQLDiag.


These postings are provided "AS IS" with no warranties and confer no rights. You assume all risk for your use.

PSSDiagCustom.XML