How to use SQLDiag, SQLNexus and PAL tools to analyze performance issues in SQL Server

SQLDiag, SQLNexus and PAL (Performance Analysis of Logs) tools can collectively be used to analyze performance problems in SQL Server. SQLDiag is a data collection utility, while SQLNexus and PAL tools can be used to analyze data collected by the SQLDiag utility. Ever used the PSSDiag utility that was (and is still) used by Microsoft support to assist customers in troubleshooting SQL Server performance issues? SQLDiag is the same tool as PSSDiag and is available out of the box in SQL Server 2005 and all later versions. PSSDiag utility is still available as a separate download on the Microsoft website.  Both SQLDiag and PSSDiag tools are similar in all aspects, except for the user interface that comes with the PSSDiag utility.

The SQLNexus tool can be used to analyze SQL server performance data such as blocking, wait stats and SQL traces. The PAL tool can be used to analyze the Perfmon counters collected in a .BLG file . Both of these tools are open source tools and are publicly available for download on the www.codeplex.com website. Please see the SQL Server Downloads section on the Blog Home for the download location of these tools and their pre-requisites components.

What sort of diagnostic data can I collect by using the SQLDiag tool?

The SQLDiag utility is capable of collecting the following sets of diagnostic information -

  • Windows Event Logs (Application, System, and Security) in .CSV files
  • System Monitor (aka Perfmon) performance counters in .BLG file/s
  • SQL Server Profiler traces in .TRC file/s
  • SQL Server Blocking chains in a _BLK.TRC file
  • SQL Server error logs, configuration, point-in-time snapshots of several DMVs in a .OUT file
  • System information via the Msinfo32 utility in a .TXT file

Additionally, SQLDiag can be configured to collect custom data such as the output of a .CMD batch file or a .EXE utility.

What sort of performance issues can I troubleshoot with the SQLDiag, SQLNexus and the PAL tools?

SQLDiag, SQLNexus and PAL tools can be used to troubleshoot a variety of performance issues in SQL Server. I have used these tools on many occasions to assist my customers while troubleshooting performance problems, mostly in the following scenarios -

  • Intermittent CPU spikes on the SQL Server box
  • Performance degradation during specific hours (say between 2:00 PM and 4:00 PM everyday)
  • Troubleshooting Blocking chains in SQL Server
  • Analyzing overall performance bottlenecks in SQL Server using wait statistics
  • Identifying top queries by CPU, Duration, Reads and Writes during a specific timeframe (say between 9:00 AM and 5:00 PM)
  • Identifying specific databases and logins (sessions) that are consuming the maximum CPU and IO on a SQL sever instance during a specific timeframe (say between 9:00 AM and 5:00 PM)
  • Troubleshooting performance issues with a batch job that runs every night during a specific timeframe (say between 2:00 AM and 4:00 AM)
  • Comparing SQL server performance for various tests in a lab testing scenario – say you are seeing performance and scalability issues with a 150 user test, while the 50 user test runs fine.

Of course, this is not an exhaustive list  that describes the application of these tools and you will find them useful in troubleshooting other issues in SQL Server as well. In my future posts, I will attempt to describe how I used these tools to troubleshoot some of the above scenarios.

How are these tools different from the inbuilt features such as Standard Reports and Activity Monitor in Management Studio ?

That’s a great question! Why even bother to learn these additional tools when we have introduced so many great features in SQL Server 2005 such as Standard Reports, and Activity Monitor in Management Studio reports and Dynamic Management Views (DMVs). The answer is – it depends! Each of these tools have their own strengths and can be useful in a specific scenario depending on the problem you are trying to troubleshoot. SQLDiag, SQLNexus and PAL are very helpful when troubleshooting the point-in-time performance issues – such as issues occurring during a specific timeframe,  say between 2:00 PM and 4:00 PM. Standard Reports and Activity Monitor in Management Studio are based on DMVs (Dynamic Management Views), which stores cumulative statistics since the last start of SQL Server. So if your SQL Server instance has been running since last two months and say you have been experiencing performance issues between 2:00 PM and 4:00 PM since only a last few days, then you may not get a lot of useful information from the Standard reports or Activity Monitor. Standard reports will show you the cumulative information since the last restart of SQL Server instance and not for the specific timeframe that your are interested in - such as between 2:00 PM and 4:00 PM. Similarly, Activity Monitor will display the live performance statistics from a SQL server instance and not for a specific time frame in the past. But if you want to be able to collect the diagnostic data for a specific time frame so that you can analyze it later, or so that you can share it with someone else for later troubleshooting, then SQLDiag/SQLNexus/PAL are your friends.

How are these tools different from the new Management Data Warehouse (MDW) feature introduced in SQL Server 2008?

The Management Data Warehouse (MDW) is a new feature that was introduced in SQL Server 2008. MDW implements the best of SQLDiag/SQLNexus and “generally” replaces the need for each of them. It’s a framework that allows for collecting performance data from one or more SQL Server instances on an ongoing basis and storing that data permanently in a SQL Server database. MDW provides a set of inbuilt reports that enables you to go back in time, select a particular time frame and analyze performance data only for the selected timeframe. It provides similar reports as the SQLNexus tool. Customers must have enabled this feature on their SQL Server instances to be able to use it. Many customers are not fully aware of this great feature and many have not implemented it in their environment yet, so I still rely on the SQLDiag, SQLNexus and the PAL tools to assist my customers when troubleshooting performance issue. In addition, when there is a need to do in-depth analysis, I like several specific aspects of the SQLNexus reports that are not available in the MDW reports . For example, the SQLNexus reports can tell me which specific databases and logins are responsible for most CPU usage on my server. I like the Interesting Events reports available in SQLNexus while we don’t have a similar report in MDW. Another great feature I like in the SQLNexus reports is the distribution of query statistics such as # of executions, Duration, I/O, and CPU across smaller intervals. Consider a scenario where you are troubleshooting CPU spikes that occur everyday on your server between 2:00 PM and 4:00 PM. Say you collected the SQLDiag data between 2:00 PM and 4:00 PM and analyzed it with SQLNexus. Assuming that a particular SQL query was responsible for these CPU spikes, you can drill down into the SQLNexus reports and find out which particular query execution relates to the CPU spikes on the server and if it is the number of executions of that query that is responsible for the CPU spikes.

Don’t get me wrong, I am not trying to devalue the significance of the MDW reports here when comparing them with the SQLNexus reports. There is so much information in the MDW reports which is absent in the SQLNexus reports and for all practical purposes, MDW is a great feature to analyze generic performance issues in SQL Server. I like to consider the SQLNexus reports as a complement to the MDW reports if I need to dive deeper into the performance issues. I would suggest that you explore the reports presented by SQLNexus and MDW and see for yourself which one of them suits your taste! I love both of them :)

So now that we have talked about the significance of these tools, lets talk about how to use them.

How to use SQLDiag?

On a SQL Server 2005 server, the SQLDiag utility is present under the following folder as SQLDiag.exe file – [systemroot]\Program Files\Microsoft SQL Server\90\Tools\Binn. On a SQL Server 2008 box, SQLDiag is present under [systemroot]\Program Files\Microsoft SQL Server\100\Tools\Binn. SQLDiag uses an XML configuration file which can be used to specify the data that you want to collect. By default, SQLDiag uses the SQLDiag.xml configuration file that gets automatically created in the \Binn directory when you launch SQLDiag for the very first time. This file is configured to collect only the following information -

  • SQL Server error logs and configuration information in a file named [servername]_SQL200[x]_sp_sqldiag_Shutdown.out
  • Default trace files named Log_[x].trc (Note that these are not regular profiler traces that you want to collect for performance troubleshooting)
  • SQLDumper logs (if any)
  • System information in a file named [servername]_MSINFO32.txt

You can create you own XML configuration file and configure it to specify the data that you want to collect. You can make a copy of the SQLDiag.xml file and modify it to collect Windows event logs, Perfmon counters, Profiler traces, SQL Blocking information etc. For example, to collect the event logs, enable the EventlogCollector by modifying the tag <EventlogCollector enabled="false" startup="false" shutdown="true" > as <EventlogCollector enabled="true" startup="false" shutdown="true" > . Similarly, you can enable the <PerfmonCollector> to collect specific performance counters and the <ProfilerCollector> to collect profiler traces. Please take a note of few important things when creating your own configuration file -

  • Do not modify the default SQLDiag.xml file, because this file gets overwritten every time you run the SQLDiag utility. Instead, copy this file and give it a different name, something like MyCustom_SQLDiag_config.xml
  • If you have multiple instances of SQL server running on the same box,  SQLDiag (by default) will collect data from all running SQL Server instances of the same version. So if you launch SQLDiag from the \90\Tools\Binn directory, it will capture data from all running SQL Server 2005 instances on that server. However, you can modify your custom configuration file to specify the name of a specific SQL Server instance for data collection. All you need to do is to replace the tag <Instance name="*" with <Instance name="[InstanceName]" in your custom configuration file. If you want to collect data from only the default instance of SQL Server, modify this tag as <Instance name="MSSQLSERVER".
  • If you are running the utility on the active node of a cluster, replace the <Machine name="."> tag with <Machine name="[SQLServerVirtualServerName]">
  • The <EventlogCollector> tag allows data collection at two intervals – at startup when you start the SQLDiag data collection or at shutdown when you stop the data collection. You may chose to enable either or both of them, but for most scenarios you may only want to enable the data collection at shutdown (which is the default)
  • If you do not wish to collect default traces or SQLDumper logs or system information using Msinfo32 utility, you can disable the appropriate data collectors under the <CustomDiagnostics> tag. Additionally, you can add your own custom data collector under this section.
  • The XML configuration file is case-sensitive, so be careful while enabling or disabling specific data collectors. Make sure that when you are modifying the “true” or “false” values, they are in lower case.
  • And, don’t forget to save your configuration file after making changes to it!

Once your custom configuration file is ready, you can launch the SQLDiag tool from the command prompt and use the /I parameter to specify the name of your configuration file, something like this -

Capture

By default, SQLDiag creates a folder with the name SQLDIAG under the \Tools\binn directory and dumps all the output data files into this folder. However, you can change this default location by using the /O parameter

How to collect SQLDiag data needed to generate the SQLNexus reports

As you may have noticed, some initial efforts are required, such as configuring your own custom configuration file, to capture meaningful performance data with the SQLDiag utility.  If your goal of capturing the SQLDiag data is only to generate the SQLNexus reports, there is an easier way to do it. Follow these steps to collect the SQLDiag data, which is required to generate the SQLNexus reports -

  • Click here to download the appropriate scripts files to run SQLDiag. There are two .zip files available for download on this page, PerfStatsSCript2005.zip for SQL Server 2005 and PerfStatsScript2008.zip for SQL Server 2008.
  • Extract the contents of the ZIP file on a drive with plenty of disk space available on the concerned SQL Server box. I would suggest that the drive where you plan to collect the diagnostic data should have at least a couple of GBs of disk space available, since the SQL trace files may grow pretty huge depending on the load on your server. After you have started the data collection, you should monitor the size of the trace files to see how fast they are growing and if you have sufficient disk space available on the drive or not.
  • Examine the extracted files. You will notice that there are three XML configuration files and the corresponding CMD batch files. For initial troubleshooting purposes, you should use the SQLDiagPerfStats_Trace200[x].xml file and the corresponding StartSQLDiagTrace2008.cmd file. As the names suggest, the other two configuration files are meant for trace replay purposes (if you want to simulate production workload in your test system) and if you are interested in capturing detailed traces (statement level events). Notice that capturing detailed traces can introduce significant overheads on the server.
  • The XML configuration files are configured to collect the data from all running instances of SQL Server on the box. If you have multiple SQL Server instances running on the same computer and want to collect data from a specific instance, open the file SQLDiagPerfStats_Trace200[x].xml in Notepad and replace “*" in the tag <Instance name="*" with the actual name of the named instance (portion after the backslash) . If it’s a default instance of SQL server, replace “*” with the word “MSSQLSERVER”. Click on File -> Save to save the file.
  • When you are ready to start the SQLDiag data collection, open command prompt (use Run as Administrator options if on Windows Server 2008) and navigate to the folder where you had extracted the perf stats script files. Run the file named StartSQLDiagTrace200[x].cmd. This will start the SQLDiag data collection processes and you should see a prompt saying ‘Data collection has started, press CTRL + C to stop’
  • The SQLDiag tool will create a folder with the name SQLDiagOutput inside the same folder where you are running this script from. All output data files are captured inside this folder
  • Allow the tool to run for the duration of the performance problems, while SQLDiag captures the diagnostic data. When you are ready to stop the SQLDiag data collection, press CTRL + C.  Do not close the command prompt window while SQLDiag is in the process of shutting down. Doing so can leave orphaned processes on your server. Close the command window only after it returns back the the prompt.
  • Examine the files inside the SQLDiagOutput folder. You should see one (or more) Perfmon counter files with extension .BLG, one (or more ) trace files with extension .TRC, three .CSV files one for each Windows event logs and a bunch of files with extension .OUT that contains SQL server error logs, configuration, blocking and wait stats data.

Now that you have captured the SQLDiag data, you are ready to analyze it using the SQLNexus tool.

How to use the SQLNexus tool at analyze SQLDiag data?

The SQLNexus tool requires a SQL Server 2005 or 2008 databases for its analysis. Install a local SQL Server 2005/2008 instance on the machine where you plan to do the SQLNexus analysis, if you don’t have one already. A remote SQL server instance will work okay as well, but I would recommend using a local instance if possible. I would not recommend using the SQLNexus tool on a production server or using the production SQL Server instance for analysis, even if it is possible in your environment. Also note that prior to using SQLNexus, you would need to install the pre-requisite components as discussed in the steps below.

Follow these steps to use the SQLNexus tool for analyzing SQLDiag data -

  • Download and install the Report Viewer 2008 Redistributable. This is a pre-requisite for SQLNexus
  • Download and install the 32 bit version of RML utilities and here to install 64 bit version of RML utilities. This is a pre-requisite for SQLNexus
  • Download the SQLNexus.zip file from here
  • Extract the contents of the zipped file in a folder. SQLNexus doesn’t require any installation as such. All it needs to run are the files inside the extracted folder. Since SQLNexus does not get installed like a regular Windows application, you won’t see a shortcut for it under Start –> Programs or on the Desktop. I would suggest creating a shortcut for the SQLNexus.exe in the shortcut pane or on the desktop
  • Launch the SQLNexus.exe from the location where you extracted the contents. You will see the following screen that prompts for the SQL server name and the instance name. Type in the instance name that you plan to use for the SQLNexus analysis.

image

  • Once you click on Connect on the above screen, SQLNexus will create a database by the name SQLNexus on the instance that you have specified. If you want to use SQLNexus again to analyze a different set of SQLDiag data, either you would need to delete this database manually, or create a new database from within SQLNexus, which can be done by using the Current DB box on the toolbar and selecting the option <New Database>.
  • Click on the Import option under Data menu and browse to the SQLDiagOutput folder that contains the SQLDiag data. Click on the Import button.

Capture

  • After SQLNexus has completed importing data into the SQL database, you can analyze various reports by using various links under the Reports menu on the top left hand side. These reports are drill down reports and can be used to dive deeper into a particular area. For example to view the top queries by CPU, Duration, Reads and Writes, click on the ReadTrace_Main Report on the left hand side and then click on Unique_Batch_TopN report link. You can also export these reports to Excel, PDF or HTML format using the Export link under Tasks menu on the left hand side.

Note: On several occasions, I had noticed that SQLNexus would fail to analyze the trace files. I have been able to resolve that issue by moving the _Blk.trc file to separate location or deleting it altogether. This trace file captures the blocking chains from the data collection period. SQLNexus uses a different set of scripts based on DMVs for capturing blocking and wait stats information and so this file is not required by the SQLNexus tool.

How to use the PAL tool to analyze performance counters data (.BLG files) collected by SQLDiag?

The PAL (Performance Analysis of Logs) tool is a powerful tool that reads in a performance monitor counter log and analyzes it using known thresholds. Follow these steps to use the PAL tool to analyze performance counters collected by the SQLDiag tool in one or more .BLG files -

  • If you are using older Windows versions such as Windows XP/Vista or Windows Server 2003, download and install the following pre-requisites - Microsoft .NET Framework 3.5 Service Pack 1 and PowerShell v2.0 or greater
  • Download and install the Microsoft Chart Controls for Microsoft .NET Framework 3.5 from here, which is another pre-requisite for the PAL tool
  • Download and install the PAL tool from here
  • Launch PAL by clicking on Start –> Programs –> PAL. Click Next on the Welcome screen
  • On the Counter Log page, browse and select the SQLDiag.blg file in the SQLDiagOutput folder, and click Next

image

  • On the Threshold file page, select the option “Microsoft SQL Server 2005/2008” from the drop down box and click on Next

image

  • Answer all questions on the Questions page and click on Next
  • Select appropriate options on the Output Options, File Output, Queue and Execute pages. For most cases, you can leave these options at their default settings.
  • After you click on Finish, a command prompt window will open automatically and close once the PAL the analysis is completed.

PAL analysis may take some time depending on the amount of Perfmon data that you have collected, so please be patient. Once the analysis is complete, PAL reports will open up automatically in a browser window.  By default, these reports are stored under My Documents\PAL Reports.

Here are some additional reference links if you are interested in learning more about the PAL tool -

I hope you would enjoy working with these tools and the wonderful reports that they present! Please feel free to leave me any comments or feedback you may have and I will get back to you as soon as possible

Cheers!