In my previous blog post, I shared how you can leverage SQL Server Performance Dashboard SSRS Reports for monitoring and point in time troubleshooting for large deployments of SQL Server. In this post, we will talk about why dbas should establish SQL Server Performance baselines for their environment and how you can leverage the SQL Server Performance Baselining reporting solution shared in our Tiger Toolbox Github repository to achieve that. The solution is already being deployed and leveraged by some of our customers and further contributed by SQL community members like SQLAdrian in GitHub.
Why SQL Server Performance Baselining is important?
- Performance Tuning – DBAs, Consultants and Support team often get called in a situation when the business application running on SQL Server is running slow. Before troubleshooting, one of the first question to ask is how is the performance measured and what was the performance previously ? In majority of the situations, there are changes in the workload, application codes which leads to changes in performance but one can prove it only when the previous state of the SQL Server was captured. In such situations, performance baselining can assist you in learning from the historical data and trends to detect the anomalies or changes on the pattern on the server which may have caused performance issues observed.
- Capacity Planning – For DBAs managing large scale deployments and mission critical instances of SQL Server, it is important to proactively keep an eye on resource utilization (CPU, Memory, IO and storage) and workload trend over a period, to forecast and plan for more capacity as the workload trend and resource utilization is changing. For capacity planning, performance baselining reports are the key to forecast and predict capacity required for future by analyzing the historical trends.
- Consolidation Planning – As shadow IT and number of business applications running SQL Server grows in enterprises, companies can save cost by consolidating some of their databases under single instance for efficiently utilizing their hardware and resources. To plan resources and consolidate SQL Server databases, again performance baselining is required.
- Virtualization\Migration to Azure – Most enterprises today are looking to reduce their IT capital and operational expenses and overheads by migrating to cloud. When migrating to cloud, it is important to identify right VM size or performance tier to run your SQL Server databases which is easy when you have performance baselines captured and established.
In general, Performance Baselining is capturing last known good state metrics which can be used as a starting point for comparison. Performance monitoring is the first step for baselining. In a production environment, the goal of performance monitoring should be to be non-invasive with minimal overhead. Baselining always requires all the metrics to be captured across time dimension to perform historical time analysis. Historical analysis of performance data is useful for anomaly detection or to forecast future performance based on current workload.
In SQL Server, at a very high level, we have 3 types of performance monitoring data available to capture the state of SQL Server
- DMVs (dynamic management views)
In SQL Server, one needs to capture at least following details at minimum over time to successfully establish comprehensive performance baseline for SQL Server instance.
- Processor(*)% Processor Time
- Process(sqlservr)% Processor Time
- Processor(*)% Privileged Time
- Process(sqlservr)% Privileged Time
- Available Mbytes
- Memory Pages/sec
- Process(sqlservr)Private Bytes
- Process(sqlservr)Working Set
- SQLServer: Memory Manager Total Server Memory
- SQLServer: Memory Manager Target Server Memory
- PhysicalDisk(*)Avg. Disk sec/Read
- PhysicalDisk(*)Avg. Disk sec/Write
- PhysicalDisk Avg. Disk Queue Length
- Disk Bytes/sec
- Avg Disk Bytes/Transfer
- Process(sqlservr)IO Data Operations/sec
- Network InterfaceBytes Received/sec
- Network InterfaceBytes Sent/sec
- Network Interface(*)Output Queue Length
SQL Server Workload
- SQL Server: SQL Statistics Batch Requests/sec
- SQL Server: SQL Statistics SQL Compilations/sec
- SQL Server: SQL Statistics SQL Recompilations/sec
- Max Workers
- Workers Created
- Idle Workers
- SQL Server: Wait Statistics
- Latch Waits > 15 sec
- Locks > 30 sec
- IO Latch Timeouts
- Long IO Status
When monitoring large deployments of SQL Server instances, if you capture all the above data on a central location or central SQL Server database, the central database becomes a bottleneck or the point of contention. To scale the performance baselining solution for large deployments of SQL Server, we use the same architecture as discussed in SQL Server Performance Dashboard SSRS Reports as shown below. In this architecture, the performance data is captured locally in a database called dba_local with performance monitoring SSRS reports hosted on a central monitoring server. The SSRS reports are parameterized and have dynamic connection string which connects to SQL Server instance specified in the report parameter. The report queries the dba_local database from the specified SQL Server instance to report performance data captured over time.
If you download SQL Performance Baselining Reporting Solution from GitHub, you will find scripts to create a database and monitoring jobs which needs to be ran on each instance of SQL Server you wish to monitor. Following are steps to deploy SQL Performance Baselining Reporting solution,
Data Collection Steps for each SQL Instance to Monitor
- Connect to SQL instance to monitor
- Run CREATEDATABASE.sql
- Run CREATEOBJECTS.sql
- Run CREATECOLLECTIONJOB.sql
- Check SQL Agent JOBs History to see if it runs successfully
- Repeat for each SQL Instance you want to monitor
Setting up & Deploying Reporting
To deploy Performance baselining SSRS reports, you can leverage the same central monitoring SSRS server hosting SQL Server Performance dashboard report. As mentioned in SQL Server Performance dashboard blog post, you can use the same steps to setup and configure SSRS server by deploying reports using SSDT or Report Manager. Alternatively, you can use PowerShell script created by Aaron (b|t) with some modifications mentioned in the previous blog post as well.
Once the reports are deployed, you can use the following reports to establish your performance baseline
The above report is drillthrough report so when you click on the chart for any specific data, it takes you to the daily performance trend report for that day as shown below
In the previous report, you observed that around 8PM, the sessions in suspended state increases which indicates high blocking causing performance slowness or degradation. Around same time, most sessions are waiting on LCK_M_SCH_M and disk write latency is high around the same. And later we found
Ohh, our reindexing job runs at around 8PM and we don’t use WITH ONLINE=ON while reindexing which explains the blocking.
The Memory Utilization report shows you the trend of SQL Server Memory and Available Memory on the server. From the report, you can easily deduce, Available Memory on the server went down at around 2:30PM which caused external memory pressure causing target server memory to fall below total server memory in SQL Server. SQL Server responded to the memory by increasing the lazy writer activity to free up pages which in turn drops the page life expectancy on the server.
Finally, the database storage report gives you a quick glance of the free space on all the databases hosted on the SQL Server.
Hope you find the reports useful and if you are a SSRS developer, I would encourage you to design some cool reports for performance monitoring using perfmon, DMV or Xevents data and contribute back to the community by sending a pull request to our solution in GitHub repository.