In my previous blog post, I had shared how you can leverage SQL Server Performance Dashboard SSRS Reports for monitoring and point in time troubleshooting for large deployments of SQL Server. In this post, we will talk about why dbas should establish SQL Server Performance baselines for their environment and how you can leverage the SQL Server Performance Baselining reporting solution shared in our Tiger Toolbox Github repository to achieve that. The solution is already being leveraged at some of our large enterprise customers and further contributed by SQL community members like SQLAdrian in GitHub.
Why SQL Server Performance Baselining is important?
- Performance is relative – DBAs, Consultants and Support team often get called in a situation when the business application running on SQL Server is running slow. Before troubleshooting, one of the first question to ask is how slow is slow and how fast is fast?. In majority of the situations, there are changes in the workload, application code which lead to changes in performance but one can prove that something is changed only if the previous state of the SQL Server was captured. In such situations, performance baselining can assist you in learning from the historical data and trends to detect the anomalies or changes on the pattern on the server.
- Capacity Planning – As DBAs managing large deployment and mission critical instances of SQL Server, it is important to proactively keep an eye on resource utilization (CPU, Memory, IO and storage) and workload trend over a period, to forecast and plan for more capacity if the workload trend or resource utilization is changing. To plan for capacity, performance baselining reports is the key to perform historical analysis of data and predict capacity required for future.
- Consolidation Planning – As shadow IT and business applications running SQL Server grows in the enterprise, companies can save cost by consolidating some of their database under single instance by efficiently utilizing their hardware and resources. To plan and understand the resource utilization of SQL Server database, again performance baselining is required.
- Virtualization\Migration to Azure – Most enterprises today are looking toward to reduce their IT capital and operational expenses and overheads. When migrating to cloud, it is important to identify their VM size or performance tier to run your SQL Server databases which is easy when you have performance baselines established.
In general, Performance Baselining is capturing last known good state metrics which can be used as a starting point for comparison. Performance monitoring is the first step for baselining. In a production environment, the goal of performance monitoring should be to be non-invasive with minimal overhead. Baselining always requires all the metrics to be captured across time dimension to perform historical time analysis. Historical analysis of performance data is useful for anomaly detection or to forecast future performance based on current workload.
In SQL Server, at a very high level, we have 3 types of performance monitoring data available for us to capture the state of SQL Server
- DMVs (dynamic management views)
In SQL Server, one needs to capture at least following details at minimum over time to successfully establish comprehensive performance baseline for SQL Server instance.
- Processor(*)% Processor Time
- Process(sqlservr)% Processor Time
- Processor(*)% Privileged Time
- Process(sqlservr)% Privileged Time
- Available Mbytes
- Memory Pages/sec
- Process(sqlservr)Private Bytes
- Process(sqlservr)Working Set
- SQLServer: Memory Manager Total Server Memory
- SQLServer: Memory Manager Target Server Memory
- PhysicalDisk(*)Avg. Disk sec/Read
- PhysicalDisk(*)Avg. Disk sec/Write
- PhysicalDisk Avg. Disk Queue Length
- Disk Bytes/sec
- Avg Disk Bytes/Transfer
- Process(sqlservr)IO Data Operations/sec
- Network InterfaceBytes Received/sec
- Network InterfaceBytes Sent/sec
- Network Interface(*)Output Queue Length
SQL Server Workload
- SQL Server: SQL Statistics Batch Requests/sec
- SQL Server: SQL Statistics SQL Compilations/sec
- SQL Server: SQL Statistics SQL Recompilations/sec
- Max Workers
- Workers Created
- Idle Workers
- SQL Server: Wait Statistics
- Latch Waits > 15 sec
- Locks > 30 sec
- IO Latch Timeouts
- Long IO Status
- Deadlocked Scheduler
- Non-Yielding Scheduler
- Access Violations
- Corrupt Pages Status
- Connectivity errors
- Security errors
When monitoring large deployments of SQL Server instances, if you capture all the above data on a central location or SQL Server database, the central database becomes a bottleneck or the point of contention. To scale the performance baselining solution to large deployment of SQL Server, we use the same architecture as discussed in SQL Server Performance Dashboard SSRS Reports as shown below.
If you download SQL Performance Baselining Reporting Solution from GitHub, you will find scripts to create a database and monitoring jobs which needs to be ran on each instance of SQL Server you wish to monitor. Following are steps to deploy SQL Performance Baselining Reporting solution,
Data Collection Steps for each SQL Instance to Monitor
- Connect to SQL instance to monitor
- Run CREATEDATABASE.sql
- Run CREATEOBJECTS.sql
- Run CREATECOLLECTIONJOB.sql
- Check SQL Agent JOBs History to see if it runs successfully
- Repeat for each SQL Instance you want to monitor
Setting up & Deploying Reporting
To deploy Performance baselining SSRS reports, you can leverage the same central monitoring SSRS server hosting SQL Server Performance dashboard report. As mentioned in SQL Server Performance dashboard blog post, you can use the same steps to setup and configure SSRS server by deploying reports using SSDT or Report Manager. Alternatively, you can use PowerShell script
created by Aaron with some modifications mentioned in the previous blog post.
Once the reports are deployed, you can use the following reports to establish your performance baseline
The above report is drillthrough report so when you click on the chart for any specific data, it takes you to the daily performance trend report for that day as shown below
In the previous report, you observed that around 8PM, the sessions in suspended state increases which indicated high blocking. Around same time, most sessions are waiting on LCK_M_SCH_M and disk write latency is high around the same. Ohh, our reindexing job runs at around 8PM and we don't use WITH ONLINE=ON while reindexing which explains the blocking.
The Memory Utilization report shows you the trend of SQL Server Memory and Available Memory on the server. From the report, you can easily deduce, Available Memory on the server went down at around 2:30PM which caused external memory pressure causing target server memory to fall below total server memory. SQL Server responded to the memory by increasing the lazy writer activity to free up pages which in turn drops the page life expectancy on the server.
Finally, the database storage report gives you a quick glance of the free space in the databases hosted on the SQL Server.
Hope you find the reports useful and if you are a SSRS developer, I would encourage you to design some cool reports for performance monitoring using perfmon, DMV or Xevents data and add it to our Github repository.