SQL Server Performance Baselining Reports Unleashed for Enterprise Monitoring


In my previous blog post, I shared how you can leverage the SQL Server Performance Dashboard SSRS reports for monitoring and point-in-time troubleshooting across large deployments of SQL Server. In this post, we will talk about why DBAs should establish SQL Server performance baselines for their environment, and how you can leverage the SQL Server Performance Baselining reporting solution shared in our Tiger Toolbox GitHub repository to achieve that. The solution is already being used by some of our large enterprise customers and has received further contributions from SQL community members such as SQLAdrian on GitHub.

Why is SQL Server performance baselining important?

  • Performance is relative – DBAs, consultants, and support teams often get called in when a business application running on SQL Server is slow. Before troubleshooting, one of the first questions to ask is: how slow is slow, and how fast is fast? In the majority of situations, changes in the workload or application code lead to changes in performance, but you can prove that something has changed only if the previous state of the SQL Server was captured. In such situations, performance baselining lets you learn from historical data and trends to detect anomalies or pattern changes on the server.
  • Capacity Planning – For DBAs managing large deployments and mission-critical instances of SQL Server, it is important to proactively keep an eye on resource utilization (CPU, memory, IO, and storage) and workload trends over time, to forecast and plan for more capacity if the workload or resource utilization is changing. Performance baselining reports are the key to performing this historical analysis and predicting the capacity required in the future.
  • Consolidation Planning – As shadow IT and the number of business applications running SQL Server grow in the enterprise, companies can save costs by consolidating some of their databases under a single instance to use their hardware and resources efficiently. Planning such consolidation and understanding the resource utilization of each database again requires performance baselining.
  • Virtualization/Migration to Azure – Most enterprises today are looking to reduce their IT capital and operational expenses and overheads. When migrating to the cloud, it is important to identify the VM size or performance tier needed to run your SQL Server databases, which is easy when you have performance baselines established.

In general, performance baselining is capturing last-known-good state metrics that can be used as a starting point for comparison. Performance monitoring is the first step toward baselining. In a production environment, the goal of performance monitoring should be to be non-invasive, with minimal overhead. Baselining requires all metrics to be captured across a time dimension to allow historical analysis. Historical analysis of performance data is useful for anomaly detection, or to forecast future performance based on the current workload.

In SQL Server, at a very high level, we have three types of performance monitoring data available to capture the state of the instance:

  • Perfmon
  • DMVs (dynamic management views)
  • Xevents

To establish a comprehensive performance baseline for a SQL Server instance, you need to capture at least the following details over time.

Processor Information

  • Processor(*)\% Processor Time
  • Process(sqlservr)\% Processor Time
  • Processor(*)\% Privileged Time
  • Process(sqlservr)\% Privileged Time

Memory

  • Memory\Available MBytes
  • Memory\Pages/sec
  • Process(sqlservr)\Private Bytes
  • Process(sqlservr)\Working Set
  • SQLServer:Memory Manager\Total Server Memory (KB)
  • SQLServer:Memory Manager\Target Server Memory (KB)

Physical Disk

  • PhysicalDisk(*)\Avg. Disk sec/Read
  • PhysicalDisk(*)\Avg. Disk sec/Write
  • PhysicalDisk(*)\Avg. Disk Queue Length
  • PhysicalDisk(*)\Disk Bytes/sec
  • PhysicalDisk(*)\Avg. Disk Bytes/Transfer
  • Process(sqlservr)\IO Data Operations/sec

Network

  • Network Interface(*)\Bytes Received/sec
  • Network Interface(*)\Bytes Sent/sec
  • Network Interface(*)\Output Queue Length

SQL Server Workload

  • SQLServer:SQL Statistics\Batch Requests/sec
  • SQLServer:SQL Statistics\SQL Compilations/sec
  • SQLServer:SQL Statistics\SQL Re-Compilations/sec
  • sys.dm_exec_requests
  • sys.dm_exec_sessions
  • sys.dm_exec_connections
  • Max Workers
  • Workers Created
  • Idle Workers
  • Deadlocks

Waits

  • SQLServer:Wait Statistics
  • sys.dm_os_waiting_tasks
  • sys.dm_os_wait_stats
  • Latch Waits > 15 sec
  • Locks > 30 sec
  • IO Latch Timeouts
  • Long IO Status
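
The cumulative wait DMVs above can be snapshotted on a schedule and differenced to get per-interval wait activity. Below is a minimal sketch against sys.dm_os_wait_stats; the list of benign wait types filtered out is illustrative, not exhaustive:

```sql
-- Waits are cumulative since instance start; capture snapshots over time
-- and diff consecutive snapshots to get per-interval wait activity.
SELECT GETDATE() AS sample_time,
       wait_type,
       waiting_tasks_count,
       wait_time_ms,
       signal_wait_time_ms
FROM sys.dm_os_wait_stats
WHERE wait_time_ms > 0
  AND wait_type NOT IN ('SLEEP_TASK', 'LAZYWRITER_SLEEP',      -- benign idle waits
                        'BROKER_TASK_STOP', 'XE_TIMER_EVENT')  -- (illustrative list)
ORDER BY wait_time_ms DESC;
```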

Database Indexes

  • sys.dm_db_index_physical_stats
  • sys.dm_db_index_operational_stats
  • sys.dm_db_index_usage_stats

Fatal Issues

  • Deadlocked Scheduler
  • Non-Yielding Scheduler
  • Access Violations
  • OOM
  • Corrupt Pages Status
  • Connectivity errors
  • Security errors
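
Many of the SQLServer: counters listed above can also be sampled from inside SQL Server itself, which is handy when a collection job cannot run perfmon on the host. A sketch using sys.dm_os_performance_counters (counter names as exposed by the DMV):

```sql
-- The SQLServer:* perfmon objects are exposed through a DMV, so a collection
-- job can sample them with plain T-SQL. Note that /sec counters are cumulative
-- and must be sampled twice and differenced to compute a rate.
SELECT [object_name], counter_name, cntr_value
FROM sys.dm_os_performance_counters
WHERE [object_name] LIKE '%SQL Statistics%'
  AND RTRIM(counter_name) IN ('Batch Requests/sec',
                              'SQL Compilations/sec',
                              'SQL Re-Compilations/sec');
```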

When monitoring large deployments of SQL Server instances, if you capture all the above data in a central location or SQL Server database, that central database becomes a bottleneck and a point of contention. To scale the performance baselining solution to large deployments of SQL Server, we use the same architecture as discussed in SQL Server Performance Dashboard SSRS Reports, shown below.

If you download the SQL Performance Baselining Reporting Solution from GitHub, you will find scripts to create a database and monitoring jobs, which need to be run on each instance of SQL Server you wish to monitor. The following are the steps to deploy the SQL Performance Baselining Reporting solution.

Data Collection Steps for each SQL Instance to Monitor

  1. Connect to SQL instance to monitor
  2. Run CREATEDATABASE.sql
  3. Run CREATEOBJECTS.sql
  4. Run CREATECOLLECTIONJOB.sql
  5. Check SQL Agent JOBs History to see if it runs successfully
  6. Repeat for each SQL Instance you want to monitor

Setting up & Deploying Reporting

To deploy the Performance Baselining SSRS reports, you can leverage the same central monitoring SSRS server that hosts the SQL Server Performance Dashboard reports. As mentioned in the SQL Server Performance Dashboard blog post, you can use the same steps to set up and configure the SSRS server by deploying reports using SSDT or Report Manager. Alternatively, you can use the PowerShell script created by Aaron, with some modifications mentioned in the previous blog post.

Once the reports are deployed, you can use the following reports to establish your performance baseline.

The above report is a drill-through report, so when you click on the chart for any specific date, it takes you to the daily performance trend report for that day, as shown below.

In the previous report, you can observe that around 8 PM the number of sessions in a suspended state increases, which indicates heavy blocking. Around the same time, most sessions are waiting on LCK_M_SCH_M and disk write latency is high. Ah, our reindexing job runs at around 8 PM, and we don't use WITH (ONLINE = ON) while reindexing, which explains the blocking.
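For reference, online index rebuilds (an Enterprise Edition feature) hold the schema-modification lock only briefly at the start and end of the operation, instead of for the whole rebuild. A sketch against a hypothetical table:

```sql
-- Hypothetical example: dbo.Orders is a placeholder table name.
-- Rebuilding online avoids holding the Sch-M lock for the duration of the
-- rebuild, which was the source of the LCK_M_SCH_M waits above.
ALTER INDEX ALL ON dbo.Orders
REBUILD WITH (ONLINE = ON, SORT_IN_TEMPDB = ON);
```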

The Memory Utilization report shows you the trend of SQL Server memory and available memory on the server. From the report you can easily deduce that available memory on the server went down at around 2:30 PM, causing external memory pressure and making target server memory fall below total server memory. SQL Server responded to the memory pressure by increasing lazy writer activity to free up pages, which in turn dropped the page life expectancy on the server.
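The memory indicators discussed here (total vs. target server memory, lazy writes, page life expectancy) can also be checked ad hoc with a query like the following sketch, using the counter names as exposed by sys.dm_os_performance_counters:

```sql
SELECT [object_name], counter_name, cntr_value
FROM sys.dm_os_performance_counters
WHERE ([object_name] LIKE '%Memory Manager%'
       AND RTRIM(counter_name) IN ('Total Server Memory (KB)',
                                   'Target Server Memory (KB)'))
   OR ([object_name] LIKE '%Buffer Manager%'
       AND RTRIM(counter_name) IN ('Page life expectancy',
                                   'Lazy writes/sec'));
-- When target falls below total under external pressure, Lazy writes/sec
-- rises and Page life expectancy drops, matching the pattern in the report.
```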

Finally, the Database Storage report gives you a quick glance at the free space in the databases hosted on the SQL Server instance.
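The allocated-versus-used space behind such a report can be computed per database with FILEPROPERTY; a minimal sketch, run in the context of each database:

```sql
-- Space allocated vs. used for the current database's files
-- (size and SpaceUsed are in 8 KB pages; dividing by 128.0 converts to MB).
SELECT name AS logical_file_name,
       size / 128.0 AS size_mb,
       size / 128.0 - CAST(FILEPROPERTY(name, 'SpaceUsed') AS int) / 128.0 AS free_mb
FROM sys.database_files;
```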

I hope you find the reports useful. If you are an SSRS developer, I would encourage you to design some cool reports for performance monitoring using perfmon, DMV, or XEvents data and add them to our GitHub repository.

Parikshit Savjani
Senior PM, SQL Server Tiger Team
Twitter | LinkedIn
Follow us on Twitter: @mssqltiger | Team Blog: Aka.ms/sqlserverteam

Comments (67)

  1. Jay says:

    Thank you for sharing.

  2. Awesome solution, Thank you very much.

    I am working to integrate your solution with PowerBI and will post my solution.

    BTW, if you can add the steps below to enable xp_cmdshell, it will resolve the job failure issue for users like me who got this error.

    EXEC sp_configure 'show advanced options', 1;
    RECONFIGURE;
    EXEC sp_configure 'xp_cmdshell', 1;
    RECONFIGURE;

  3. Ken says:

    Well this is deliciously fantastic, thanks for sharing!

  4. Rafael says:

    Thanks a lot for sharing.
    The first time I saw these reports was in youtube.

  5. Rogelio Dominguez says:

    AMAZING!!

  6. Awesome, thank you. SQL Server needs a built-in solution like this.

    1. Marcus Klemp says:

      Ever try Data Collectors? Surprised they did not mention them.

  7. Pete says:

    Thank you. It's a good tool and framework.

    I noticed it only works out of the box for default instances. Had some issues getting the dashboard to work on named instances. Any guidance on how we can use this on named instances without much tweaking would be cool.

    1. Pete, it should work for named instances as well by specifying the [servername\instancename] or using servername, portno.

      1. londondba says:

        Yes, there is an issue for named instances and the spGetPerfCountersFromPowerShell stored procedure. See this line in that stored proc:
        REPLACE(REPLACE(REPLACE(ct.[output],'\\'+@@SERVERNAME+'\',''),' :',''),'sqlserver:','') [CounterName]
        It only strips out the sql server name from the leading part of the counter name if the sql server is a default instance. That sql code doesn’t cater for named instance where the leading part is of the form \\myserver\mssql$myinstance:. So you end up with entries like this in the PerformanceCounter table:
        \\myserver\mssql$myinstance:buffer manager\buffer cache hit ratio

        I’ve replaced that one line with this to overcome the issue:

        REPLACE(
        CASE WHEN LOWER(ct.[output]) LIKE '%\mssql$%' OR LOWER(ct.[output]) LIKE '%sqlserver:%'
        THEN SUBSTRING(ct.[output],CHARINDEX(':',ct.[output])+1,500)
        ELSE SUBSTRING(ct.[output],LEN(@@SERVERNAME)-CHARINDEX('\',REVERSE(@@SERVERNAME))+4,500)
        END,
        ' :',''
        ) [CounterName]

        Hoping someone can fix this in the github code repository.

        1. Pete says:

          Nice one. Your solution is more elegant than mine.

      2. londondba says:

        Thanks very much for making this solution available to the SQL Server community. While testing it I like what I see. A couple of enhancements to the SQL Performance Baselining Reports would be good:
        1) I’m UK based so my computer regional settings show dates in dd/MM/YYYY format. I notice that almost all of your PerfCtr data sets present the datesampled on the charts using CONVERT(varchar(20),DateSampled,101) as ‘Date’ ie US format MM/dd/YYYY. That’s no real problem just presenting them a chart. But the Daily Performance Trend has an OnAction for the Available Memory Trend chart to go to the SQL Memory Utilization Report. The Date in US format is passed to this report which immediately errors with an incorrect date format error. Similarly the OnAction to go to the User Sessions Details report from the Daily Performance Workload Trend report.
        Also, the language for all your reports is hardcoded to be en-US, so with these issues I've resorted to setting my workstation region to US.
        Would be really good to eliminate these date format issues from these SSRS reports.
        2) the Database Storage report doesn’t take into account mount points. One of its data sets uses xp_fixeddrives which completely ignored mount points. Would be really good to address this issue. The report would be very useful then.

        Thanks again for the nice addition to the toolbox !

  8. Arpit_Mehra says:

    Hello,

    The report is empty. Am I missing something? Please advise.

    1. Arpit, if the SQL Agent jobs are not running and populating data, you will see empty reports. You should check whether data collection is set up correctly.

  9. Lex says:

    Hi, the job DBA – PerfMon Counter Collection failed with the following error –> Arithmetic overflow error converting expression to data type money. Can u help me to fix it?. THX!

    1. Lex says:

      I found the error; I had to replace the comma (,) with a point (.)

  10. Lex says:

    Hi again!, when i click on the chart for any specific data, throw the following error –>
    The value provided for the ‘Date’ parameter is not valid for this type. (rsReportParameterTypeMismatch) . Can u help me with this? THX!

    1. Lex says:

      I found the solution: I changed the parameter type from Date to Text.

  11. Lex says:

    Hi, I found that sometimes the perfmon counter values contain "E-05", meaning the number is in exponential notation (×10^-5), so you have to divide it by 100000 to obtain the real value for the counter.

    1. Dirk Hondong says:

      Hi Lex,

      did you already implement a Workaround for this issue? Otherwise, if you’re on GitHub you may participate here:
      https://github.com/Microsoft/tigertoolbox/issues/41

  12. George says:

    Hello. thanx a lot for sharing.
    All the jobs run successfully, but the "Processor Utilization Trend" inside the "SQL Performance Dashboard" report shows "Overflow error". How can I fix this?

    1. Dirk Hondong says:

      Hi George,
      please check whether you store your perfmon counter values with a comma or a point as the decimal separator. I think you are running into a rounding error in one of the reports.
      I ran into a similar problem in the past and made sure that I store perfmon counter values with a point instead of a comma.

      1. George says:

        Hi Dirk. Yes, that was the case. Thanks a lot!

  13. AndySugs says:

    When I run CREATEOBJECTS.sql I am getting following error:

    Msg 119, Level 15, State 1, Line 517
    Must pass parameter number 2 and subsequent parameters as ‘@name = value’. After the form ‘@name = value’ has been used, all subsequent parameters must be passed in the form ‘@name = value’.

    However, dbo.sp_executesql only has one parameter (@statement), so I'm perplexed as to why I'm getting this error.

    1. Dirk Hondong says:

      Hi Andy,
      I ran into the same Problem.
      I have already opened an issue on GitHub and provided a solution as well:
      https://github.com/Microsoft/tigertoolbox/issues/39

    2. AndySugs says:

      CREATEOBJECTS.sql is also missing an ‘end’ in the ELSE part of the IF ELSE statement for the spGetPerfCountersFromPowerShell procedure. It should go right before the other ‘END’ at the very bottom of the script.

      1. AndySugs says:

        Dirk has kindly updated CREATEOBJECTS.sql so that pass parameter number error and missing ‘end’ issues resolved. Check ‘Error in CREATEOBJECTS.sql’ in Issues tab on Github:
        https://github.com/Microsoft/tigertoolbox/issues/39

        1. Dirk Hondong says:

          My pull request was merged today.
          So you can feel free to download the original repository 🙂

          Regards
          Dirk

  14. AndySugs says:

    I’ve also had issues connecting to the sub-reports from Performance Dashboard Main. I’d gone in to every report and set the credentials but I was still getting following error:

    An error has occurred during report processing. (rsProcessingAborted)
    Cannot create a connection to data source ‘DataSource1’. (rsErrorOpeningConnection)
    Login failed for user ‘NT AUTHORITY\ANONYMOUS LOGON’.

    Strangely enough, even though I updated the credentials in every report in Data Tools and then re-deployed the whole solution, when I opened a failing subreport in Report Builder the data source was still showing as Windows Auth (integrated security).

    To resolve this issue I had to go into every report in Data Tools and change the name of the data-source from DataSource1. As soon as I renamed it in a sub-report and re-deployed then I could access it fine from the dashboard.

  15. Tom says:

    Is there a version of the script available without enabling xp_cmdshell? With the frequency this script runs it is not realistic to enable/disable this feature.

    1. To avoid xp_cmdshell, you can modify the jobs to call the PowerShell scripts directly from the SQL Agent job. Following is the command you can use in the Get PerfMon Data job:

      C:\WINDOWS\system32\WindowsPowerShell\v1.0\powershell.exe C:\Scripts\Get-SQLPerfCounters.ps1 -S $(ESCAPE_NONE(SRVR))

      1. I’d suggest making the PerfMon collection job use PowerShell by default since xp_cmdshell is disabled by default. Made the suggested change to the job and it works. Thank you.

          1. Thanks, Ron. Yes, it was the default in my original code, but later we merged the change from SQLAdrian in GitHub. Will try to make it the default.

  16. AndySugs says:

    Hi Parikshit,

    Do we need to add the action links to the sub-reports ourselves? I presumed that all reports in the suite could be accessed from the SQL Performance Dashboard but this doesn’t seem to be the case so I’m a little confused by how the reports are supposed to link.

    In the Setting up & Deploying Reporting section above you state that from the dashboard you can access the Daily Performance Trend report and the screenshot below shows several graphs, one of which is the Active Sessions Request Trend. However, this graph is not in the Daily Performance Trend report but rather the Daily Performance Workload Trend and I can’t seem to access that report via the dashboard. Should the Daily Performance Workload Trend report be accessed via Action link in the User Workload Trend graph in the Daily Performance Trend report? I presume this should be the case but can’t see a link unless I’m missing something.

    As far as I can make out, the only report in the suite which can be accessed via the dashboard is the Daily Performance Trend. Then from this, via the Available Memory Trend graph, you can access the SQL Memory Utilisation report. There don't seem to be any other action links in these 3 reports so I'm unsure how you can access:
    * Database Storage report
    * User Session Details
    * Index Frag report
    * Daily_Performance_Workload_Trend

    I’ve looked on GitHub and no one else seems to have had this issue so not sure if I’m missing something.

    1. Hi Andy,

      The actions are already set in SQL Performance Dashboard report. When you click on the chart area of each individual chart, it would take you to daily performance trend report for that specific date. You can further click on the chart area of Memory performance counter report and it will take you to Memory report.

      1. AndySugs says:

        I’m still confused I’m afraid. How do we access the reports I listed i.e.
        * User Session Details
        * Index Frag report
        * Daily_Performance_Workload_Trend
        None of these are in the Performance suite but cannot be accessed from the dashboard in the baseline suite.

        1. AndySugs says:

          And why are the graphs in the Daily Performance trend report duplicated in the Daily Performance Workload trend report, which also has Active session request trend and Wait Statistics graphs? Can we not just get rid of Daily Performance trend report?

          1. Andy,

            The Daily Performance Workload Trend report has additional data on session waits, which you can correlate with other performance metrics (CPU, memory, IO, tempdb), but it relies on the Load Session Status Per Minute job. However, if for any reason you cannot run that job due to the overhead of running it every minute, I still wanted the performance metrics to be reported, which is why I chose to duplicate the charts. It makes it easier to correlate data and perform the analysis.

            In short, if you are running the Load Session Status Per Minute job, use the Daily Performance Workload Trend report; otherwise use the Daily Performance Trend report.

        2. Andy,

          The reports you listed are not linked in the dashboard. You need to access them directly by clicking on them in Report Manager and providing the parameter values.

          User Session details is accessed from Daily_Performance_Workload_Trend.
          I explained Daily_Performance_Workload_Trend report in your latest comment.
          The index defrag report may not belong to the baseline suite, but I have kept it in case it is needed.

          1. AndySugs says:

            OK thanks Parikshit. I just presumed that, as with the Performance Dashboard suite, all the reports would be accessed via the main dashboard report.

  17. Dirk Hondong says:

    Hi there,

    where do we find the wait statistics report?
    I cannot reach it through the dashboard

    Regards
    Dirk

    1. Dirk,

      I think you might be confusing this with the SQL Performance Dashboard report https://blogs.msdn.microsoft.com/sql_server_team/sql-server-performance-dashboard-reports-unleashed-for-enterprise-monitoring/ which has a wait stats report in the dashboard.

      For the SQL Performance Baseline reports, there is a job, Load Session Status Per Minute, which captures wait snapshots from sys.dm_exec_requests. The data is reported by the Daily Performance Workload Trend report, from which you can drill down further to the User Session Details report.

      1. Dirk Hondong says:

        I just figured it out in the last couple of minutes. Trying to find a smooth workaround for some date conversion regarding the en-US format on non-en-US systems.
        Sooner or later there will be another pull request on GitHub, I suppose 😉

        Regards
        Dirk

        1. Will be happy to see the pull requests 🙂 Thanks for all your community contributions!

        2. AndySugs says:

          Hi Dirk. I’ve added SET DATEFORMAT dmy; at the top of respective queries and updated convert style from 101 to 103

          1. AndySugs says:

            And on user_session_details report I set time parameter value to =RIGHT(TimeOfDay(),8)

  18. Tom says:

    I have an issue accessing the drilldown reports, if I click on any of the reports to drill down I get the following error:
    The value provided for the report parameter ‘Date’ is not valid for its type. (rsReportParameterTypeMismatch)

    1. Tom,

      I would suggest you download the SSRS project again from GitHub and re-deploy the reports. The above is not expected and the reports should work.

      1. Tom says:

        Hi,
        I re-downloaded the reports but unfortunately the issue persists. I wonder if the issue is related to the date being in US format when this has been deployed in the UK?

        1. Tom says:

          To add: I have changed my workstation to English (US) format and the OnAction reports are now working. This is not a feasible long-term solution. Why are these reports hard-coded for the US format only?

          1. Yes, Tom, you are right. Currently it supports only the English (US) format. Maybe someone from the community can work on adding localization support.

            1. Tom says:

              Hi – I was able to get the reports working with UK regional settings. I did, however, notice that in the article above 6 charts are shown in the 'Daily Performance Trend' report, but I only have 4 charts; the Active Session Request Trend and Wait Statistics charts are missing. Can you upload the latest report with all 6 charts?

              1. Tom,

                You would see 6 charts in the Daily Performance Workload Trend report. It has additional data on session waits, which you can correlate with other performance metrics (CPU, memory, IO, tempdb), but it relies on the Load Session Status Per Minute job. However, if for any reason you cannot run that job due to the overhead of running it every minute, I still wanted the performance metrics to be reported, which is why I chose to duplicate the charts. It makes it easier to correlate data and perform the analysis.
                In short, if you are running the Load Session Status Per Minute job, use the Daily Performance Workload Trend report; otherwise use the Daily Performance Trend report.

  19. AndySugs says:

    Has anyone set up the XEvents script/reports? I've had a go today but am a bit confused (again! It's a regular occurrence!). The comments in the script mention setting up a database, which I presumed should be XEvents_ImportSystemHealth, yet the script starts with USE dba_local. I successfully ran the Load SystemHealthSession SQL job, but when I tried running the Dashboard report it errored with "Query execution failed for dataset Timeline. Invalid object name 'Tbl_Summary'." The data source for the Timeline dataset is the server/database selected as parameters. I would be grateful if Denzil (or anyone else who's worked it out!) could give a little more info on how this should all work.

    1. Andy,

      To avoid creating multiple databases, I have tried to consolidate all the data in a single database, viz. dba_local. Have you run the script CreateSystemhealthDBAndSchema.sql, which creates the underlying schema? Following schema creation, you can run the Load SystemHealthSession SQL job to populate the tables.

      1. AndySugs says:

        Yeah, I've run that script and set everything up in dba_local, but when I went to run the dashboard report in the XEvents suite, the 2nd parameter listed all the databases on the respective server. Should it be changed to just dba_local?

        1. Understood. I will look into it and change the reports.

        2. AndySugs says:

          Also, I've tried linking the SQLServerName parameter to a data set that I've created, which is simply a lookup of all the respective SQL Servers (in exactly the same way as DatabaseName has been linked to the DatabaseList data set), but when I try to run the report I get an error stating "An error has occurred during report processing. Error during processing of the ConnectString expression of data source 'XEvent'." I'm not sure why I'm getting this error as I set it up exactly the same way in the performance dashboard and it worked perfectly. Any ideas?

          1. AndySugs says:

            I’ve changed ServerName data-source so that instead of going off server name entered at run time it is linked to the central SQL Server database which in my case is the report server. On the dba_local database on this server I’ve created a table which lists all SQL Servers and have created server lookup data-set in Dashboard report so that user can now just select from this list for Server parameter. So the XEvent data-source which previously had the expression:

            ="Data Source=" & Parameters!SQLServerName.Value & ";Initial Catalog=" & Parameters!DatabaseName.Value

            now looks like:

            ="Data Source=[Report Server];Initial Catalog=dba_local"

            1. Andy,

              I have modified the System Health Session reports to avoid the database name in the reports. Refresh the reports from the GitHub repo and you will see the latest report with the changes.

              Regards,
              Parikshit

  20. Leon says:

    Great solution, thank you!
    Is there any intention to add query statistics?
    Graphs are useful, but I think you would like to relate your peaks to the corresponding SQL statements.

    1. Thanks, Leon. Great feedback, and I agree with your point. It is now a community solution, as we have placed it on GitHub, and as time permits we will continue to update and contribute to it.

  21. Ravinder singh says:

    Hi Parikshit !
    Hope you're doing well. The Performance Dashboard report deployed successfully, but no server instance name appears in the server list. I am using SQL Server 2012 on Windows Server 2012 R2. Can you please help me get the server names populated? I have already run the SQL script on 2-3 remote instances and registered them in CMS, but still all in vain.
    Thanks in advance!!

    1. Hi Ravinder, you will need to enter the SQL Server instance name in the parameter box for the SQL Server instance; it won't auto-populate. Once you enter the SQL instance name (servername, servername\instancename, or servername,port), you should see the reports populated, provided the scripts were run correctly against the remote instances and the SQL Agent jobs are running fine and populating data.

  22. Rafael says:

    When I use the reports on named instances, the processor trend and workload trend are blank.
    Any idea what changes I need to make for named instances?
    For default instances it works just fine.

    1. Rafael,

      if you use the PowerShell scripts for data collection on the named instance, it should avoid that issue. In GitHub, there was a pull request to convert all the data collection code into T-SQL, which has a bug for named instances, but I believe it was fixed with the latest pull request. Have you downloaded the latest solution from GitHub?

      1. Rafael says:

        Hi Parikshit,

        I haven't yet downloaded the latest solution from GitHub.
        I'll try it then.
        Thanks a lot.
