In the upcoming Beta 3 of DynamicsPerf 2.0, we have implemented a new feature called QUERY ALERTS.
The inspiration for the design has come from all of you. One of the big challenges that we all face with performance is that sudden slowdown of the system. In general, this happens for 2 major reasons, database blocking or parameter sniffing. Everyone has a solution for monitoring database blocking including DynamicsPerf with Extended Events in version 2.0.
The other issue, parameter sniffing, is a bit more of a challenge to find and fix. You can read more about parameter sniffing from my colleague Michael Devoe here: http://blogs.msdn.com/b/axinthefield/archive/2014/01/09/sql-server-parameter-sniffing-with-dynamics-ax-just-plain-evil.aspx.
As Michael notes in this blog there are several reasons for the issue. Despite our best efforts of trying to deal with it from the application side, its impossible to completely deal with all the statistical variations in our data.
So, how do you find it in order to be able to fix it? In version 1.20 of DynamicsPerf, we added in the ability to do BASELINE captures. What this gave you the ability to do was, when the system experienced one of those sudden slowdowns, you could capturestats and then compare that data to one of the baselines that you had previously captured looking for queries where the avg_time was greater then the avg_time in the baseline. This worked pretty well with some caveats to it. The first was you had to manually go capture data at the time of the problem. We are all busy and it might be 20-30 minutes or more before the users notify IT that there is a problem. The second challenge is the amount of data that you may have to filter through in order find that one query. If you were capturing stats once per day at 5pm, the default, if the problem occurred the next morning at 10am, when you run capturestats because of the problem, you have all the queries that have run since the previous day. It means you have a lot of data to filter through looking for the query that is your problem now. The process worked but it was a lot of work/time for you to find that ONE query.
So, how do we improve on that? In DynamicsPerf 2.0, we are collecting query data every 5 minutes. So, this solves the need to remember to manually capturestats. Great, I don’t have to remember to collect any more but how does that help me find that ONE query. The answer to this question is how we aggregate data into the history tables. http://blogs.msdn.com/b/axinthefield/archive/2016/01/04/dynamicsperf-2-0-historical-tables-and-baselines.aspx
Now that we have all this historical data for the day and for the month by query, we can use that to help find that ONE query. This is the idea behind the Query Alerts functionality. We now have a new task that runs every 5 minutes and looks at JUST the queries that ran in the last 5 minutes and compares the avg_time for those queries against the avg_time for the DAY and the avg_time for the MONTH for that query. If it is X% greater then we write the data into a table called QUERY_ALERTS in the DynamicsPerf database.
How do you configure it? Starting in Beta 3, script 4-ConfigureDBs to Collect.sql has an additional script for populating the QUERY_ALERTS_CONFIG table. There are 4 configuration values:
This is the percent change when comparing the QUERY_STAT avg_time to the QUERY_HISTORY avg_time for the DAY record
This is the percent change when comparing the QUERY_STAT avg_time to the QUERY_HISTORY avg_time for the MONTH record
This is the minimum number of executions in a 5 minute time period for the query to be considered
This is the minimum avg_time for a query to be considered
How do you look at the QUERY_ALERTS? In the DynamicsPerf 2.0 Analyze solution there is a new script called 1-Analyze_Query_Alerts.sql. There are several queries, including show you the alerts by time Desc, the number of alerts per query_hash, by day and by hour.
The percent change values in the QUERY_ALERTS_CONFIG table work as follows:
150 = 150% a query that avg’s 1 sec would have to take 1.5secs to fire the alert
200 = 200% a query that avg’s 1 sec would have to take 2secs to fire the alert
After configuring this feature for your system, watch the amount of alerts you are getting and adjust the configure options as necessary.
REQUIREMENT NOTE: DynamicsPerf 2.0 MUST be installed on SQL Server 2012 or above for this feature to work. We use the LAG command in the view that is the key to this data which was a new TSQL command starting with SQL Server 2012. If your Dynamics application is running on SQL Server 2008 R2, do a remote installation of DynamicsPerf 2.0 on a SQL 2012 or above server and collect data remotely.
Once you have tried out this new feature be sure to provide us feedback on how it worked for you in:
And because I know the question is going to be asked, the answer is “SOON”.