Demo: Identify and fix plan change regression in SQL Server 2017 RC1

Plan change regression happens when SQL Database changes a plan for some T-SQL query, and the new plan has the worse performance than the previous one. SQL Server 2017 has Automatic Tuning feature that enables you to easily find plan change regressions and fix them. In this post you will see the demo script that you can use to cause plan change regression and manually fix it using new sys.dm_db_tuning_recommendations view.

If you are not familiar with plan regressions and new tuning recommendations in SQL Server 2017, I would recommend to read these two posts:

This would be enough to understand steps in this demo.

Setup

First we would need a table where we will execute a query that will cause plan change regression. In the previous post (What is plan regression in SQL Server?), I have created a demo that fills a table with data and cause plan regression. I will use the same table in this demo:

 drop table if exists flgp; 
create table flgp (
       type int,
       name nvarchar(200),
       index ncci nonclustered columnstore (type),
       index ix_type(type)
);
insert into flgp(type, name)
values (1, 'Single');
insert into flgp(type, name)
select TOP 999999 2 as type, o.name
from sys.objects, sys.all_columns o;

Once you create this table you can use the following scripts to cause the plan change regressions.

Plan choice regression

First, I will fill the history of plan executions by executing the query 30 times. This step is required because we need to have information about the good plan in Query Store:

 ALTER DATABASE SCOPED CONFIGURATION CLEAR PROCEDURE_CACHE; 
EXECUTE sp_executesql @stmt = N'SELECT COUNT(*) FROM flgp WHERE type = @type', @params = N'@type int', @type = 2
GO 30

If you look at the execution plan for this query, you will see Hash aggregate operator.

NOTE: If you see Stream aggregate instead of Hash aggregate in the plan, for some reason SQL Server don't thinks that 9999999 rows with type=2 is enough for this plan. In that case, add more rows with type=2 into the flgp table by repeating the following query several times:

 insert into flgp(type, name) select 2 as type, o.name from sys.objects, sys.all_columns o

Now I'm clearing the procedure cache and sending the same query but with a parameter that will touch only one row:

 ALTER DATABASE SCOPED CONFIGURATION CLEAR PROCEDURE_CACHE; 
EXECUTE sp_executesql @stmt = N'SELECT COUNT(*) FROM flgp WHERE type = @type', @params = N'@type int', @type = 1

In this case, SQL Database will create a plan with Stream Aggregate that will be optimal for the small number of rows. This plan will be cached in SQL Server procedure cache and used in subsequent executions of the query.

Now, if I execute the query again with @type=2, SQL Server will use cached plan with Stream Aggregate instead of the optimal plan with Hash Aggregate:

 EXECUTE sp_executesql @stmt = N'SELECT COUNT(*) FROM flgp WHERE type = @type', @params = N'@type int', @type = 2
GO 15

SQL Server needs at least 15 executions to collect enough information to compare current and previous plan.

Identifying tuning recommendation

You can open Query Store/Top Resource Consuming Queries in SSMS to find the query and plans that regressed. If you don't know what plan regressed, it would be easier to use new sys.dm_db_tuning_recommendations view that shows queries that regressed and plans that might be used instead of the current plans. You can use the following query to take information from this view:

 SELECT reason, score,
 script = JSON_VALUE(details, '$.implementationDetails.script'),
 planForceDetails.[query_id],
 planForceDetails.[new plan_id],
 planForceDetails.[recommended plan_id],
 estimated_gain = (regressedPlanExecutionCount+recommendedPlanExecutionCount)*(regressedPlanCpuTimeAverage-recommendedPlanCpuTimeAverage)/1000000,
 error_prone = IIF(regressedPlanErrorCount>recommendedPlanErrorCount, 'YES','NO')
 FROM sys.dm_db_tuning_recommendations
     CROSS APPLY OPENJSON (Details, '$.planForceDetails')
                 WITH ( [query_id] int '$.queryId',
                        [new plan_id] int '$.regressedPlanId',
                        [recommended plan_id] int '$.recommendedPlanId',
                        regressedPlanErrorCount int,
                        recommendedPlanErrorCount int,
                        regressedPlanExecutionCount int,
                        regressedPlanCpuTimeAverage float,
                        recommendedPlanExecutionCount int,
                        recommendedPlanCpuTimeAverage float ) as planForceDetails;

This query will return information about the queries and plans that regressed and T-SQL script that you can use to fix the issue.

reason score script query_id new plan_id recommended plan_id estimated_gain error_prone
Average query CPU time changed from 124.38ms to 2035.61ms 43 exec sp_query_store_force_plan @query_id = 2, @plan_id = 2 2 14 2 118.496206423913 NO

If you execute the script in the script column, previous good plan will be forced and used instead of the regressed plan. In this case, the following script will fix the regression and force the good plan:

 exec sp_query_store_force_plan @query_id = 2, @plan_id = 2

If you execute query again, you would see that it runs faster.

Automatic plan correction

As an alternative for manual monitoring and correction using sp_query_store_force_plan procedure, you can let SQL Server to automatically apply recommendations whenever big performance regression happens after the plan change using the following code:

 ALTER DATABASE current
      SET AUTOMATIC_TUNING ( FORCE_LAST_GOOD_PLAN = ON ); 

SQL Server will take recommendations from the sys.dm_db_tuning_recommendations view, apply them, and automatically verifies that forced plan is better than the previous one.