What is plan regression in SQL Server?

Plan regression happens when SQL Server starts using the sub-optimal SQL plan to execute some T-SQL query. Usually you will see that some T-SQL query is executing really fast, but then it gets slower without any obvious reason. In this post you will see how can plan regression happen.

Setup

First I’m going to create a simple table that will be used for the test:

drop table if exists flgp; 
create table flgp (
       type int,
       name nvarchar(200),
       index ncci nonclustered columnstore (type),
       index ix_type(type)
);

This table has two columns, one classic B-tree index, and one COLUMNSTORE index. B-tree index is better for the queries that need a limited set of rows, while the columnstore index is better for queries that will scan most of the rows in the table.

I will populate this table with non-uniformly distributed data:

  • One row that has type = 1
  • 999.999 rows that have type = 2
insert into flgp(type, name)
values (1, 'Single')

insert into flgp(type, name)
select TOP 999999 2 as type, o.name
from sys.objects, sys.all_columns o

Plan regression will be demonstrated on this data set.

Why we have different plans for the same T-SQL query?

Every T-SQL query can have different plans. Let’s look at the following query:

SELECT COUNT(*) FROM flgp WHERE type = @type

If we pass a parameter with the value 1, the query would need to take one row. Therefore, the optimal plan would be a plan that uses B-Tree index that seeks into the table, takes one row using the reference from the index and returns number 1 as a result. B-tree indexes are perfect for very selective seeks/lookups. Let’s look at this query:

ALTER DATABASE SCOPED CONFIGURATION CLEAR PROCEDURE_CACHE;
EXECUTE sp_executesql @stmt = N'SELECT COUNT(*) FROM flgp WHERE type = @type', @params = N'@type int', @type = 1

This query will produce plan with B-Tree INDEX SEEK that is optimal for parameter 1. with the following execution statistics:

SQL Server Execution Times:
 CPU time = 0 ms, elapsed time = 7 ms.

If we pass parameter value 2, the query would need to read almost the whole table. The optimal plan would be the plan that uses columnstore index that scans the table, takes all rows from the index and counts them. Let’s try to execute this query:

ALTER DATABASE SCOPED CONFIGURATION CLEAR PROCEDURE_CACHE;
EXECUTE sp_executesql @stmt = N'SELECT COUNT(*) FROM flgp WHERE type = @type', @params = N'@type int', @type = 2

Execution statistics for this query might be:

 SQL Server Execution Times:
 CPU time = 312 ms, elapsed time = 400 ms.

Depending on the parameter value, SQL Server will choose one or another plan. The second plan is slower, but this is expected because it needs to count million of rows.

In both cases I have cleared the cache before I executed the query using ALTER DATABASE SCOPED CONFIGURATION CLEAR PROCEDURE_CACHE, because I want SQL Server to recompile the query and create the new plan.

How plan regression happens?

When SQL Server creates a plan for a query, the plan will be cached and reused when the same query comes again. The plan is retained in the cache and reused for some time.Now, let’s see what would happen if I execute the query that will put a plan with index seek into the cache, and then I execute second query that reuses that plan:

ALTER DATABASE SCOPED CONFIGURATION CLEAR PROCEDURE_CACHE; 
EXECUTE sp_executesql @stmt = N'SELECT COUNT(*) FROM flgp WHERE type = @type', @params = N'@type int', @type = 1
-- this query will reuse plan cached from previous query:
EXECUTE sp_executesql @stmt = N'SELECT COUNT(*) FROM flgp WHERE type = @type', @params = N'@type int', @type = 2

If we look at the execution statistics for the second query we might see something like:

 SQL Server Execution Times:
 CPU time = 719 ms, elapsed time = 721 ms.

Both CPU and elapsed time for the second query is doubled (from 300-400ms to 700ms) because the second query used the plan that was optimal for the first query. The cause of regression is plan caching in this case.

 

Conclusion

Plan regression happens when SQL Server starts using sub-optimal plan, which increases CPU time and duration. One way to mitigate this is to recompile query with OPTION(RECOMPILE) if you find this problem. Do not clear procedure cache on production system because it will affect all queries!

Another option would be to use automatic plan choice correction in SQL Server 2017 that will look at the history of plans and force SQL Server to use last known good plan if plan regression is detected.