How true is it that Missing Indexes feature cannot gather stats for more than 500 missing index groups?

Not much, really.

There is a topic in SQL Server’s documentation about the Limitations of the Missing Indexes Feature. In this post, I want to discuss about the one limitation enumerated in that list, included in the referred topic, where it describes the theoretical existence of a maximum of 500 missing index groups.

The missing index feature has the following limitations:

  • It is not intended to fine tune an indexing configuration.

  • It cannot gather statistics for more than 500 missing index groups.

  • It does not specify an order for columns to be used in an index.

  • For queries involving only inequality predicates, it returns less accurate cost information.

  • It reports only include columns for some queries, so index key columns must be manually selected.

  • It returns only raw information about columns on which indexes might be missing.

  • It does not suggest filtered indexes.

  • It can return different costs for the same missing index group that appears multiple times in XML Showplans.

  • It does not consider trivial query plans.

Not only in that list but also in a subsequent section named “Maximum Number of Missing Index Groups” within the same topic, there is the following assertion:

Statistics are gathered for a maximum of 500 missing index groups. After this threshold is reached, no more missing index group data is gathered. This threshold is not a tunable parameter and cannot be changed.

That the threshold is not a tunable parameter and cannot be changed, that’s absolutely true. But… regarding the hardcoded maximum, it is not 500 but 600, to start with. So don’t become alarmed if you ever query either sys.dm_db_missing_index_group_stats or sys.dm_db_missing_index_groups and they return any number of rows above 500 and up to 600. However, it won’t typically occur. What is true though is that once there are 600 different index groups registered in that global list exposed via these two DMVs, SQL Server will not register any newer index suggestion until the list has been purged and there are less than 600 elements in it. Therefore, that 600 acts kind of like a hard limit.

The reason why I said it won’t typically occur that you will see more than 500 rows exposed through those two DMVs is because there is a timer-based task, which executes every 2 minutes, to check if that soft limit of 500 hundred has been exceeded (i.e. if there are more than 500 elements already registered in the list). If that’s the case, it will purge 20 percent of them. And the formula it will use to choose which ones are less valuable to keep around is “Total Improvement Costs/SQRT(Ages in hours)+1”. The query SQL Server runs to identify those missing index stats elements it must get rid of, looks like this:

select tttt.database_id, tttt.object_id, tttt.group_handle from ( select tt.database_id, tt.object_id, tt.group_handle, tt.index_handle, ntile(100) over (order by metrics) as nt from (Select mid.database_id, mid.object_id, mig.index_group_handle as group_handle, mig.index_handle as index_handle, migs_adv.index_advantage/(sqrt(migs_adv.ages)+1) as metrics from ( select (user_seeks + user_scans)* avg_total_user_cost * avg_user_impact as index_advantage, DATEDIFF(hour, case when (last_user_seek IS NULL and last_user_scan IS NULL) then convert (datetime, N'1/1/2000', 101) when (last_user_seek IS NULL and last_user_scan IS NOT NULL) then last_user_scan when (last_user_seek IS NOT NULL and last_user_scan IS NULL) then last_user_seek when (last_user_seek IS NOT NULL and last_user_scan IS NOT NULL) then CASE when (last_user_seek > last_user_scan) then last_user_seek else last_user_scan end end, getdate()) as ages, migs.* from sys.dm_db_missing_index_group_stats migs ) as migs_adv, sys.dm_db_missing_index_groups mig, sys.dm_db_missing_index_details mid where migs_adv.group_handle = mig.index_group_handle and mig.index_handle = mid.index_handle) as tt) as tttt where nt <= 20

The following script would allow you easily exceed the 500 hundred soft limit and eventually see 600 rows through the missing indexes DMVs. While you run the script below from one session, run select count(*) from sys.dm_db_missing_index_groups from a different session and observe how many rows are counted.

if exists(select * from sys.tables where name = 'table0') drop table table0;
go
create table table0 (c1 int, c2 int, c3 int, c4 int, c5 int, c6 int, c7 int, c8 int, c9 int, c10 int, c11 char(400))
go
set nocount on
declare @r int
set @r = 0
while (@r <= 600)
begin
insert into table0 values (1, 1, 1, 1, 1, 1, 1, 1, 1, 1, '')
set @r += 1
end
insert into table0 values (2, 2, 2, 2, 2, 2, 2, 2, 2, 2, '')
go
declare @i int
set @i = 1
while (@i <= 100)
begin
execute ('if exists(select * from sys.tables where name = ''table' + @i + ''') drop table table' + @i + ';');
execute('create table table' + @i + ' (c1 int, c2 int, c3 int, c4 int, c5 int, c6 int, c7 int, c8 int, c9 int, c10 int, c11 char(400))');
execute('create nonclustered index NCL_table' + @i + ' on table' + @i + ' (c11)');
execute('set nocount on; insert into table' + @i + ' select * from table0');
execute('declare @a int;select @a = c4 from table' + @i + ' where c1 = 2;select @a = c4 from table' + @i + ' where c2 = 2;');
execute('declare @a int;select @a = c4 from table' + @i + ' where c3 = 2;select @a = c4 from table' + @i + ' where c4 = 2;');
execute('declare @a int;select @a = c4 from table' + @i + ' where c5 = 2;select @a = c4 from table' + @i + ' where c6 = 2;');
execute('declare @a int;select @a = c4 from table' + @i + ' where c7 = 2;select @a = c4 from table' + @i + ' where c8 = 2;');
execute('declare @a int;select @a = c4 from table' + @i + ' where c9 = 2;select @a = c4 from table' + @i + ' where c10 = 2;');
set @i += 1;
end