Do Filtered Statistics update as frequently as normal statistics?

I received a question this week about whether filtered statistics update as frequently as regular statistics.  The more precise way to ask the question is “Do filtered statistics become invalid as frequently as regular statistics?”, since statistics are recomputed lazily by the queries that consume them, while they are invalidated based on changes to the underlying data.

The basic mechanism for stats invalidation is a per-column counter in the storage engine.  It is not transacted, and it is incremented on each INSERT/UPDATE/DELETE/MERGE.  When a normal statistics object is created, the current “modcounter” for that column is stored in the statistics object.  As changes are made to the table, the difference between the current modcounter and the one stored in the statistics object grows.  Eventually, when the difference is large enough (I will slightly oversimplify and say that it is 20% of the table changed for this blog post), the statistics object is invalidated.  The next time a query consuming the statistic is executed, it will determine that the statistic is out-of-date and trigger a recomputation of the statistics object.  Once that completes, the query is recompiled using the new statistics object.  Unfortunately, many of the details of this logic are not exposed, and this can make the mechanism somewhat difficult to understand.
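
If you want to watch the counter-vs-threshold mechanism yourself, later builds of SQL Server expose the modification counter through sys.dm_db_stats_properties.  This DMV postdates the internals described above and the exact counter semantics are undocumented, so treat this as a rough sketch; the table name below is just a placeholder:

    -- Rough sketch (assumes a build where sys.dm_db_stats_properties
    -- exposes modification_counter, e.g. SQL Server 2008 R2 SP2 / 2012 SP1+).
    -- 'dbo.sometable' is a placeholder - substitute your own table.
    select s.name as stats_name,
           sp.last_updated,
           sp.rows,
           sp.modification_counter
    from sys.stats as s
    cross apply sys.dm_db_stats_properties(s.object_id, s.stats_id) as sp
    where s.object_id = object_id('dbo.sometable')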


Filtered statistics do not change the fundamental mechanism – there is still one set of counters in the storage engine.  The thresholds at which filtered statistics become invalid, however, are scaled from the normal case based on the estimated selectivity of the filter.  So, a filter that covers only 20% of the domain is updated 1/5th as frequently.  The calculation is done without knowledge of whether the changes being made to the table would actually affect the statistics object.
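
To put rough numbers on that (using the oversimplified 20% threshold from above; the real thresholds are undocumented): on a 20,000-row table, a normal statistics object would be invalidated after roughly 4,000 changes to the column, while a filtered statistics object whose predicate covers 20% of the domain would need roughly 5 x 4,000 = 20,000 changes before being considered out-of-date.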

I’ve written up an example to show that filtered stats do indeed get updated even when all of the changes to the data fall outside the range of the filter on the statistics object.  In this example, the filter selectivity is very high (~90%), and the “data change” is a no-op that leaves the values untouched, but it still increments the modcounter for the column.


    use tempdb

    --drop table testfilteredstatsrefresh
    -- create a table for our example with values 0...999 in col2
    create table testfilteredstatsrefresh(col1 int identity, col2 int)
    go
    set nocount on
    begin transaction
    declare @a int = 0
    while @a < 20000
    begin
        insert into testfilteredstatsrefresh(col2) values (@a % 1000)
        set @a += 1
    end
    commit transaction
    go
    -- create filtered stats over a portion of the domain
    create statistics i1 on testfilteredstatsrefresh(col2) WHERE col2 > 100
    go
    select * from testfilteredstatsrefresh where col2 <> 55 option(recompile)
    go
    set nocount off
    -- update the table with a series of no-ops on the section of the range
    -- not covered by the filtered stats
    update testfilteredstatsrefresh set col2 = col2 % 1000 where col2 < 100
    update testfilteredstatsrefresh set col2 = col2 % 1000 where col2 < 100
    update testfilteredstatsrefresh set col2 = col2 % 1000 where col2 < 100
    update testfilteredstatsrefresh set col2 = col2 % 1000 where col2 < 100
    update testfilteredstatsrefresh set col2 = col2 % 1000 where col2 < 100
    go
    -- run the query again - the Auto Stats event fires again, even though
    -- none of the changed rows fall within the filter predicate
    select * from testfilteredstatsrefresh where col2 <> 55 option(recompile)


I turned on SQL Profiler and captured the “Auto Stats” event to watch when the statistics object was updated.  You will notice that the event fires on the last query (after the updates), even though there have been no changes in the part of the domain covered by the filtered statistics object.
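
If you don’t have Profiler handy, you can see the same effect from T-SQL by checking the “Updated” timestamp that DBCC SHOW_STATISTICS reports in its header, before and after the final select (a small sketch using the table and statistics object from the script above):

    -- The "Updated" column in the STAT_HEADER output changes once the
    -- final select triggers auto-stats on the filtered statistics object.
    dbcc show_statistics ('testfilteredstatsrefresh', 'i1') with stat_header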


Happy Querying!


Conor