The Joys of Filtered Indexes


A filtered index, introduced in SQL Server 2008, is a non-clustered index that contains only a subset of the number of rows contained in a table.  Because there is a 1:1 ratio between the number of rows in a table and the number of rows in a regular non-clustered index, a filtered index can have performance and storage advantages.  These indexes are typically much smaller than the full-table non clustered indexes.  The statistics on these filtered indexes are filtered as well, which typically results in them being more accurate. 

Filtered indexes are well-suited for scenarios where you frequently query a well-defined subset of data.  This often includes querying ranges of specific values such as dates or monetary ranges, along with scenarios where you frequently have to filter out NULL values – such as Sparse columns or minimally updated columns.  Another scenario where filtered indexes work really nicely is when you’re dealing with heterogeneous data, such as frequent queries on highly correlated columns in a table for certain subsets of data.   In this case, because of the frequency that you deal with the relationship between columns for a given data set, it may make sense to create a specific filtered index to meet your query needs rather than using a non-clustered index that contains the same number of rows as the base table.

Because the filtered index only needs to be maintained when there are rows inserted/updated/deleted that meet the index and filter definition, the overhead of maintaining the filtered index can be significantly less.  There are also many cases where the Query Optimizer is more likely to choose a lower-cost plan based on the presence of the filtered index.

Let’s take a look at a filtered index in action.  First, I’ll create a new table named Sales.NewSales, which is a copy of the AdventureWorks table Sales.SalesOrderDetail.

USE AdventureWorks

GO

IF OBJECT_ID(‘Sales.NewSales’) IS NOT NULL
DROP TABLE Sales.NewSales

GO

SELECT *
INTO Sales.NewSales
FROM Sales.SalesOrderDetail
GO

Create a new column in the Sales.NewsSales table named NewUnitPrice.

ALTER TABLE Sales.NewSales
ADD NewUnitPrice MONEY NULL

GO

Update every 10th record in Sales.NewSales to have a non-NULL value.

UPDATE Sales.NewSales
SET NewUnitPrice =
CASE

    WHEN SalesOrderDetailID % 10 = 1
    THEN UnitPrice
    ELSE NULL
END

Create a filtered index on the NewUnitPrice column to include any value that is NOT NULL.

CREATE NONCLUSTERED INDEX NewSalesIDX
ON Sales.NewSales(NewUnitPrice)
WHERE(NewUnitPrice IS NOT NULL)

Create an index on the UnitPrice field for comparison purposes.

CREATE NONCLUSTERED INDEX UnitPriceIDX
ON Sales.NewSales(UnitPrice)

Now, let’s compare the number of pages between the two indexes.

SELECT
    i.name, d.index_type_desc,
    d.index_depth, d.avg_fragmentation_in_percent,
    d.page_count,
    IndexSizeMB = (page_count * 8192)/1024.0/1024.0
FROM
sys.dm_db_index_physical_stats
    ( DB_ID(), 
      OBJECT_ID(‘Sales.NewSales’), 
      NULL,NULL,
NULL

    ) d

JOIN
sys.indexes i ON
d.index_id = i.index_id AND
d.object_id = i.object_id
WHERE

i.index_id > 1

From the screenshot below, you can see that the filtered index has many fewer pages – which also means that it takes less disk space.

So, how does performance compare between the two indexes?  Running the following query you can see that the filtered index out performs the regular non-clustered index by a significant margin.

SELECT NewUnitPrice
FROM Sales.NewSales
WHERE NewUnitPrice =
49.99

SELECT UnitPrice
FROM Sales.NewSales
WHERE UnitPrice =
49.99

For more information on the downside of filtered indexes, check out my post here:  http://blogs.msdn.com/b/timchapman/archive/2012/08/27/the-drawback-of-using-filtered-indexes.aspx

 

HTH,

Tim

 

Comments (5)

  1. Brett Phipps says:

    This is really interesting but I'm not clear how it works with dates.  Can you provide an example of how it would work with dates?

  2. timchapman says:

    Sure.  One example is if you have a date in a table that you know that you search on a LOT.  You can specify date ranges in the Index definition predicate:

    CREATE INDEX FIX_SalesOrderDetail_ModifiedDate

    ON Sales.SalesOrderDetail(ModifiedDate)

    WHERE ModifiedDate >= '08/01/2007' AND ModifiedDate < '08/02/2007'

    SELECT ModifiedDate

    FROM Sales.SalesOrderDetail

    WHERE ModifiedDate >= '08/01/2007' AND ModifiedDate < '08/02/2007'

  3. John Neville says:

    Thanks for a great post!

    I'm trying to think how I could make use of this where I work.  The reports we run use the last 60days of data heavily, but then only report annual figures after that.

    Is it possible to have a rolling 60-day predicate that updates itself dynamically?  Or would I need to create an Agent job to drop and re-create the index?

    I can see a lot of uses for filtered indexes 🙂

    Thanks again!

  4. timchapman says:

    Hi John,

    Interesting question.  We can't use a dynamic predicate in the filter.  The reason is that the filtered index is a NC index, which contains a subset of rows in the main table.  This can't be modified in such a way dynamically. So, you can't do something like this:

    WHERE ModifiedDate >= GETDATE() – 60  

    So, the technique would need to be to drop the existing filtered index after the data load (or each night, etc) and change the definition.

    HTH,

    Tim

  5. timchapman says:

    Also, make sure you check out my post here on the downsides of using filtered indexes:  blogs.msdn.com/…/the-drawback-of-using-filtered-indexes.aspx