Seek Predicates

Before SQL Server can perform an index seek, it must determine whether the keys of the index are suitable for evaluating a predicate in the query.

Single column indexes

Single column indexes are fairly straightforward.  SQL Server can use single column indexes to answer most simple comparisons including equality and inequality (greater than, less than, etc.) comparisons.  More complex expressions such as functions over a column and “like” predicates with a leading wildcard character will generally prevent SQL Server from using an index seek.

For example, suppose we have a single column index on a column “a.”  We can use this index to seek on these predicates:

  • a = 3.14
  • a > 100
  • a between 0 and 99
  • a like ‘abc%’
  • a in (2, 3, 5, 7)

However, we cannot use it to seek on these predicates:

  • ABS(a) = 1
  • a + 1 = 9
  • a like ‘%abc’

Multi-column indexes

Multi-column indexes are slightly more complex.  With a multi-column index, the order of the keys matters.  It determines the sort order of the index and it affects the set of seek predicates that SQL Server can evaluate using the index.

For an easy way to visualize why order matters, think about the phone book.  The phone book is like an index with the keys (last name, first name).  The contents of the phone book are sorted by last name and it is easy to look someone up if we know their last name.  However, if we have only a first name, it is very difficult to get a list of people with that name.  We would need another phone book sorted on first name.

In the same way, if we have an index on two columns, we can only use the index to satisfy a predicate on the second column if we have an equality predicate on the first column.  Even if we cannot use the index to satisfy the predicate on the second column, we may be able to use it on the first column.  In this case, we introduce a residual predicate for the predicate on the second column.  This is the same residual predicate that we use for scans.

For example, suppose we have a two column index on column “a” and “b.”  We can use this index to seek on any of the predicates that worked on the single column index.  We can also use it to seek on these additional predicates:

  • a = 3.14 and b = ‘pi’
  • a = ‘xyzzy’ and b <= 0

For the next set of examples, we can use the index to satisfy the predicate on column a, but not on column b.  In these cases, we need a residual predicate.

  • a > 100 and b > 100
  • a like ‘abc%’ and b = 2

Finally, we cannot use the index to seek on the next set of predicates as we cannot seek even on column a.  In these cases, we must use a different index (e.g., one where column b is the leading column) or we must use a scan with a residual predicate.

  • b = 0
  • a + 1 = 9 and b between 1 and 9
  • a like ‘%abc’ and b in (1, 3, 5)

A slightly more concrete example

Consider the following schema:

create table person (id int, last_name varchar(30), first_name varchar(30))
create unique clustered index person_id on person (id)
create index person_name on person (last_name, first_name)

Here are three queries and their corresponding text showplan.  The first query seeks on both columns of the person_name index.  The second query seeks on the first column only and uses a residual predicate to evaluate first_name.  The third query cannot seek and uses a scan with a residual predicate.

select id from person where last_name = 'Doe' and first_name = 'John'
|--Index Seek(OBJECT:([person].[person_name]), SEEK:([person].[last_name]='Doe' AND [person].[first_name]='John'))

select id from person where last_name > 'Doe' and first_name = 'John'
|--Index Seek(OBJECT:([person].[person_name]), SEEK:([person].[last_name] > 'Doe'), WHERE:([person].[first_name]='John'))

select id from person where last_name like '%oe' and first_name = 'John'
|--Index Scan(OBJECT:([person].[person_name]), WHERE:([person].[first_name]='John' AND [person].[last_name] like '%oe'))

Note:  In case you are trying to reproduce these plans, in this and some of my prior examples, I’ve used index hints (which I haven’t shown) to ensure that I get the plan I want to illustrate without having to insert data into the table.

A bit more about index keys

In most cases, the index keys are the set of columns that you specify in the create index statement.  However, when you create a non-unique non-clustered index on a table with a clustered index, we append the clustered index keys to the non-clustered index keys if they are not explicitly part of the non-clustered index keys.  You can seek on these implicit keys just as if you specified them explicitly.

For example, given this schema:

create table T_heap (a int, b int, c int, d int, e int, f int)
create index T_heap_a on T_heap (a)
create index T_heap_bc on T_heap (b, c)
create index T_heap_d on T_heap (d) include (e)
create unique index T_heap_f on T_heap (f)

 

create table T_clu (a int, b int, c int, d int, e int, f int)
create unique clustered index T_clu_a on T_clu (a)
create index T_clu_b on T_clu (b)
create index T_clu_ac on T_clu (a, c)
create index T_clu_d on T_clu (d) include (e)
create unique index T_clu_f on T_clu (f)

 

The key columns and covered columns for each index are:

Index

Key Columns

Covered Columns

T_heap_a

a

a

T_heap_bc

b, c

b, c

T_heap_d

d

d, e

T_heap_f

f

f

T_clu_a

a

a, b, c, d, e

T_clu_b

b, a

a, b

T_clu_ac

a, c

a, c

T_clu_d

d, a

a, d, e

T_clu_f

f

a, f

Note that each of the non-clustered indexes on T_clu include the clustered index key column a with the exception of T_clu_f which is a unique index.

To be continued one last time …

I’m planning one more post about scans and seeks.  I’ll write a bit more about some of the tradeoffs that SQL Server considers when deciding which index to use and whether to perform a scan, a seek, or a seek with a bookmark lookup.