Bookmark Lookup


In my last post, I explained how SQL Server can use an index to efficiently locate rows that qualify for a predicate.  When deciding whether to use an index, SQL Server considers several factors.  These factors include checking whether the index covers all of the columns that the query references (for the table in question).


What does it mean for an index to cover a column?


The heap or clustered index for a table (often called the “base table”) contains (or covers) all columns in the table.  Non-clustered indexes, on the other hand, contain (or cover) only a subset of the columns in the table.  By limiting the set of columns stored in a non-clustered index, we can store more rows on each page.  Thus, we save disk space and improve the efficiency of seeks and scans by reducing the number of I/Os and the number of pages that we touch.


Each non-clustered index covers the key columns that were specified when it was created.  Also, if the base table is a clustered index, each non-clustered index on this table covers the clustered index keys (often called the “clustering keys”) regardless of whether they are part of the non-clustered index’s key columns.  In SQL Server 2005, we can also add additional non-key columns to a non-clustered index using the “include” clause of the create index statement.


For example, given this schema:


create table T_heap (a int, b int, c int, d int, e int)


create index T_heap_a on T_heap (a)


create index T_heap_bc on T_heap (b, c)


create index T_heap_d on T_heap (d) include (e)


 


create table T_clu (a int, b int, c int, d int, e int)


create unique clustered index T_clu_a on T_clu (a)


create index T_clu_b on T_clu (b)


create index T_clu_ac on T_clu (a, c)


create index T_clu_d on T_clu (d) include (e)


The covered columns for each index are:





























Index


Covered Columns


T_heap_a


a


T_heap_bc


b, c


T_heap_d


d, e


T_clu_a


a, b, c, d, e


T_clu_b


a, b


T_clu_ac


a, c


T_clu_d


a, d, e


Note that order is not relevant for covered columns.


How does this relate to index seeks and bookmark lookups?


Consider this query (using the above schema):


select e from T_clu where b = 2


At first glance, this query looks like a perfect candidate for an index seek using the non-clustered index on column b (T_clu_b) with the seek predicate “b = 2”.  However, this index does not cover column e so a seek or scan of this index cannot return the value of column e.  The solution is simple.  For each row that we fetch from the non-clustered index, we can lookup the value of column e in the clustered index.  We call this operation a “bookmark lookup.”  A “bookmark” is a pointer to the row in the heap or clustered index.  We store the bookmark for each row in the non-clustered index precisely so that we can always navigate from the non-clustered index to the corresponding row in the base table.


The following figure illustrates a bookmark lookup:



Showplan Samples


In SQL Server 2000, we implement bookmark lookup using a dedicated iterator whether the base table is a clustered index or heap:



  |–Bookmark Lookup(BOOKMARK:([Bmk1000]), OBJECT:([T_clu]))
       |–Index Seek(OBJECT:([T_clu].[T_clu_b]), SEEK:([T_clu].[b]=2) ORDERED FORWARD)


In SQL Server 2005, we use a regular clustered index seek if the base table is a clustered index or a RID (record id) lookup if the base table is a heap:



  |–Nested Loops(Inner Join, OUTER REFERENCES:([T_clu].[a]))
       |–Index Seek(OBJECT:([T_clu].[T_clu_b]), SEEK:([tempdb].[dbo].[T_clu].[b]=(2)) ORDERED FORWARD)
       |–Clustered Index Seek(OBJECT:([T_clu].[T_clu_a]), SEEK:([T_clu].[a]= [T_clu].[a]) LOOKUP ORDERED FORWARD)



  |–Nested Loops(Inner Join, OUTER REFERENCES:([Bmk1000]))
       |–Index Seek(OBJECT:([T_heap].[T_heap_a]), SEEK:([T_heap].[a]=(2)) ORDERED FORWARD)
       |–RID Lookup(OBJECT:([T_heap]), SEEK:([Bmk1000]=[Bmk1000]) LOOKUP ORDERED FORWARD)


You can tell that the clustered index seek is a bookmark lookup by the LOOKUP keyword in text showplan or by the attribute Lookup=“1” in XML showplan.  I will explain the behavior of the nested loops join in a future post.  The loop join and clustered index seek (or RID lookup) perform the same operation as the bookmark lookup in SQL Server 2000.


Tradeoffs


Bookmark lookup is not a cheap operation.  Assuming (as is commonly the case) that there is no correlation between the non-clustered and clustered index keys, each bookmark lookup performs a random I/O into the clustered index.  Random I/Os are very expensive.  When comparing various plan alternatives including scans, seeks, and seeks with bookmark lookups, the optimizer must decide whether it is cheaper to perform more sequential I/Os and touch more rows using an index scan or a seek with a less selective predicate that covers all required columns or to perform fewer random I/Os and touch fewer rows using a seek with a more selective predicate and a bookmark lookup.


In some cases, you can introduce a better plan option by creating a new index or by adding one or more columns to an existing index so as to eliminate a bookmark lookup or change a scan into a seek.  In SQL Server 2000, the only way to add columns to an index is to add additional key columns.  As I mentioned above, in SQL Server 2005, you can add also columns using the include clause of the create index statement.  Included columns are more efficient than key columns; they save disk space and make searching and updating the index more efficient.


Of course, whenever you create new indexes or add new key or included columns to an existing index, you do consume additional disk space and you make it more expensive to search and update the index.  Thus, you must balance the frequency and importance of the queries that benefit from the new index against the queries or updates that are slower.


To be continued again …


In my next post, I’ll go into a bit more detail about when we can perform index seeks and take a look at single vs. multi-column indexes.

Comments (15)

  1. Anonymous says:

    The optimizer must choose an appropriate “access path” to read data from each table referenced in a query. …

  2. Anonymous says:

    In my last two posts, I discussed two scenarios – one involving updates and another involving large objects

  3. Anonymous says:

    Over the past month or so, I’ve looked at pretty much every isolation level except for read uncommitted

  4. Vitaly Komarovsky says:

    Craig,

    Thanks a lot for your blog! Could you point to any source (book or Internet) describing the icons in graphic execution plans and operations behind them. So far I was getting this information piece by piece from articles like yours and using common sense. Does this knowledge exist somewhere in a combined form?

  5. I’m not aware of any really good resource for this information all in one place.  Books Online has a topic on logical and physical operators that covers all of the operators but only at a fairly high level.  See http://msdn2.microsoft.com/en-us/library/ms191158.aspx.

  6. Anonymous says:

    由于需要给同事培训数据库的索引知识,就收集整理了这个系列的博客。发表在这里,也是对索引知识的一个总结回顾吧。通过总结,我发现自己以前很多很模糊的概念都清晰了很多。 不论是 聚集索引,还是非聚集索引,都是用B

  7. Anonymous says:

    In this post from last week, I gave an example of a query with a conversion where the optimizer pushes

  8. Anonymous says:

    不论是聚集索引,还是非聚集索引,都是用B 树来实现的。我们在了解这两种索引之前,需要先了解B 树。如果你对B树不了解的话,建议参看以下几篇文章: BTree,B-Tree,B Tree,B*T…

  9. Anonymous says:

    SQLServer索引基础知识(2)—-聚集索引,非聚集索引

    [来自]http://blog.joycode.com/ghj/archive/2008/01/02/113291.aspx

  10. Anonymous says:

    In my last post , I explained the importance of asynchronous I/O and described how SQL Server uses sequential

  11. Anonymous says:

    In this post from last year, I discussed how random I/Os are slower than sequential I/Os (particularly

  12. I have seen few users out there thinks that Deadlocks in SQL Server is a bug which has not be corrected

  13. Arif says:

    Craig,

    I just love your  blogs. Dude, you are good:)