How can SQL Server 2005 help me evaluate and manage indexes?

SQLCAT Blog:

Question: How can SQL Server 2005 help me evaluate and manage indexes?

(1) How can I find out whether my indexes are useful? How are they used?

(2) Do I have any tables or indexes that are not used (or rarely)

(3) What is the cost of index maintenance vs. its benefit?

(4) Do I have hot spots & index contention?

(5) Could I benefit from more (or less) indexes?

Answer:

SQL Server 2005 Dynamic Management Views (DMVs) are important insofar as they expose changing server state information that typically spans many sessions, many transactions, and many requests. DMVs provide a level of transparency that was not available in SQL Server 2000 and can be used for diagnostics, memory and process tuning, and monitoring. The SQL Server engine tracks detailed resource history in DMVs that can be queriable with SELECT statements but are not persisted to disk. Thus DMVs reflect activity since the last SQL Server recycle.

Since indexes provide an alternative to a table scan, and because DMVs expose index usage counters, we can compare the cost vs. the benefit of indexes. This comparison will include the maintenance cost of keeping the index up to date, versus the benefit of reads e.g. when an index can be used in lieu of a table scan. Keep in mind that an update or delete operation involves both a read which is first required to determine whether a row qualifies for the update operation, and a write if a row is found to qualify. In an insert operation, only writes will be performed on all indexes. Consequently, in an insert-intensive workload, writes will exceed reads. In an update- intensive (updates and deletes) workload, read and write counts are generally close assuming there are not a lot of ‘records not found’. In read-intensive workloads, read counts will exceed write counts. Referential constraints such as foreign keys require other read activity (for inserts, updates, and deletes) to ensure referential integrity is maintained.

(1) How can I find out whether my indexes are useful? How are they used?

First, we will determine whether indexes are ‘useful’. DDL is used to create objects (such as indexes) and update the catalog. Creating the index does not constitute ‘use’ of the index, and thus the index will not be reflected in the index DMVs until the index is actually used. When an index is used by a Select, Insert, Update, or Delete, its use is captured by sys.dm_db_index_usage_stats. If you have run a representative workload, all useful indexes will have been recorded in sys.dm_db_index_usage_stats. Thus, any index not found in sys.dm_db_index_usage_stats is unused by the workload (since the last re-cycle of SQL Server). Unused indexes can be found as follows:

(2) Do I have any tables or indexes that are not used (or rarely used)?

------ unused tables & indexes. Tables have index_id’s of either 0 = Heap table or 1 = Clustered Index

Declare @dbid int

Select @dbid = db_id('Northwind')

Select objectname=object_name(i.object_id)

, indexname=i.name, i.index_id

from sys.indexes i, sys.objects o

where objectproperty(o.object_id,'IsUserTable') = 1

and i.index_id NOT IN (select s.index_id

from sys.dm_db_index_usage_stats s

where s.object_id=i.object_id and

i.index_id=s.index_id and

database_id = @dbid )

and o.object_id = i.object_id

order by objectname,i.index_id,indexname asc

Rarely used indexes will appear in sys.dm_db_index_usage_stats just like heavily used indexes. To find rarely used indexes, you look at columns such as user_seeks, user_scans, user_lookups, and user_updates.

--- rarely used indexes appear first

declare @dbid int

select @dbid = db_id()

select objectname=object_name(s.object_id), s.object_id, indexname=i.name, i.index_id

, user_seeks, user_scans, user_lookups, user_updates

from sys.dm_db_index_usage_stats s,

sys.indexes i

where database_id = @dbid and objectproperty(s.object_id,'IsUserTable') = 1

and i.object_id = s.object_id

and i.index_id = s.index_id

order by (user_seeks + user_scans + user_lookups + user_updates) asc

(3) What is the cost of index maintenance vs. its benefit?

If a table is heavily updated and also has indexes that are rarely used, the cost of maintaining the indexes could exceed the benefits. To compare the cost and benefit, you can use the table valued function sys.dm_db_index_operational_stats as follows:

--- sys.dm_db_index_operational_stats

declare @dbid int

select @dbid = db_id()

select objectname=object_name(s.object_id), indexname=i.name, i.index_id

, reads=range_scan_count + singleton_lookup_count

, 'leaf_writes'=leaf_insert_count+leaf_update_count+ leaf_delete_count

, 'leaf_page_splits' = leaf_allocation_count

, 'nonleaf_writes'=nonleaf_insert_count + nonleaf_update_count + nonleaf_delete_count

, 'nonleaf_page_splits' = nonleaf_allocation_count

from sys.dm_db_index_operational_stats (@dbid,NULL,NULL,NULL) s,

sys.indexes i

where objectproperty(s.object_id,'IsUserTable') = 1

and i.object_id = s.object_id

and i.index_id = s.index_id

order by reads desc, leaf_writes, nonleaf_writes

--- sys.dm_db_index_usage_stats

select objectname=object_name(s.object_id), indexname=i.name, i.index_id

,reads=user_seeks + user_scans + user_lookups

,writes = user_updates

from sys.dm_db_index_usage_stats s,

sys.indexes i

where objectproperty(s.object_id,'IsUserTable') = 1

and s.object_id = i.object_id

and i.index_id = s.index_id

and s.database_id = @dbid

order by reads desc

go

The difference between sys.dm_db_index_usage_stats and sys.dm_db_index_operational_stats is as follows. Sys.dm_db_index_usage_stats counts each access as 1, whereas sys.dm_db_index_operational_stats counts depending on the operation, pages or rows.

(4) Do I have hot spots & index contention?

Index contention (e.g. waits for locks) can be seen in sys.dm_db_index_operational_stats. Columns such as row_lock_count, row_lock_wait_count, row_lock_wait_in_ms, page_lock_count, page_lock_wait_count, page_lock_wait_in_ms, page_latch_wait_count, page_latch_wait_in_ms, pageio_latch_wait_count, pageio_latch_wait_in_ms detail lock and latch contention in terms of waits. You can determine the average blocking and lock waits by comparing waits to counts as follows:

declare @dbid int

select @dbid = db_id()

Select dbid=database_id, objectname=object_name(s.object_id)

, indexname=i.name, i.index_id --, partition_number

, row_lock_count, row_lock_wait_count

, [block %]=cast (100.0 * row_lock_wait_count / (1 + row_lock_count) as numeric(15,2))

, row_lock_wait_in_ms

, [avg row lock waits in ms]=cast (1.0 * row_lock_wait_in_ms / (1 + row_lock_wait_count) as numeric(15,2))

from sys.dm_db_index_operational_stats (@dbid, NULL, NULL, NULL) s, sys.indexes i

where objectproperty(s.object_id,'IsUserTable') = 1

and i.object_id = s.object_id

and i.index_id = s.index_id

order by row_lock_wait_count desc

The following report shows blocks in the [Order Details] table, index OrdersOrder_Details. While blocks occur less than 2 percent of the time, when they do occur, the average block time is 15.7 seconds.

It would be important to track this down using the SQL Profiler Blocked Process Report. You can set the Blocked Process Threshold to 15 using sp_configure ‘Blocked Process Threshold’,15. Afterwards, you can run a trace to capture blocks over 15 seconds.

The Profiler trace will include the blocked and blocking process. The advantage of tracing for long blocks is the blocked and blocking details can be saved in the trace file and can be analyzed long after the block disappears. Historically, you can see the common causes of blocks. In this case the blocked process is the stored procedure NewCustOrder. The blocking process is the stored procedure UpdCustOrderShippedDate.

The caveat with Profiler Trace of Blocked Process Report is that in the case of stored procedures, you cannot see the actual statement within the stored procedure that is blocked. You can capture the actual blocked statement of a stored procedure in realtime (as it is occuring) using the following:

create proc sp_block_info

as

select t1.resource_type as [lock type]

,db_name(resource_database_id) as [database]

,t1.resource_associated_entity_id as [blk object]

,t1.request_mode as [lock req] --- lock requested

,t1.request_session_id as [waiter sid] --- spid of waiter

,t2.wait_duration_ms as [wait time]

,(select text from sys.dm_exec_requests as r --- get sql for waiter

cross apply sys.dm_exec_sql_text(r.sql_handle)

where r.session_id = t1.request_session_id) as waiter_batch

,(select substring(qt.text,r.statement_start_offset/2,

(case when r.statement_end_offset = -1

then len(convert(nvarchar(max), qt.text)) * 2

else r.statement_end_offset end - r.statement_start_offset)/2)

from sys.dm_exec_requests as r

cross apply sys.dm_exec_sql_text(r.sql_handle) as qt

where r.session_id = t1.request_session_id) as waiter_stmt --- statement blocked

,t2.blocking_session_id as [blocker sid] -- spid of blocker

,(select text from sys.sysprocesses as p --- get sql for blocker

cross apply sys.dm_exec_sql_text(p.sql_handle)

where p.spid = t2.blocking_session_id) as blocker_stmt

from

sys.dm_tran_locks as t1,

sys.dm_os_waiting_tasks as t2

where

t1.lock_owner_address = t2.resource_address

go

exec sp_block_info

(5) Could I benefit from more (or less) indexes?

Remembering that indexes involve both a maintenance cost and a read benefit, the overall index cost benefit can be determined by comparing reads and writes. Reading an index allows us to avoid table scans however they do require maintenance to be kept up-to-date. While it is easy to identify the fringe cases where indexes are not used, and the rarely used cases, in the final analysis, index cost benefit is somewhat subjective. The reason is the number of reads and writes are highly dependent on the workload and frequency. In addition, qualitative factors beyond the number of reads and writes can include a highly important monthly management report or quarterly VP report in which the maintenance cost is of secondary concern.

Writes of all indexes are performed for inserts, but there are no associated reads (unless there are referential constraints). Besides select statements, reads are performed for updates and deletes, writes are performed if rows qualify. OLTP workloads have lots of small transactions, frequently combining select, insert, update and delete operations. Data Warehouse activity is typically separated into batch windows having a high concentation of write activity, followed by an on-line window of read activity.

SQL Statement

Read

Write

Select

Yes

No

Insert

No

Yes, all indexes

Update

Yes

Yes, if row qualifies

Delete

Yes

Yes, if row qualifies

In general, you want to keep indexes to a funtional minimum in a high transaction OLTP environment due to high transaction throughput combined with the cost of index maintenance and potential for blocking. In contrast, you pay for index maintenance once during the batch window when updates occur for a data warehouse. Thus, data warehouses tend to have more indexes to benefit its read-intensive on-line users.

In conclusion, an important new feature of SQL Server 2005 includes Dynamic Management Views (DMVs). DMVs provide a level of transparency that was not available in SQL Server 2000 and can be used for diagnostics, memory and process tuning, and monitoring. DMVs can be useful in answering practical questions such as index usage, cost benefit of indexes, and index hot spots. Finally, DMVs are queriable with SELECT statements but are not persisted to disk. Thus they reflect changing server state information since the last SQL Server recycle.