Why should I create an index?

There are many cases where the database administrator does not control the queries being submitted of the system. As a result, the physical database design is often not tuned as well as it could be. In a number of actual customer cases where we investigated performance issues with them, we’ve found that this can often be the an effective way to improve performance.

Some reasons indexes are helpful in a deployed system:

  1. To pre-materialize a sort over the data
  2. To enable singleton lookups (seeks) for particular values
  3. To reduce locking requirements caused by scans
  4. To reduce memory requirements caused by hash joins

The first two are more generally understood benefits. You can avoid a sort in your plan with (1) and you can avoid scans with (2). As long as the update cost to maintain these indexes is acceptable, then you can improve select query performance with an index.

The third issue, locking, is a problem in many applications about which there isn’t really a lot written. Some applications are “bound” on how many locks they take. One example could be a billing application with a number of small queries that access individual (or a small number) of rows but have many, many such queries concurrently. If the query plans chosen are all seeks, then shared or exclusive locks are likely only taken on the rows being specifically accessed/changed in the query results. However, if one plan changes to be a scan instead of a seek, the number of locks will increase dramatically. These locks may also be held for longer (depending on the isolation level), since the query may be running longer as well. An index can improve the performance of such an application because it improves overall query throughput for many concurrent users.

Memory contention is another throughput-related issue that can cause problems when an application runs with many concurrent connections. In SQL Server, queries reserve memory for the duration of their execution. This prevents some queries from failing on a memory allocation after they have completed only a portion of the query. If the system does not have any more memory to give, not-yet-started queries will be blocked and wait for previous queries to complete (and release their memory reservations). In some applications, the total system throughput is limited by the available memory necessary to enable operations like hash joins.

Memory contention is most likely a problem for scaled applications running on 32-bit hardware that use a lot of memory at peak load. While Microsoft Windows does have a mechanism called AWE that enables a process to use more than the normal maximum memory (usually 2 GB), that mechanism is only really useful to store pages from the database in memory. Internal memory consumers, such as the Query Optimizer and Query Execution, are limited to the memory available in the virtual address space (usually 2 GB minus whatever is used for the database page buffer pool and other internal caches/memory consumers). This can constrain systems that have few indexes but many queries requiring joins without backing indexes because a hash join is often a good choice for the Optimizer. Additionally, the optimizer does not pick different plan shapes based on system load in SQL Server 2005. So, creating an index can cause the system to pick loops joins or merge joins in cases where hash joins would be picked otherwise. Creating an index or two on such a system can reduce memory contention and improve overall system throughput.

SQL Server has long shipped a program to help with physical database design called the Index Tuning Wizard (and now the Database Tuning Advisor in SQL Server 2005). This can help find a reasonably optimal index set for a set of queries. It works by running the Optimizer with a series of “what if” scenarios and evaluating the cost of the plan the optimizer picked in each case. It then reasons about the best set of indexes to match the workload.

There is also a newer mechanism that you can use in SQL Server 2005 that compliments the existing Database Tuning Advisor Tool. It does not require that you pre-identify a workload, and it is integrated into the engine and runs as part of the regular operation of the server (you do not need to run anything). It does not do as much work as the DTA, but it can find the common problems that cause significant performance problems in deployed systems. The development team used this to debug a number of customer applications and found that it did identify “better” indexes in a number of cases. If you have to debug a live system where the physical database design is not known to be close to optimal, this may be a useful tool to help improve the system performance.

The specific details of the DMVs are documented here:

https://msdn2.microsoft.com/en-US/library/ms345434(SQL.90).aspx

https://msdn2.microsoft.com/en-US/library/ms345407(SQL.90).aspx

https://msdn2.microsoft.com/en-us/library/ms345421(SQL.90).aspx

The following examples show an example about how they can be used to identify and improve the physical database design in a simple example:

use tempdb

drop table t

create table t(col1 int, col2 int, col3 binary(500))

declare @i int

set @i = 0

while @i < 10000

begin

insert into t(col1, col2) values (@i, rand()*1000)

set @i = @i + 1

end

set showplan_text on

select * from t as t1 inner join t as t2 on t1.col1=t2.col1 where t2.col2 = 500

StmtText

------------------------------------------------------------------------------------------------------------------------------------------------------------

|--Hash Match(Inner Join, HASH:([t2].[col1])=([t1].[col1]), RESIDUAL:([t].[col1] as [t2].[col1]= [t].[col1] as [t1].[col1]))

|--Table Scan(OBJECT:( [t] AS [t2]), WHERE:( [t].[col2] as [t2].[col2]=(500)))

|--Table Scan(OBJECT:( [t] AS [t1]))

set showplan_text off

-- run the query

select * from t as t1 inner join t as t2 on t1.col1=t2.col1 where t2.col2 = 500

-- look at our management views to see what recommendations exist for this query

select * from sys.dm_db_missing_index_details

select * from sys.dm_db_missing_index_groups

select * from sys.dm_db_missing_index_group_stats

(columns removed for space)

index_handle database_id object_id equality_columns included_columns

------------ ----------- ----------- ----------------------------------------

11 2 741577680 [col2] NULL

9 2 741577680 [col1] [col2], [col3]

(2 row(s) affected)

index_group_handle index_handle

------------------ ------------

10 9

12 11

(2 row(s) affected)

group_handle unique_compiles user_seeks user_scans last_user_seek last_user_scan avg_total_user_cost avg_user_impact system_seeks system_scans last_system_seek last_system_scan avg_total_system_cost avg_system_impact

------------ -------------------- -------------------- -------------------- ------------------------------------------------------ ------------------------------------------------------ ----------------------------------------------------- ----------------------------------------------------- -------------------- -------------------- ------------------------------------------------------ ------------------------------------------------------ ----------------------------------------------------- -----------------------------------------------------

10 1 1 0 2006-04-06 11:14:55.130 NULL 1.2693552627256595 50.409999999999997 0 0 NULL NULL 0.0 0.0

12 1 1 0 2006-04-06 11:14:55.130 NULL 1.2693552627256595 46.479999999999997 0 0 NULL NULL 0.0 0.0

(2 row(s) affected)

create index i1 on t(col2)

-- let's see if the plan changed

set showplan_text on

select * from t as t1 inner join t as t2 on t1.col1=t2.col1 where t2.col2 = 500

-- now we have an index seek, followed by a bookmark lookup, then followed by a hash join

StmtText

------------------------------------------------------------------------------------------------------------------------------------------------------------

|--Hash Match(Inner Join, HASH:([t2].[col1])=([t1].[col1]), RESIDUAL:([t].[col1] as [t2].[col1]= [t].[col1] as [t1].[col1]))

|--Nested Loops(Inner Join, OUTER REFERENCES:([Bmk1003]))

| |--Index Seek(OBJECT:([t].[i1] AS [t2]), SEEK:([t2].[col2]=(500)) ORDERED FORWARD)

| |--RID Lookup(OBJECT:([t] AS [t2]), SEEK:([Bmk1003]=[Bmk1003]) LOOKUP ORDERED FORWARD)

|--Table Scan(OBJECT:([tempdb].[dbo].[t] AS [t1]))

set showplan_text off

-- run the query again (it should be a bit faster now)

select * from t as t1 inner join t as t2 on t1.col1=t2.col1 where t2.col2 = 500

-- let's look at our management views again

select * from sys.dm_db_missing_index_details

select * from sys.dm_db_missing_index_groups

select * from sys.dm_db_missing_index_group_stats

(columns removed for space)

index_handle database_id object_id equality_columns included_columns

------------ ----------- ----------- -----------------------------------

13 2 741577680 [col1] [col2], [col3]

(1 row(s) affected)

index_group_handle index_handle

------------------ ------------

14 13

(1 row(s) affected)

group_handle unique_compiles user_seeks user_scans last_user_seek last_user_scan avg_total_user_cost avg_user_impact system_seeks system_scans last_system_seek last_system_scan avg_total_system_cost avg_system_impact

------------ -------------------- -------------------- -------------------- ------------------------------------------------------ ------------------------------------------------------ ----------------------------------------------------- ----------------------------------------------------- -------------------- -------------------- ------------------------------------------------------ ------------------------------------------------------ ----------------------------------------------------- -----------------------------------------------------

14 1 1 0 2006-04-06 11:17:01.617 NULL 0.70575013268039988 90.670000000000002 0 0 NULL NULL 0.0 0.0

create clustered index i2 on t(col1)

-- let's see if the plan changed

set showplan_text on

select * from t as t1 inner join t as t2 on t1.col1=t2.col1 where t2.col2 = 500

StmtText

--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

|--Nested Loops(Inner Join, OUTER REFERENCES:([t2].[col1]))

|--Nested Loops(Inner Join, OUTER REFERENCES:([Uniq1005], [t2].[col1]))

| |--Index Seek(OBJECT:([t].[i1] AS [t2]), SEEK:([t2].[col2]=(500)) ORDERED FORWARD)

| |--Clustered Index Seek(OBJECT:([t].[i2] AS [t2]), SEEK:([t2].[col1]= [t].[col1] as [t2].[col1] AND [Uniq1005]=[Uniq1005]) LOOKUP ORDERED FORWARD)

|--Clustered Index Seek(OBJECT:([t].[i2] AS [t1]), SEEK:([t1].[col1]=[t].[col1] as [t2].[col1]) ORDERED FORWARD)

set showplan_text off

-- now run the query again (it should be even faster)

select * from t as t1 inner join t as t2 on t1.col1=t2.col1 where t2.col2 = 500

I hope that this gives you enough information to experiment with this on your own applications.

Thanks,

Conor