Queries with UPDLOCK hints on small tables causing blocking

While working with one our ISV customer’s in Microsoft performance testing labs recently, we discovered an unexpected blocking issue with the SELECT queries using the UPDLOCK hints on a small table and in explicitly defined user transactions. I said ‘unexpected blocking issue’ because from our perspective, we were processing two different rows in a table, in two different transactions - something like transaction T1 picks up a row, say Row-1 and starts processing it, while another transaction, say T2 picks up another row say Row-2 for processing, but T1 is still blocking T2. We analyzed the output from the sys.dm_tran_locks DMV and found that transaction T2 was waiting on the same KEY resource as held by transaction T1. We then looked at the query plans of the concerned queries and noticed that SQL Server wasn’t selecting an expected non clustered index on the filter column of the SELECT query, and was instead doing a clustered index scan, resulting in blocking. If you are wondering why SQL Server wouldn’t select the expected index (notice that I said ‘expected’ and not ‘right’ :)), even when we have one on the filter column of the SELECT query, you will find the answer later in this blog.

Below are more details on this scenario and a few workable solutions –

 
Scenario:

You have a table in your database that has the following attributes –

  1. Few records in the table (anywhere from 1 record to a few thousand records)
  2. You are using UPDLOCK hints with SELECT queries on this table in explicit user transactions (BEGIN TRAN… COMMIT TRAN)
  3. The SELECT queries have a WHERE clause on a column in the table and this columns either does not have any index defined on it or has a non-unique, non-clustered index on it
  4. You see blocking chains where multiple transactions on this table with SELECT and UPDLOCK are fighting for the same KEY resource (the same key hash value)

Steps to reproduce this scenario:

 -- Create a Test database
Create Database Test
Go

-- Create a table that has the attributes as discussed above
Use Test
Go

Create Table Mytable (Col1 int not null, Col2 varchar(100) null, Col3 bigint null)
Go

-- Create Indexes 
Create Unique Clustered Index IdxPrimary on Mytable (Col1)
Go

Create Index #IdxNC on Mytable(Col3)
Go

-- Insert a few records into this table
Insert into Mytable values (1, 'AAA', 50)
Insert into Mytable values (2, 'BBB', 60)
Insert into Mytable values (3, 'CCC', 70)
Insert into Mytable values (4, 'DDD', 80)
Insert into Mytable values (5, 'EEE', 90)
Go

-- Start a transaction in a separate query window, say this comes as SPID X in SQL Server
Begin Tran Select * from dbo.Mytable with (UPDLOCK) where Col3 = 50


-- Start another transaction in a second query window, say this comes as SPID Y in SQL Server
Begin Tran Select * from dbo.Mytable with (UPDLOCK) where Col3 = 60

What you will notice is that SPID X is blocking SPID Y. If you look at the query plan for SPID X in SQL Server Profiler, you will notice that SQL Server has chosen to do a clustered index scan, although there is a non-clustered index on the filter column Progress –

Clustered Index Scan(OBJECT:([TEST].[dbo].[Mytable].[IdxPrimary]), WHERE:([Test].[dbo].[Mytable_repro].[Col3]=(50))

This clustered index scan is causing the second transaction (SPID Y) to get blocked by the first transaction (SPID X). You may be wondering that even though SQL Server has chosen the clustered index scan, The locked resource is still a KEY resource (and not the entire table), so why are we seeing blocking. Here’s what’s happening -

SPID X start scanning the table and it checks for every row in the table to see if it matches the criteria specified in the WHERE clause. Remember that it’s doing that because the SQL Server has chosen not to use the non-clustered index on our filter column, so it doesn’t know which rows would qualify for our filter criteria (Col3 = 50). Once SPID X finds the row (or all the rows) matching the filter criteria, it locks that row (or rows) with an Update (U) Lock. So now, when SPID Y starts scanning the clustered index, it has to wait on the key that is locked by SPID X because, it has to verify whether or not the locked row matches the criteria specified in the WHERE clause. The result is that SPID X continues to hold the update lock while the transaction is in progress, while SPID Y waits on SPID X to release the Update (U) lock on that key, so that it can continue with the scan.

So the question is why SQL Server is doing a Clustered Index Scan instead of using the non-clustered index that exists on filter column (Col3 in our scenario)? SQL Server has a cost-based query optimization engine. The goal of SQL Server Query Optimizer is this – out of many (and not all!) possible query plans, discover a plan that has the least amount of cost associated with it, with cost being a function of CPU, Memory and IO. When Query Optimizer evaluated the query plans for our SELECT query, it found that the plan with clustered index scan has a lower cost when compared to the plan with a seek on the non-clustered index followed by the lookup on the clustered index.

Workarounds:

Here are a couple of workarounds to fix this issue by ensuring that the Query Optimizer will select the expected non-clustered index on the filter column. You can choose any one of these depending on your application requirements.

-
Make the Non Clustered Index a Covering Index and include all columns referenced in the query in the index as INLCUDED columns:

-
Include the filter column in the Clustered index key

-
Populate the table with dummy records so that QO can pass the threshold, where it can use the Non Clustered Index on the filter column

-
Use the FORCESEEK query hint in your blocking query to force the index seek.

-
Use Plan Guides with the Index hint specifying the name of the NC index or with the FORCESEEK hint

-
Make the Non Clustered Index a Unique index

In our case, we worked with the ISV vendor and their partner to enforce the FORCESEEK query hint with all of their SELECT, UPDATE and DELETE queries in their application to fix this issue.