Deadlock Troubleshooting, Part 1

A deadlock is a circular blocking chain, where two or more threads are each blocked by the other so that no one can proceed. When the deadlock monitor thread in SQL Server detects a circular blocking chain, it selects one of the participants as a victim, cancels that spid’s current batch, and rolls backs his transaction in order to let the other spids continue with their work. The deadlock victim will get a 1205 error:

 

Transaction (Process ID 52) was deadlocked on lock resources with another process and has been chosen as the deadlock victim. Rerun the transaction.

 

A deadlock is a special type of blocking scenario, but blocking and deadlocking are not the same thing. Sometimes we have people report that they are experiencing "deadlocking" when they are really only seeing blocking.

 

With very few exceptions, deadlocks are a natural side effect of blocking, not a SQL Server bug. The typical deadlock solution is either a stored proc/app code tweak, or a schema/indexing change.

 

Here’s how to troubleshoot deadlocks. These steps apply to most deadlocks, and they’ll allow you to resolve many of them without even having to dig into query plans or other nitty gritty details. What’s that? You like digging into query plans, and have nitty grits for breakfast every morning? OK then, we’ll look at a deadlock scenario from the inside out a bit later. But first, here are the basics:

 

  1. Turn on trace flag 1222 with “DBCC TRACEON (1222, -1)” or by adding “-T1222” as a SQL startup parameter. This trace flag is a new trace flag in SQL 2005, a much improved version of the tried-and-true -T1204. If you’re running SQL 2005, you should be using 1222 instead of 1204 unless you have deep-seated masochistic tendencies. Alternatives to 1222:
    • If you are using SQL 2000 or SQL 7.0, you’ll have no choice but to fall back on the older -T1204. 
    • There’s a “Deadlock graph” Profiler trace event that provides the same info as -T1222. Feel free to use this instead of -T1222 if you’re on SQL 2005. But don’t waste your time with the “Lock:Deadlock” and “Lock:Deadlock Chain” trace events that are in SQL 2000, as they provide an unacceptably incomplete picture of the deadlock.
  2. Get the -T1222 output from the SQL errorlog after the deadlock has occurred. You’ll see output that looks like this:

deadlock-list

 deadlock victim=processdceda8

  process-list

   process id=processdceda8 taskpriority=0 logused=0 waitresource=KEY: 2:72057594051493888 (0400a4427a09) waittime=5000 ownerId=24008914 transactionname=SELECT lasttranstarted=2006-09-08T15:54:22.327 XDES=0x8fd9a848 lockMode=S schedulerid=1 kpid=4404 status=suspended spid=54 sbid=0 ecid=0 priority=0 transcount=0 lastbatchstarted=2006-09-08T15:54:22.293 lastbatchcompleted=2006-09-08T15:54:22.293 clientapp=OSQL-32 hostname=BARTD2 hostpid=3408 loginname=bartd isolationlevel=read committed (2) xactid=24008914 currentdb=2 lockTimeout=4294967295 clientoption1=538968096 clientoption2=128056

    executionStack

     frame procname=tempdb.dbo.p1 line=2 stmtstart=60 sqlhandle=0x03000200268be70bd

   SELECT c2, c3 FROM t1 WHERE c2 = @p1

     frame procname=adhoc line=2 stmtstart=32 stmtend=52 sqlhandle=0x020000008a4df52d3

       EXEC p1 3

    inputbuf

  EXEC p1 3

   process id=process3c54c58 taskpriority=0 logused=16952 waitresource=KEY: 2:72057594051559424 (0900fefcd2fe) waittime=5000 ownerId=24008903 transactionname=UPDATE lasttranstarted=2006-09-08T15:54:22.327 XDES=0x802ecdd0 lockMode=X schedulerid=2 kpid=4420 status=suspended spid=55 sbid=0 ecid=0 priority=0 transcount=2 lastbatchstarted=2006-09-08T15:54:22.327 lastbatchcompleted=2006-09-08T15:54:22.310 clientapp=OSQL-32 hostname=BARTD2 hostpid=2728 loginname=bartd isolationlevel=read committed (2) xactid=24008903 currentdb=2 lockTimeout=4294967295 clientoption1=538968096 clientoption2=128056

    executionStack

     frame procname=tempdb.dbo.p2 line=2 stmtstart=58 sqlhandle=0x030002005fafdb0c

       UPDATE t1 SET c1 = FLOOR (c1), c2 = FLOOR (c2) WHERE c1 = @p1

     frame procname=adhoc line=2 stmtstart=32 stmtend=52 sqlhandle=0x020000006f878816

       EXEC p2 3

    inputbuf

  EXEC p2 3

  resource-list

   keylock hobtid=72057594051559424 dbid=2 objectname=tempdb.dbo.t1 indexname=idx1 id=lock83642a00 mode=S associatedObjectId=72057594051559424

    owner-list

     owner id=processdceda8 mode=S

    waiter-list

     waiter id=process3c54c58 mode=X requestType=wait

   keylock hobtid=72057594051493888 dbid=2 objectname=tempdb.dbo.t1 indexname=cidx id=lock83643780 mode=X associatedObjectId=72057594051493888

    owner-list

     owner id=process3c54c58 mode=X

    waiter-list

     waiter id=processdceda8 mode=S requestType=wait

 

  1. “Decode” the -T1222 output to better understand the deadlock scenario. The deadlock is summarized by a “process-list” and a “resource-list”. A “process” is a spid or worker thread that participates in the deadlock. Each process is assigned an identifier, like “processdceda8”. A resource is a resource that one of the participants owns (usually a lock) that the other participant is waiting on. I like to use a format like the one below to summarize the deadlock. You can skip this step if you want, but I never do; I find it really helps me understand the deadlock situation more clearly. I’ve highlighted in yellow each of the data points within the 1222 output that you would need to reconstruct this summary on your own.

    Spid 54 is running this query (line 2 of proc [p1]):
    SELECT c2, c3 FROM t1 WHERE c2 = @p1
    Spid 55 is running this query (line 2 of proc [p2]):
    UPDATE t1 SET c1 = FLOOR (c1), c2 = FLOOR (c2) WHERE c1 = @p1

    Spid 54 is waiting for a Shared KEY lock on index t1.cidx.
    (Spid 55 holds a conflicting X lock.)
    Spid 55 is waiting for an eXclusive KEY lock on index t1.idx1.
    (Spid 54 holds a conflicting S lock.)

    For most lock types (including KEY locks, as shown in this example), SQL will directly identify the index by name in the output. For some lock types, though, you'll get an "associatedObjectId", but no object name. An example:

    pagelock fileid=1 pageid=95516 dbid=9 objectname="" id=lock177a9e280 mode=IX associatedObjectId=72057596554838016

    The attribute "associatedObjectId" isn't the type of Object ID that you're probably familiar with; it's actually a partition ID. You can determine the database name by running "SELECT DB_NAME(9)", where the "9" in this example comes from the "dbid" attribute, highlighted in blue. Then you can determine the index and table name by looking up the associatedObjectId/PartitionId in the indicated database:

    SELECT OBJECT_NAME(i.object_id), i.name
    FROM sys.partitions AS p
    INNER JOIN sys.indexes AS i ON i.object_id = p.object_id AND i.index_id = p.index_id
    WHERE p.partition_id = 72057596554838016

    For those of you on SQL 2005 who think that the -T1222 output is a bit overwhelming, you're right. But you may also want to count your blessings and be thankful that you don’t have to wade through -T1204 output, which is a lot more difficult to interpret than -T1222 and doesn’t provide nearly as much useful information about the deadlock. Check out the file "Decoding_T1204_Output.htm" attached to this post for annotated -T1204 output.

  2. Run the queries involved in the deadlock through Database Tuning Advisor. Plop the query in a Management Studio query window, change db context to the correct database, right-click the query text and select “Analyze Query in DTA”. Don’t skip this step; more than half of the deadlock issues we see are resolved simply by adding an appropriate index so that one of the queries runs more quickly and with a smaller lock footprint. If DTA recommends indexes (it'll say “Estimated Improvement: <some non-zero>%”), create them and monitor to see if the deadlock persists. You can select “Apply Recommendations” from the Action drop-down menu to create the index immediately, or save the CREATE INDEX commands as a script to create them during a maintenance window. Be sure to tune each of the queries separately.

  3. Make sure the query is using the minimum necessary transaction isolation level (-T1222 will tell you this – search the output for “isolationlevel”). Queries run by transactional COM+ components will default to serializable, which is usually overkill. This can be reduced by query hints (“...FROM tbl1 WITH (READCOMMITTED)...”), a SET TRANSACTION ISOLATION LEVEL command, or, in Windows 2003 and later, by configuring the object in the Component Services MMC plugin.

  4. Make sure that your transactions are as brief as they can be while still meeting the relevant business constraints. Try not to use implicit transactions, as this model of transaction management encourages unnecessarily long transactions.

  5. Look for other opportunities to improve the efficiency of the queries involved in the deadlock, either through query changes or through indexing improvements. A query that locks the minimum number of resources will be much less likely to deadlock with another query. Table scans, index scans, and large hashes or large sorts in the query plan may indicate opportunities for improvement.

  6. If one or both spids is running a multi-statement transaction, you may need to capture a profiler trace that spans the deadlock in order to identify the full set of queries that were involved in the deadlock. Unfortunately, both -T1204 and -T1222 only print out the two queries that “closed the loop”, and it’s possible that one of the blocking locks was acquired by an earlier query run within the same transaction.

These are all general recommendations that you can apply to any deadlock without having to really roll up your sleeves and get dirty. If after doing all of this you haven’t resolved it, though, you’ll have to dive a bit deeper and tailor a solution to the specifics of the scenario. Here’s a menu of some common techniques that you can choose from when deciding how best to tackle a deadlock:

 

  • Access objects in the same order. Consider the following two batches:

1. Begin Transaction

1. Begin Transaction

2. Update Part table

2. Update Supplier table

3. Update Supplier table

3. Update Part table

4. Commit Transaction

4. Commit Transaction

These two batches may deadlock frequently. If both are about to execute step 3, they may each end up blocked by the other because they both need access to a resource that the other connection locked in step 2.

  • If both deadlock participants are using the same index, consider adding an index that can provide an alternate access path to one of the spids. For example, adding a covering nonclustered index for a SELECT involved in a deadlock may prevent the problem (assuming that none of the covering index keys are modified by the other deadlock participant).
  • On the other hand, if the spids are deadlocking because they took alternate paths (indexes) to a common required data row or page, consider whether one of the indexes can be removed or an index hint used to force both queries to share an access path. Be cautious of potential performance hits as a result of this approach.
  • Deadlocks are a special type of blocking where two spids both end up blocking the other. Sometimes the best way to prevent a deadlock is to force the blocking to occur at an earlier point in one of the two transactions. For example, if you force spid A to be blocked by spid B at the very beginning of A’s transaction, it may not have a chance to acquire the lock resource that later ends up blocking spid B. Doesn’t this means you are deliberately causing blocking? Yes, but remember that you already have blocking or you wouldn’t be in a deadlock situation, and simple blocking is a big improvement over a deadlock. As soon as B commits his transaction, A will be able to proceed. HOLDLOCK and UPDLOCK hints can be useful for this.
  • If a high priority process is being selected as a victim in a deadlock with a lower priority process, the lower priority process could be modified to SET DEADLOCK_PRIORITY LOW. Spids that set this will offer themselves up as the sacrificial lamb in any deadlock they encounter.
  • Avoid placing clustered indexes on columns that are frequently updated. Updates to clustered index key columns will require locks on the clustered index (to move the row) and all nonclustered indexes (since the leaf level of NC indexes reference rows by clustered index key value).
  • In some cases it may be appropriate to add a NOLOCK hint, assuming that one of the queries is a SELECT statement. While this is a tempting path because it is a quick and easy solution for many deadlocks, approach it with caution as it carries with it all the usual caveats surrounding read uncommitted isolation level (a query could return a transactionally inconsistent view of the data). If you are unfamiliar with the risks, read the "SET TRANSACTION ISOLATION LEVEL" topic in SQL Books Online.
  • In SQL 2005 you could consider the new SNAPSHOT isolation level. This will avoid most blocking while avoiding the risks of NOLOCK. An even cooler new feature IMHO is the new READ COMMITTED SNAPSHOT database option (see ALTER DATABASE), which allows you to use a variant of snapshot isolation level without changing your app.
  • If one or both locks involved in the deadlock are S/X TAB (table) locks, lock escalation may be involved. You can reduce the likelihood of lock escalation by enabling trace flag 1224 (SQL 2005 and later) or 1211 (see KB 323630). Note that this does not apply to "intent" TAB locks, which have a capital "I" prefix (e.g. IS / IX TAB locks).
  • If the deadlock is intermittent, sometimes the simplest solution is to add deadlock retry logic. The retry logic could be in T-SQL, as long as (a) you're on SQL 2005 or later so that you can use BEGIN TRY, and (b) your transaction is wholly-contained within a single stored proc or batch. See this article for details. If the deadlock transaction spans multiple batches you can still add deadlock retry logic, but it would need to be moved out to the client app code. If you can only add deadlock retry logic to one of the participants in the deadlock, you can use SET DEADLOCK_PRIORITY LOW to ensure that the engine prefentially aborts the transaction of the guy that has the retry logic.

In a follow-up post I’ll look at a fairly typical deadlock in detail. This will provide an example of what you'd have to do if the 8 high-level steps listed above fail you, forcing you to understand the scenario at a deeper level so that you can craft a custom solution.

  

(This post series is continued in Deadlock Troubleshooting, Part 2.)

  

SQL2000_Deadlocks_T1204.htm