Troubleshooting deadlocks in SQL2005

Deadlock in the context of SQL Server means two or more transactions or tasks are waiting on each other to acquire the resources needed to complete. A resource can be a logical lock or can be memory grant or can be a worker thread and so on.

 

Normally, when a transaction T1 requests a lock to access a resource, it may get blocked because some other transaction T2 may have already locked the resource in a conflicting mode. In that case, T1 waits for the lock to be released before proceeding. Under normal operations, one T2 releases the lock; it is then acquired by T1. Two blocking scenarios are of interest here. First, what if T2 never releases the lock? In this case, unless a lock time out is specified, T1 will wait indefinitely. SQL Server has no way of knowing that it is indeed the case and it does not do anything. An external intervention is required to break it, typically by killing the first transaction. Second, what if T2 requests a lock on another resource that is already locked by T1 in conflicting mode? Since T1 is waiting for T2, and now T2 is waiting for T1, it results in a deadlock and in this case no amount of waiting will help. SQL Server has implemented the logic to detect such cycles in resource acquisition since the very beginning. A background task, called the deadlock monitor, checks for cycles periodically, typically every 5 seconds but it can vary depending how often the deadlock is detected. When a deadlock is discovered, it breaks the deadlock by choosing a victim that has consumed to the lowest resources so far. The victim’s session will get 1205 error and the transaction will be aborted. SQL Server provides you some control, by means of deadlock priority, which task will be chosen as deadlock victim.

 

When a session gets 1205 error, it does not provide any information on why the deadlock happened, what sessions and resources were involved in the deadlock cycle. Without this information, it is difficult to understand the causes of deadlock and to prevent or minimize it in the future. To get the detailed diagnostic information on the deadlock, it is recommended that you run SQL Server with TF-1222 (new in SQL2005) and provides you more detailed information than the related TF-1204 which also can be used for the similar purpose. Note there is no performance overhead of running SQL Server with these traces flags enabled as these trace flags don’t control deadlock monitor task. However, if your SQL Server encounters deadlocks frequently, you may see some impact as the detailed diagnostic information needs to be generated for every deadlock.

 

Let us consider an example of a simple deadlock between two transactions as shown in the table below. You will notice that T1 holds X lock on all rows with c1=5 on table t_lock1 while T2 holds X lock on all rows with C1=1 on table t_lock2. Now each of these transactions wants to update the rows previously locked by the other . This results in a deadlock.

 

Transaction T1

Transaction T2

begin tran

update t_lock1 set c2 = 10

where c1 = 5

 

begin tran

update t_lock2 set c2 = 10

where c1 = 1

update t_lock2 set c2 = 10

where c1 = 1

 

update t_lock1 set c2 = 11

where c1 = 5

commit

commit

 

 

Here is the output of TF-1222 (with prefix information removed for clarity) that is generated for this deadlock event. This provides lots of useful details (hi-lighted) like the sessions involved in the deadlock, the locks currently held, locks they are blocked on, and the currently executing statements. It shows that there are two sessions (listed under <process-list>) that are blocked on each other and there two resources (listed under <resource-list>) indicating lock owners and waiters. Note, there is no detail when and where the lock was acquired. For that you will need to analyze all the statements in each of the tasks. With this information, you may be able to identify the cause of the deadlock and then make appropriate changes in the application to reduce its occurrence. For the deadlock in the example here, you can possibly eliminate this deadlock in two ways. First, you can execute each statement in its own transaction (i.e. by keeping the transaction short). Second, you can change the order of update statements in one of the transaction such that they access the tables in the same order (i.e. access objects in the same order). However, this may not always be possible for a given application logic. For more details please refer to product documentation.

 

<deadlock-list>

 <deadlock victim="process699978">

  <process-list>

   <process id="process698d48" taskpriority="0" logused="216" waitresource="RID: 6:1:143:4" waittime="5027" ownerId="4031" transactionname="user_transaction" lasttranstarted="2005-12-28T17:44:51.830" XDES="0x3dcba00" lockMode="U" schedulerid="1" kpid="892" status="suspended" spid="53" sbid="0" ecid="0" priority="0" transcount="2" lastbatchstarted="2005-12-28T17:44:56.147" lastbatchcompleted="2005-12-28T17:44:51.830" clientapp="Microsoft SQL Server Management Studio - Query" hostname="SUNILA2" hostpid="904" loginname="REDMONDsunila" isolationlevel="read committed (2)" xactid="4031" currentdb="6" lockTimeout="4294967295" clientoption1="671090784" clientoption2="390200">

    <executionStack>

     <frame procname="adhoc" line="1" stmtstart="38" sqlhandle="0x02000000f998880690371817254885b67f17fbc1b16c6672">

UPDATE [t_lock1] set [c2] = @1 WHERE [c1]=@2 </frame>

     <frame procname="adhoc" line="1" sqlhandle="0x02000000247b290d01bea785a1b6b540fbe5cd7f0c4ff756">

update t_lock1 set c2 = 11 where c1 = 5 </frame>

    </executionStack>

    <inputbuf>

update t_lock1 set c2 = 11 where c1 = 5 </inputbuf>

   </process>

   <process id="process699978" taskpriority="0" logused="216" waitresource="RID: 6:1:181:0" waittime="90" ownerId="4021" transactionname="user_transaction" lasttranstarted="2005-12-28T17:44:45.720" XDES="0x3dcade0" lockMode="U" schedulerid="1" kpid="3748" status="suspended" spid="52" sbid="0" ecid="0" priority="0" transcount="2" lastbatchstarted="2005-12-28T17:45:01.080" lastbatchcompleted="2005-12-28T17:44:45.720" clientapp="Microsoft SQL Server Management Studio - Query" hostname="SUNILA2" hostpid="3796" loginname="REDMONDsunila" isolationlevel="read committed (2)" xactid="4021" currentdb="6" lockTimeout="4294967295" clientoption1="671090784" clientoption2="390200">

    <executionStack>

     <frame procname="adhoc" line="1" stmtstart="38" sqlhandle="0x02000000f276f80843b1ada37f08a824a0226584ce0e87fa">

UPDATE [t_lock2] set [c2] = @1 WHERE [c1]=@2 </frame>

     <frame procname="adhoc" line="1" sqlhandle="0x02000000a7311f2c650673880275b86837c003b590d4df0c">

update t_lock2 set c2 = 10 where c1 = 1 </frame>

    </executionStack>

    <inputbuf>

update t_lock2 set c2 = 10 where c1 = 1 </inputbuf>

   </process>

  </process-list>

  <resource-list>

   <ridlock fileid="1" pageid="143" dbid="6" objectname="general.dbo.t_lock1" id="lock35c6e40" mode="X" associatedObjectId="72057594041139200">

    <owner-list>

     <owner id="process699978" mode="X"/>

    </owner-list>

    <waiter-list>

     <waiter id="process698d48" mode="U" requestType="wait"/>

    </waiter-list>

   </ridlock>

   <ridlock fileid="1" pageid="181" dbid="6" objectname="general.dbo.t_lock2" id="lock35c7040" mode="X" associatedObjectId="72057594041270272">

    <owner-list>

     <owner id="process698d48" mode="X"/>

    </owner-list>

    <waiter-list>

     <waiter id="process699978" mode="U" requestType="wait"/>

    </waiter-list>

   </ridlock>

  </resource-list>

 </deadlock>

</deadlock-list>

 

You can also get a graphical representation of this deadlock by enabling deadlock_graph event. The best way to troubleshoot deadlocks is to avoid one by designing your application appropriately. We touched upon couple of those in the previous example. Here are some general guidelines:

· Use short transactions where possible. Longer running transaction that the locks are held longer thereby increasing the likelihood of deadlock.

· Avoid user interactions within a transaction. User interactions can potentially take a long unbounded amount of time. This can hold up the crucial resources in your application and deny other transactions from proceeding.

· Try at access resources in the same order. Like we described in the example above, if all transactions access resources in the same order, you will minimize deadlocks.

· Run your transactions at lower level of isolation, if possible. This minimizes the duration of the locks held. Note, this is only applicable to SH locks only.

Use row versioning based isolation levels (e.g. SI and RCSI), where possible. Queries executing under these isolation levels don’t acquire any locks on the data. Note, queries will still acquire schema stability lock under these isolation levels, just like it is done for queries running under read-uncommitted isolation level and can get blocked if there is a concurrent DDL operation on the objects referenced in the query