Deadlock Analysis (Basic)

Today I want to start a small blog series for the deadlock analysis with SQL Server. In the first part we will cover the basics about deadlocks, why and when they occur and how to capture them.
The second part which will be more exciting will cover the basic analysis of deadlocks and the different approaches to solve and to avoid them.
The last part takes the excitement to the extreme and will be an introduction and discussion of some in-depth SQL Scripts, which will help us to analyze and understand the deadlocks in more detail.

Unless otherwise stated the provided information and scripts run on SQL Server 2005 or higher.

What are Deadlocks?

A deadlock occurs, when multiple threads are waiting on each other’s resources. All threads are waiting, for the other thread(s) to release their resources, which they will never do (as they wait for the other resources as well). A deadlock has to contain at least two nodes, but can contain much more than that; the biggest deadlock I saw had 13 nodes.

The most common misconception about deadlocks is that they are a problem of the database system - the database should solve them on its own and should not throw any error to the application. But deadlocks are happening in all database systems that are using locks to guarantee the logical integrity of the data. Deadlock detection and cancellation is a database feature to prevent stale locks, which otherwise would never get resolved by their own.

You can easily create a deadlock with two nodes on SQL Server by the following procedure:

  • Open two connections to the SQL Server (Connection ConA and ConB)
  • On connection ConA prepare a database and a table with some data with the script:

   USE master

GO

IF EXISTS (SELECT * FROM sys.databases WHERE name = N'DL_Analysis')

   DROP DATABASE [DL_Analysis]

GO

CREATE DATABASE [DL_Analysis]

GO

ALTER DATABASE [DL_Analysis] SET RECOVERY SIMPLE

USE [DL_Analysis]

GO

SET ANSI_NULLS ON

SET ANSI_PADDING ON

SET QUOTED_IDENTIFIER ON

GO

CREATE TABLE [SAPTABLE](

   [NAME] [nvarchar](40) COLLATE SQL_Latin1_General_CP850_BIN2 NOT NULL DEFAULT N' ',

   [SPRSL] [nvarchar](1) COLLATE SQL_Latin1_General_CP850_BIN2 NOT NULL DEFAULT N' ',

   [TEXT] [nvarchar](70) COLLATE SQL_Latin1_General_CP850_BIN2 NOT NULL DEFAULT N' ',

   CONSTRAINT [SAPTABLE~0] PRIMARY KEY CLUSTERED

   (

          [NAME] ASC,

          [SPRSL] ASC

   )ON [PRIMARY]

)ON [PRIMARY]

GO

CREATE NONCLUSTERED INDEX [SAPTABLE~NAM] ON [SAPTABLE]([NAME]) ON [PRIMARY]

GO

-- Insert two rows of test data

INSERT INTO SAPTABLE([NAME],[SPRSL],[TEXT]) VALUES (N'Miller',N'E',N'English name')

INSERT INTO SAPTABLE([NAME],[SPRSL],[TEXT]) VALUES (N'Schulze',N'D',N'Deutscher Name.')

Go

 

on connection ConA run:

 

 

use [DL_Analysis]

BEGIN TRANSACTION

-- Update the first row, starting with Miller

UPDATE SAPTABLE SET SPRSL = N'D' WHERE NAME = N'Miller' AND SPRSL = N'E'

-- Now run the script for the deadlock partner

-- We will wait in this connection for 1 minute....

WAITFOR DELAY'00:01:00'

-- .. and then will execute the next Update here (now Schulze)

UPDATE SAPTABLE SET SPRSL = N'E' WHERE NAME = N'Schulze' AND SPRSL = N'D'

-- Deadlock occurred.
ROLLBACK -- to be able to repeat it

This transaction will update column SPRSL, where NAME equals to ‘Miller’, than it will wait for one minute and then it will try to update the row, where NAME = ‘Schulze’

  • While the transaction on connection ConA is waiting, run this code on connection ConB:

use [DL_Analysis]

BEGIN TRANSACTION

-- Run both updates at once, starting with Schulze

UPDATE SAPTABLE SET SPRSL = N'F' WHERE NAME = N'Schulze' AND SPRSL = N'D'

UPDATE SAPTABLE SET SPRSL = N'F' WHERE NAME = N'Miller' AND SPRSL = N'E'

-- Deadlock occurred.

ROLLBACK -- to be able to repeat it

It will update the row where NAME = ‘Schulze’ and then the row where NAME = ‘Miller’. It can update the row ‘Schulze’, but has to wait on the lock on row ‘Miller’, as the request on connection ConA has already changed it and has an exclusive lock on it.

  • Once the waiting on connection ConA is over, the request will try to update the row ‘Schulze’, but has to wait, as the request on connection ConB has already changed it and has an exclusive lock on it.
  • Now we have a deadlock, the request ConA has changed ‘Miller’ and has a lock (X lock) on it, but has to wait on ‘Schulze’, which is blocked by ConB. Connection ConB has an X lock on ‘Schulze’ and has to wait for to lock on ‘Miller’, which is held by ConA.
  • The SQL Server is able to detect such deadlocks and will resolve it by killing the transaction with the least amount of generated log, the transaction on connection ConA in our case. The transaction is ended and rolled back with an error 1205:

Msg 1205, Level 13, State 51, Line 3 Transaction (Process ID 60) was deadlocked on lock resources with another process and has been chosen as the deadlock victim. Rerun the transaction.

  • The transaction on connection ConB can continue.

How to capture deadlocks?

 

When a deadlock occurs in a SAP system, we do not get much information in the ST22 dumps or within the SQL Server error log. The database monitor of the SAP system (Transaction DBACOCKPIT) shows the occurred deadlocks, the table on which the deadlock occurs and the work process information. This is only done for the ABAP part of the system (no Java deadlocks are shown here) and it shows only the victims; there is no information about the winning processes.

clip_image002

SQL Profiler trace

Therefore SAP provides a note (32129 (login required)) to setup a more detailed deadlock tracing for SQL Server. This note sets several SQL Server trace flags and configures a SQL Server Profiler Trace.

The trace flags that get set are:

Trace flag

Description

Purpose

1204

Deadlock victim/chain output

This trace flag produces detailed information about deadlocks, including participants and victims.

1222

XML Deadlock Information to the error log

This trace flag pipes XML Deadlock information to the error log

3605

Send trace output to the error log

This trace flag is necessary to get the above debug information into the error log. If it is not set, the other trace flags will not be able to write to the SQL Server Error log.

For the background profiler trace you have to execute the script that is attached to the note. It will create a profiler trace to a local file in your SQL Server log directory (where the Error Log lives), which will only contain the deadlock event of the SQL Server events. The trace will be registered as a startup trace and will be active instantly.

Once all of this (SQL Profiler Trace and Trace flags) is in place, one has to wait, that the deadlock occurs again. When this happens, a deadlock graph is written to the trace file and as XML written to the error log. These two files we need for a first deadlock analysis.

When you open the deadlock trace file (e.g. deadlock01_20130201-120016.trc) in the SQL Server Profiler, you can examine the deadlock in a graphical format:

clip_image004

You can identify the two nodes and some of the locks they held.
Each thread is represented with an oval; the one which is crossed is the one that was chooses as deadlock victim. The other one is the deadlock winner. The ovals are showing more information about the thread:

Line

Content

Server process id

The Session ID (SPID) of the request

Server batch id

An internal reference number for the batch, where the statement is running in. In an SAP system it is zero.

Execution context id

When the query is executed in parallel, this shows the ID of one of the parallel running threads. In an SAP system it is mostly zero.

Deadlock priority

If the thread has an associated deadlock priority, it is shown here. SAP does not use deadlock priority.

Log Used

The amount of transaction log, that was already generated by this thread

Owner ID

An internal reference number for the transaction.

Transaction descriptor

An internal reference number for the state of the transaction.

 

In the middle of the graph, in the rectangles, information about the two participating locks is shown:

 

Line

Content

Locktype

Type of lock the thread holds or requests. Common values are “Key Lock”, “Page Lock” or ”RID Lock”

HoBt ID

Heap or B-Tree ID, the ID of the internal allocation structure of the table or partition

Associated ObjId

The ObjectID that is associated to the HoBt ID. The name of the object can be retrieved via the OBJECT_NAME function.

Index name

The name of the index, on which the locks are held or requested.

The lines between the threads and locks are representing the held and requested locks. In our example the right thread (Winner) holds an X lock (Owner Mode: X) on the upper lock and requests a U lock (Request Mode: U) to the lower lock. The victim thread holds the X lock on the lower lock and requests a U-lock for the upper lock. Following the lines you can circle around between the ovals and will never come to an end – a deadlock occurred.

We now have information about the threads and the locks, but what is about the statement that causes all this? Where can I find it?

It is revealed when you hover over the left and right ovals:

clip_image006

In our example it just shows the script we ran before.

You can extract all information about the deadlock from the SQL Server deadlock trace in XML format. Therefore right click on the line with the deadlock and choose “Extract event data”:

clip_image008

You then can save this event as an XDL file (SQL Server Deadlock File). When you double click the file it gets opened it the SQL Server Management Studio and shown in a graphical format (as in the SQL Profiler). But when you open it with a text editor (e.g. notepad) it is opened as an XML file.

I’d like to describe the content of this XML file in more detail, as it contains quite useful information about the deadlock and the participating processes. The XML has a structure like:

XML Tag

Description

Deadlock-list

The surrounding bracket around all deadlocks (most of the time only one) deadlocks

     Deadlock

First deadlock, with Victim Process ID

         Process-list

The list of involved Processes (most of the time only two)

              Process

Process number one, the victim. This section contains many attributes about the Process itself, they will be discussed later

              ExecutionStack

The statement that causes the deadlock, which SQL Handle and (sometimes) as a prepared statement.

                   Inputbuffer

The complete input buffer of the process, in an SAP system most of the time only one statement, in our example the complete script of the connection. Has the statement from the ExecutionStack included.

              Process

Process number two, the survivor or winner of the deadlock

              ExecutionStack

The same information as for the victim.

                   Inputbuffer

The same information as for the victim.

Resource-List

All the resources that are involved in the deadlock. Mostly only a list of two KEY locks, but might be other as well (e.g. PAGE locks)

     Keylock/PageLock

Detailed information about the first lock itself, they will be discussed later.

         Owner-List

The list of owner of this resource, mostly one process.

              Owner

The owner with process id and lock mode (X, U, S etc.)

         Waiter-List

The list of waiter of this resource; can be more than one.

              Waiter

The waiter with process id and lock mode (X, U, S etc.) and request type (WAIT).

     Keylock/PageLock

Detailed information about the second lock.

         Owner-List

The same information as for the first lock.

              Owner

The same information as for the first lock.

         Waiter-List

The same information as for the first lock.

              Waiter

The same information as for the first lock.

Here is an example (fragment), how this looks like in our case:

XML Tag

Example

Deadlock-list

 

     Deadlock

victim="process3f9f4c8"

         Process-list

 

              Process

id="process3f9f4c8" taskpriority="0" logused="684" waitresource="KEY: 11:72057594038779904 (7e576f73234b)" waittime="2185" ownerId="286942698" transactionname="user_transaction" lasttranstarted="2013-02-01T12:00:44.620" XDES="0x86dab950" lockMode="U" schedulerid="1" kpid="3640" status="suspended" spid="59" sbid="0" ecid="0" priority="0" trancount="2" lastbatchstarted="2013-02-01T12:00:44.537" lastbatchcompleted="2013-02-01T11:51:13.350" clientapp="Microsoft SQL Server Management Studio - Query" hostname="MyHostname" hostpid="4868" Loginname="MyDomain\User" isolationlevel="read committed (2)" xactid="286942698" currentdb="11" lockTimeout="4294967295" clientoption1="671090784" clientoption2="390200"

         ExecutionStack

<frame procname="adhoc" line="13" stmtstart="110" sqlhandle="0x0200000060960b1fa1d4f597f9ad060adbabb997bfa96045"> UPDATE [SAPTABLE] set [SPRSL] = @1 WHERE [NAME]=@2 AND [SPRSL]=@3 </frame> <frame procname="adhoc" line="13" stmtstart="724" sqlhandle="0x0200000047ca5e0e3acdcd8f7c7dc91d718edf54ac1624ac"> UPDATE SAPTABLE SET SPRSL = N&apos;E&apos; WHERE NAME = N&apos;Schulze&apos; AND SPRSL = N&apos;D&apos; -- Deadlock occured </frame>

                   Inputbuffer

use [DL_Analysis] BEGIN TRANSACTION -- Update the first row, starting with Miller UPDATE SAPTABLE SET SPRSL = N&apos;D&apos; WHERE NAME = N&apos;Miller&apos; AND SPRSL = N&apos;E&apos; -- Now run the script for the deadlock partner -- we will wait in this connection for 10 seconds .... WAITFOR DELAY &apos;00:00:10&apos; -- .. and then will execute the next Update here (now Schulze) UPDATE SAPTABLE SET SPRSL = N&apos;E&apos; WHERE NAME = N&apos;Schulze&apos; AND SPRSL = N&apos;D&apos; -- Deadlock occured

Resource-List

 

     Keylock/PageLock

hobtid="72057594038779904" dbid="11" objectname="DL_Analysis.dbo.SAPTABLE" indexname="SAPTABLE~0" id="lockd047c00" mode="X" associatedObjectId="72057594038779904"

         Owner-List

 

              Owner

id="process7041708" mode="X"

         Waiter-List

 

              Waiter

id="process3f9f4c8" mode="U" requestType="wait"

     Keylock/PageLock

hobtid="72057594038779904" dbid="11" objectname="DL_Analysis.dbo.SAPTABLE" indexname="SAPTABLE~0" id="lock28b09180" mode="X" associatedObjectId="72057594038779904"

         Owner-List

 

     Owner

id="process3f9f4c8" mode="X"

         Waiter-List

 

              Waiter

id="process7041708" mode="U" requestType="wait"

The process attributes are (from the example) :

Attribute

Description

id="process3f9f4c8"

The ID of the process

taskpriority="0"

The priority of the task, most of the time 0.

logused="684"

The amount of transaction log that was generated through this transaction. The process with the least amount will be the victim.

waitresource="KEY: 11:72057594038779904 (7e576f73234b)"

The resource, the process is waiting on. It is defines as Type (RID, KEY, PAGE etc.):DatabaseID:AllocationUnitID(HashValue of the Key)

waittime="2185"

How long has the process waited on this resource?

ownerId="286942698"

Internal number of the transaction.

transactionname="user_transaction"

The name or type of transaction

lasttranstarted="2013-02-01T12:00:44.620"

When did the last transaction started?

XDES="0x86dab950"

 

lockMode="U"

The lock mode the process is waiting for.

schedulerid="1"

The CPU this process is running on.

kpid="3640"

The Window Kernel Thread ID

status="suspended"

The status of the process (suspended, as it is waiting for the lock).

spid="59"

The SQL Server Session ID

sbid="0"

The System BatchID or Request ID

ecid="0"

The Execution Context, when executed in parallel, otherwise Zero.

priority="0"

The deadlock priority of the process.

trancount="2"

How many nested transaction has the process open?

lastbatchstarted="2013-02-01T12:00:44.537"

When did the last batch started?

lastbatchcompleted="2013-02-01T11:51:13.350"

When did the last batch completed?

clientapp="Microsoft SQL Server Management Studio - Query"

The name of the client application.

hostname="MyHostname"

The hostname where the client application is running on.

hostpid="4868"

The Window Process ID of the application

loginname="MyDomain\MyUser"

The user that is connected to the SQL Server.

isolationlevel="read committed (2)"

The transaction isolation level of the process.

xactid="286942698"

Transaction ID

currentdb="11"

Database ID

lockTimeout="4294967295"

The maximum lock timeout.

clientoption1="671090784"

The client options that are set (e.g. XACT_ABORT, ANSI_NULLS etc.)

clientoption2="390200"

The client options that are set (cont).

The resource attributes are (from the example):

Resource Atrribute

Description

hobtid="72057594038779904"

The Heap-or-B-Tree ID, the internal ID for the allocation unit of the table

dbid="11"

The database ID

objectname="DL_Analysis.dbo.SAPTABLE"

The object

indexname="SAPTABLE~0"

The index the resource belongs to

id="lockd047c00"

The internal representation of the lock.

mode="X"

The requested or owned lock mode

associatedObjectId="72057594038779904"

The object id of the table, most of the time identical with the HoBtID. Only when the table is partitioned the two are different (one ObjectID has then multiple HoBtID).

In the next parts of the deadlock blog series we will make use of some of these attributes.

SQL Server Error Log File

When the trace flags are set correctly, you will get the exact same information as in the XML file in the SQL Server error log file (see this fragment):

2013-02-01 13:24:23.430 spid9s Deadlock encountered .... Printing deadlock information

2013-02-01 13:24:23.430 spid9s Wait-for graph

2013-02-01 13:24:23.430 spid9s NULL

2013-02-01 13:24:23.430 spid9s Node:1

2013-02-01 13:24:23.430 spid9s KEY: 11:72057594038779904 (7e576f73234b) CleanCnt:2 Mode:X

2013-02-01 13:24:23.430 spid9s Grant List 2:

2013-02-01 13:24:23.430 spid9s Owner:0x000000001E5D4780 Mode: X Flg:0x40 Ref:0
Life:02000000 SPID:58 ECID:0 XactLockInfo
0x0000000099191400

2013-02-01 13:24:23.430 spid9s SPID: 58 ECID: 0 Statement Type: UPDATE Line #: 6

2013-02-01 13:24:23.430 spid9s Input Buf: Language Event: use [DL_Analysis]

                                   BEGIN TRANSACTION

                                   -- Run both Updates at once, starting with Schulze

                                   UPDATE SAPTABLE SET SPRSL = N'F' WHERE NAME = N'Schulze'

                                   UPDATE SAPTABLE SET SPRSL = N'F' WHERE NAME = N'Miller' A

2013-02-01 13:24:23.430 spid9s Requested by:

2013-02-01 13:24:23.430 spid9s ResType:LockOwner Stype:'OR'Xdes:0x00000000071299C0 Mode: U
SPID:59 BatchID:0 ECID:0 TaskProxy:(0x0000000034AB8540)
Value:0x9845dfc0 Cost:(0/684)

2013-02-01 13:24:23.430 spid9s NULL

2013-02-01 13:24:23.430 spid9s Node:2

2013-02-01 13:24:23.430 spid9s KEY: 11:72057594038779904 (86c3d23ec34f) CleanCnt:2 Mode:X

2013-02-01 13:24:23.430 spid9s Grant List 0:

2013-02-01 13:24:23.430 spid9s Owner:0x000000008913FB80 Mode: X Flg:0x40 Ref:0
Life:02000000 SPID:59 ECID:0 XactLockInfo: 0x0000000007129A0

2013-02-01 13:24:23.430 spid9s SPID: 59 ECID: 0 Statement Type: UPDATE Line #: 13

2013-02-01 13:24:23.430 spid9s Input Buf: Language Event: use [DL_Analysis]
..

 

System health X-Event Session

If you are on SQL Server 2008 or higher you can select the XML directly from the internal SystemHealth X-Event session:

select

  XEventData.XEvent.value('(.)[1]/@timestamp','datetime')as[DateAndTime],

  cast(XEventData.XEvent.value('(data/value)[1]','nvarchar(max)')asXML)asDeadlockGraph

FROM

       (selectCAST(target_dataasxml)asTargetData

       fromsys.dm_xe_session_targetsstjoin

       sys.dm_xe_sessionssons.address=st.event_session_address

       wherename='system_health')ASData

       CROSSAPPLYTargetData.nodes('//RingBufferTarget/event')ASXEventData(XEvent)

whereXEventData.XEvent.value('@name','nvarchar(4000)')='xml_deadlock_report'
 

clip_image009


DBACOCKPIT (newer releases)

In newer releases of the SAP Database Monitor you can directly access the XML from within the transaction DBACOCKPIT:

 

clip_image010

clip_image011

Summary

 

Today we discussed what deadlocks are, when they occur and how to get more information about them. Using a basic and fast approach we have collected as much information as possible. In the next part of the deadlock series, I will show how to use this information in basic analysis methods and describe some ways to solve or to avoid the deadlocks.