Troubleshooting blocking issues in SQL Server 2000 and 2005

Most of my work is supporting and troubleshooting production environments and very often developers I work with say

“We are having some performance issues in our website and we believe the bottleneck is in SQL Server. We have collected some profilers and some queries take a long time to finish. But if we execute them in SQL Management Studio or Query Analyzer they have small execution durations. I think there is a bug in Sql Server”

Now, whenever I hear this automatically a tiny voice inside my head keeps saying blocking, blocking and blocking.

When I need to troubleshoot this kind of issues I always take the same approach witch is to use SQL Profiler with what I believe to be one of the most valuable stored procedure but very often forgotten. I´m talking about sp_blocker_pss80.

This stored procedure is available at https://support.microsoft.com/kb/271509 and also there are instructions there on how to put it to run. What I’m going to talk about is how to read the output file that it generates and in conjunction with SQL Profiler we can get some clues on what´s blocking our statements.

The first step is to open this file with your favorite text editor (Notepad.exe J ). The structure of this file resembles something like

8.2 Start time: 2008-06-28 09:45:53.263 0

SYSPROCESSES BALEIXO2008 150996343 (2)

 spid status blocked open_tran waitresource

(…)

SYSPROC FIRST PASS

 spid ecid waittype

 ------ ------ --------

(…)

SPIDs at the head of blocking chains (1)

 spid

(…)

SYSLOCKINFO (4)

 spid ecid dbid ObjId IndId Type Resource Mode Status TransID TransUOW

(…)

(3)

**************************************************

Print out DBCC Input buffer for all blocked or blocking spids.

**************************************************

(…)

The numbers above in yellow show the order I usually follow to find my problems. This doesn´t mean it´s the best way or the only one but it´s the one I’m used to and so far got me good results.

The first step is to do a search for SYSLOCKINFO. Above this, is the list of SPIDs at the head of blocking chains (1). If this list is empty I usually move forward to the next SYSLOCKINFO section (doesn´t mean I don´t come back).

In my example I found this

SPIDs at the head of blocking chains

 spid

 ------

     52

So I have a spid 52 that is blocking something. After this I look at my list of SYSPROCESSES (2). It has a lot of columns (not all are shown below) but don´t be scared. A couple of them will give us most of the time the information we need.

SYSPROCESSES BALEIXO2008 150996343

 spid status blocked open_tran waitresource (…)

 ------ ------------------------------ ------- --------- --------------------------

      2 background 0 0

      3 suspended 0 0

      4 background 0 0

      5 background 0 0

      7 background 0 0

      9 background 0 0

     10 suspended 0 0

     11 background 0 0

     12 background 0 0

     13 background 0 0

     52 runnable 0 1

     54 suspended 52 0 KEY: 5:72057594038321152 (010086470766)

So it seems that spid 54 is blocked by spid 52. This means that whatever spid 54 is supposed to do isn´t doing because spid 52 does not allow it. The first columns I usually look are BLOCKED and SPID just to get the picture of what´s not running. The column open_tran is also valuable because will let me know if i´m inside a transaction. Finally the column waitresource will tell me the resource that spid 54 want´s to access but can´t. I will explain later on how to find the name of this resource.

Still in the SYSPROCESSES the next columns I look at for spids I find suspects (52,54) are WAITTIME, CMD and LASTWAITTYPE.

waittype waittime cmd lastwaittype

-------- -------------------- ---------------- -------------------------------

0x0060 327 LAZY WRITER LAZYWRITER_SLEEP

0x007F 101821 LOG WRITER LOGMGR_QUEUE

0x0080 3229 LOCK MONITOR REQUEST_FOR_DEADLOCK_SEARCH

0x009D 11119002 SIGNAL HANDLER KSOURCE_WAKEUP

0x00A2 2230 TRACE QUEUE TASK SQLTRACE_BUFFER_FLUSH

0x00A9 11134508 BRKR TASK BROKER_TRANSMITTER

0x0081 60231 CHECKPOINT CHECKPOINT_QUEUE

0x007E 11135850 TASK MANAGER ONDEMAND_TASK_QUEUE

0x00AD 11118971 BRKR EVENT HNDLR BROKER_EVENTHANDLER

0x00A9 11134524 BRKR TASK BROKER_TRANSMITTER

0x0000 0 UPDATE SOS_SCHEDULER_YIELD (SPID=52)

0x0003 15412 SELECT LCK_M_S                        (SPID=54)

The yellow lines above represent my spid 52 and 54. No wonder my spid 54 is a long runner. It´s waiting for more that 15 seconds. The CMD columns tells me that spid 54 wants to do a SELECT (using the object in WAITRESOURCE) but spid 52 is already performing an UPDATE (using the object in WAITRESOURCE) and does not allow spid 54 proceed. Finally the lastwaittype provides me more details on the last wait type.

You can see details on each type at https://support.microsoft.com/kb/822101 (Description of the waittype and lastwaittype columns in the master.dbo.sysprocesses table in SQL Server 2000 and SQL Server 2005)

Now that I have a picture of what is going on, the next step is to identify my spids. To do this I look at DBCC Input Buffer (3)

**************************************************

Print out DBCC Input buffer for all blocked or blocking spids.

**************************************************

(…)

DBCC INPUTBUFFER FOR SPID 52

 EventType Parameters EventInfo

 -------------- ---------- ----------------------------------------

 Language Event 0 begin transaction

while 1=1

begin

update Employees set FirstName='MyName'

end

commit

DBCC execution completed. If DBCC printed error messages, contact your system administrator.

 

DBCC INPUTBUFFER FOR SPID 54

 EventType Parameters EventInfo

 -------------- ---------- -----------------------

Language Event 0 select * from Employees

 

In my example I see that spid 54 wants to fetch some rows from table Employees and spid 52 has a transaction and inside this transaction is performing some updates also on this table. This is the place where you will see your stored procedures or something else. Now just a small remark:

Not always we can trust the information in DBCC Input Buffer, and that is one of the reasons I always get a profiler so that if I have any doubt that the statements that show up don´t make any sense I can verify. To do this I look at start time and then look at SQL Profiler to a match to know what was running at the time.

8.2 Start time: 2008-06-28 09:45:53.263 0

Finally in (4) we can see exactly SYSLOCKINFO. This view has more details on what actually is going on and it´s a great source of information to see details. Probably, very often with the information you collected and looking at these statements you will be able to find why they are taking so long.

SYSLOCKINFO

 spid ecid dbid ObjId IndId Type Resource Mode Status TransID

 ------ ------ ------ ----------- ------ ---- ---------------- ------------ ------ --------------

     52 0 5 0 0 DB S GRANT

     54 0 5 0 0 DB S GRANT

     52 0 5 2073058421 1 PAG 1:198 IX GRANT

     54 0 5 2073058421 1 PAG 1:198 IS GRANT

     52 0 5 2073058421 3 PAG 1:202 IU GRANT

     52 0 5 2073058421 1 KEY (06003f7fd0fb) X GRANT

     52 0 5 2073058421 1 KEY (0400b4b7d951) X GRANT

     52 0 5 2073058421 1 KEY (08000c080f1b) X GRANT

     52 0 5 2073058421 1 KEY (0900696fb3a3) X GRANT

     52 0 5 2073058421 1 KEY (0500d1d065e9) X GRANT

     52 0 5 2073058421 0 TAB IX GRANT

     54 0 5 2073058421 0 TAB IS GRANT

     52 0 5 2073058421 1 KEY (07005a186c43) X GRANT

     52 0 5 2073058421 1 KEY (020068e8b274) X GRANT

     52 0 5 2073058421 1 KEY (03000d8f0ecc) X GRANT

     52 0 5 2073058421 3 KEY (9f007d710bca) U GRANT

     52 0 5 2073058421 1 KEY (010086470766) X GRANT

     54 0 5 2073058421 1 KEY (010086470766) S WAIT                

Above we can verify the reason why spid 54 is waiting. First column shows spid and what kind of locks it has acquired.

The column I always look is STATUS. Whenever it says WAIT it rings a bell. Just to verify what we were already thinking spid 54 is waiting to acquire a shared lock (S). Spid 52 has acquired several locks but one is very interesting. It says that spid 52 has a X lock on the same resource that spid 54 is trying to acquire. No wonder spid 54 is suspended.

 

I´ve been talking about accessing objects but so far we only see this strange numbers (2073058421 ) or KEY: 5:72057594038321152 (010086470766). This is the moment I said earlier about discovering this objects. To find them we have to do some queries in SQL Server because our output file does not provide this information.

First of all don´t forget TRACE flag 3604, otherwise the statements using DBCC will execute but you won´t see details.

DBCC TRACEON (3604)

In our case the object has an id 2073058421 (columns objID) . The next steps are performed in SQL Server Studio. To discover the name of this object

select object_name(2073058421)

Employees

In our example the object involved is called Employees. My demo was very simple but in a big stored procedure the name of the objects are very important to help identifying the location of the problem. There are a couple of DBCC commands that can help you. The one I use the most is DBCC PAGE. More information on this at https://support.microsoft.com/kb/q83065/

Just to show you the output of this I will use one of the rows in SYSLOCKINFO

SYSLOCKINFO

 spid ecid dbid ObjId IndId Type Resource Mod

 ------ ------ ------ ----------- ------ ---- ---------------- ---

     52 0 5 0 0 DB S

     54 0 5 0 0 DB S

     52 0 5 2073058421 1 PAG 1:198 IX

     54 0 5 2073058421 1 PAG 1:198 IS

     52 0 5 2073058421 3 PAG 1:202 IU

DBCC Page takes as mandatory arguments dbid and page number.

DBCC TRACEON (3604)

DBCC PAGE(5,1,198) -- (1:198->1,198)

PAGE: (1:198)

BUFFER:

BUF @0x0000000081FFE300

bpage = 0x0000000081F8C000 bhash = 0x0000000000000000 bpageno = (1:198)

bdbid = 5 breferences = 0 bUse1 = 17224

bstat = 0xc0000b blog = 0x3212159 bnext = 0x0000000000000000

PAGE HEADER:

Page @0x0000000081F8C000

m_pageId = (1:198) m_headerVersion = 1 m_type = 1

m_typeFlagBits = 0x4 m_level = 0 m_flagBits = 0x8000

m_objId (AllocUnitId.idObj) = 67 m_indexId (AllocUnitId.idInd) = 256

Metadata: AllocUnitId = 72057594042318848

Metadata: PartitionId = 72057594038321152 Metadata: IndexId = 1

Metadata: ObjectId = 2073058421      m_prevPage = (0:0) m_nextPage = (0:0)

pminlen = 28 m_slotCnt = 9 m_freeCnt = 4941

m_freeData = 4998 m_reservedCnt = 0 m_lsn = (35:414:10)

m_xactReserved = 0 m_xdesId = (0:0) m_ghostRecCnt = 0

m_tornBits = 1107450946

Allocation Status

GAM (1:2) = ALLOCATED SGAM (1:3) = NOT ALLOCATED

PFS (1:1) = 0x60 MIXED_EXT ALLOCATED 0_PCT_FULL DIFF (1:6) = CHANGED

ML (1:7) = NOT MIN_LOGGED

execution completed. If DBCC printed error messages, contact your system administrator.

In yellow you can see our object id.

Finally we just have to repeat this process until we identify all the possible bottlenecks in our statements and then optimize.

Just one quick note: usually in my SQL Profiler I collect CPU, Reads, Writes, Text Data, Execution Plan, Transaction ID. This will allow me (after all the hard work identifying misbehaved queries) to begin the improvement process.

Bruno