Analyzing a HANGMAN log file, part 2

The SAP and Windows specific sections of a HANGMAN log file have already been explained in a previous blog entry. Today we want to have a look on the SQL Server specific sections. For the explanation I will use the new terms session/connection/request introduced in SQL Server 2005. However, you may still see the old terms connection/SPID in a HANGMAN log file for SQL Server 2000.


Blocking SQL process list
This list contains all blocking and blocked SQL sessions. The following example is from an SAP JAVA system running on SQL Server 2000. The output of HANGMAN is different on SQL 2000 compared to SQL Server 2005 or newer (due to the different architecture and system tables). For all SQL Server versions you can get the blocked SQL statements and the current SQL statement running on the blocking session. Exclusive locks are held until the end of a transaction, which may consists of many SQL statements. Therefore the current running SQL statement in the blocking session may not necessarily be the SQL statement holding the blocking lock(s). This is a common misunderstanding at customer sites and even sometimes in SAP support. It may even be the case that there currently is no running SQL statement on the blocking session. The only thing a session needs to block another is an open transaction and an lock which conflicts with lock requests of other transactions.


The following list is an extract from a HANGMAN log file of a SQL Server 2000 system. All further lists are extracted from the attached HANGMAN log file of a SQL Server 2005 system.
spid ecid blocked wait[ms] input_buffer   stmt_text
—- —- ——- ——– ————– —————————-
 153    0       0        0 sp_execute;1   –>SELECT “VSE”.”OBJECTID”,”
 144    0     153  3313813 sp_prepexec;1  —->SELECT VSE.OBJECTID, VO
 123    0     144    73797 sp_prepexec;1  ——>SELECT VSE.OBJECTID,
 139    0     144  1873797 sp_prepexec;1  ——>SELECT VSE.OBJECTID,
 143    0     144  2773813 sp_prepexec;1  ——>SELECT VSE.OBJECTID,


The input buffer shows the last statement sent to SQL Server on this particular session. Having a look at the input buffer in the example above you can only see a “sp_execute”. This means that the SQL statement is a prepared statement. It does not tell you which SQL statement is running. SAP always uses prepared statements for JAVA. For ABAP prepared statements are used in certain cases as of NetWeaver release 6.20 and in (almost) all cases as of NetWeaver release 7.00 running against SQL Server 2005 and 2008. To figure out the SQL statement you have to look at the text column (“stmt_text” or “sql_text”). In some cases you will not see the SQL statement. There could simply be no running SQL statement on the blocking session (while there is always a running SQL statement on the blocked session). In rare cases you will not see the SQL statement text while the optimizer is creating an execution plan for this statement.


Blocking SQL process list of a SQL Server 2005 system:
session blocking wait
   id      id    sec  wait_type   wait_resource sql_text
——- ——– —- ———– ————- —————————–
   81       0    NULL NULL        NULL          > delete from BLK_TEST3
   86      81     425 LCK_M_U     KEY: 5:720575 –> delete from BLK_TEST3 whe
   87      81     423 LCK_M_X     KEY: 5:720575 –> insert into BLK_TEST1 val
   89      87     422 LCK_M_SCH_S OBJECT: 5:212 —-> (@P0 int,@P1 int)begin
   95      89     418 LCK_M_SCH_S OBJECT: 5:212 ——> (@P0 int,@P1 int)begi
   97      89     417 LCK_M_SCH_S OBJECT: 5:212 ——> (@P0 int,@P1 int)begi
  101      89     415 LCK_M_SCH_S OBJECT: 5:212 ——> (@P0 int,@P1 int)begi
  102      89     414 LCK_M_SCH_S OBJECT: 5:212 ——> (@P0 int,@P1 int)begi
   88      81     423 LCK_M_X     KEY: 5:720575 –> insert into BLK_TEST1 val
   90       0     414 WAITFOR                   > WAITFOR DELAY ’20:00′
   92      90     419 LCK_M_S     KEY: 5:720575 –> select * from BLK_TEST3 w
  104      92     412 LCK_M_S     KEY: 5:720575 —-> select * from BLK_TEST3
   93      90     419 LCK_M_S     KEY: 5:720575 –> select * from BLK_TEST3 w
  100      90     416 LCK_M_S     KEY: 5:720575 –> select * from BLK_TEST3 w


For SQL Server 2005 and more recent releases the blocking process list also contains the wait type, wait resource and the wait time in seconds. A blocking situation as such is not an issue. It may become a problem once it remains for many seconds or minutes and starts blocking more and more requests. In the example above you can see two independent blocking situations. To figure out who blocks whom you can either lock at the session ID and blocking ID or at the indentation (–>) of the SQL statements.  Let’s have a look on the first blocking situation caused by session ID 81. This session is not blocked by anyone else (blocking ID = 0), but blocks sessions 86, 87 and 88. Session 87 blocks session 89 which in turn blocks the sessions 95, 97, 101 and 102. HANGMAN recursively sorts the hanging SQL process list, which makes it easy to identify the independent blocking trees just by looking at the indentation of the SQL statements.


Some people confuse a blocking situation with a deadlock. A blocking situation happens if session B waits for a lock which is hold by session A. Such a blocking situation resolves without further interaction at the point in time session A releases the blocking lock. The blocking situation turns into a deadlock once session A waits for another lock which is hold by session B. SQL Server automatically detects such a deadlock and automatically resolves it by terminating one of the two sessions. In an SAP system you typically see an ABAP short dump as the result of a deadlock. SAP configures the update work processes to be always the victim of a deadlock. The SAP update work process can repeat a deadlocked database transaction without any business impact. Therefore you will not see a short dump or any other error in the SAP system log once a deadlock occurs on an update work process.


SQL process list
For SQL Server 2005 and more recent releases this section consist of 2 independent lists. The first list is actually a join of the SQL Server sessions (S), connections (C) and requests (R) retrieved from the SQL Server dynamic management views. This list is ordered by the SAP work process number (using the program name).  It contains the recent SQL statement which was executed on the connection, even when there is currently no active request. The second list contains all SQL requests ordered by the corresponding session ID. This list contains the current SQL statement which is being executed.


For an ABAP system, SAP sets the program name when connecting to SQL Server, for example “R3D00 comm rd ODBC”. The first 2 characters of the program name are “R3” followed by the work process type (“D” for dialog work process, “B” for batch, “U” for update, …). The next two characters show the work process number as you can see in the SAP work process list. The following characters indicate the SQL Server isolation level (“comm rd” for committed read, “unc rd” for uncommitted read) and the used database interface (“ODBC” or “OLEDB”). You should be aware that the type of an SAP work process may change without reconnecting to the database (when changing SAP’s operation mode). Therefore the type displayed in the application name is not always reliable. However, the work process number does not change.


A session is in status running if the work process which opened the session has sent an SQL statement to SQL Server and is waiting for the results. For running sessions a request exists which is using one of SQL Servers worker threads. A sleeping session simply means that there is still an open session, but SQL Server has nothing to do for this session. A sleeping (user) session has no corresponding request.
The number of threads which can concurrently run on Windows depends on the installed CPUs. This limits also the number of SQL Server requests which can be in status ‘running’. For a two processor, dual core, non-hyperthreading machine there can only be 4 requests in status running at one point in time. When looking into a HANGMAN log file one running request always comes from HAMGAN itself, respectively OSQL.EXE. It is consequential that the time the SQL process list is retrieved, it is the retrieving request which runs.


ID  Host  Program  OS    Status  delta delta delta             Status
(S) (S)   Name (S) Proc  (S)     CPU(R)Reads LogReads delta    (R)
— —– ——– —– ——- —– —– ——– ——– ——-
  1                      sleepin 0     0     0        1000 ms  backgroun
  2                      sleepin 0     0     0        1000 ms  backgroun
 86 WSI64          0     running 0     0     0        1000 ms  suspended
 87 WSI64          0     running 0     0     0        1000 ms  suspended
 88 WSI64          0     running 0     0     0        1000 ms  suspended
 89 WSI64          0     running 0     0     0        1000 ms  suspended
 95 WSI64          0     running 0     0     0        1000 ms  suspended
 96 WSI64          0     sleepin
 97 WSI64          0     running 0     0     0        1000 ms  suspended
101 WSI64          0     running 0     0     0        1000 ms  suspended
102 WSI64          0     running 0     0     0        1000 ms  suspended
 81 WSI64 Microsof 2920  sleeping
106 WSI64 OSQL-32  6808  running 15    0     291      1000 ms  running
 61 WSI64 R3D00 co 3080  sleepin
 62 WSI64 R3D02 co 3096  sleepin
 72 WSI64 R3D02 un 3096  running 0     -65   -1356    1000 ms  suspended
 66 WSI64 R3D03 un 3104  sleepin
 56 WSI64 R3D04 co 2540  running 0     1     3


Similar to the Windows process list, a delta of CPU usage, physical reads/writes and logical reads is calculated for each request. The time frame used for the delta values is displayed in a separate column. The currently running hangman/osql request in the example above was consuming 15ms CPU time within the last 1000ms. In this time frame 0 physical reads and 291 logical reads (data buffer accesses) were performed. In rare cases you might see negative values which can be explained with inaccurate values in the SQL Server dynamic management views.


In the section “Blocking SQL process list” above we have seen a blocking situation caused by session 81. The first thing for analyzing this is to have a look at the SQL process list and figure out what is going on in this session. The first surprise is that session 81 is sleeping. Therefore there is no corresponding request which could consume CPU time. Secondly session 81 was opened by the program “Microsoft SQL Server Management Studio”. This blocking situation is therefore not caused by any SAP report.


If a blocking session was opened by a SAP work process then the session could also be sleeping. The SAP work process might be busy with executing ABAP calculations while an open database transaction persists. For further analysis you should check what the SAP work process is doing, in particular whether the work process consumes CPU or not. To figure out which work process opened the session you can use the server name “Host(S)” and the SAP work process number as part of “Program Name(S)”. Alternatively you can also use the Windows process ID from the column “OS Proc(S)”. Unfortunately you can see neither the program name nor the Windows process ID for a JAVA system.


SAP uses multiple active result sets (MARS) of SQL Server 2005 and newer. Therefore you always have at least two SQL connections per SQL session. The first connection is internally used for MARS and will never be related to any SQL request. That’s the reason why HANGMAN does not show this first SQL connection per session in the SQL process list.


Database lock list
The list of all held and requested locks at any given point in time is one of the most powerful features of HANGMAN. For performance reasons (of HANGMAN) the list may be incomplete. However, this is typically not the case when there are less than a few hundred of thousand locks. The list is definitely complete if it ends with the number of locks in the list (“xxx rows affected”). In addition to the object ID you can see the object name (table name) for each database lock. Since the translation form object ID to object name can be very time-consuming, it is only performed for the database HANGMAN is connected to. The database locks of other databases are also contained in the list, but without the object name.


The locks are ordered by the session ID which is holding (status = “G” for GRANTED) or requesting (status = “WAIT”) a database lock. The lock granularity of SQL Server can be a row, a page or a table (type = “OBJECT”). Row locks are either row IDs (type = “RID”) or a hash value calculated on the clustered index key (type = “KEY”). Since almost all tables in SAP have a primary clustered index you will see lots of KEY locks and very seldom PAGE or OBJECT locks in a HANGMAN log file. SQL Server has a special lock mode, called intent locks (mode = “IX”, “IS”, “IU”, …). Intent locks are an indication for an existing lock with lower granularity. For example, a shared lock (“S”) on a row results in an intent shared lock (“IS”) on the page and on the table. The lock list of HANGMAN does not contain granted intent locks, but it contains all waits on intent locks.


In our example above we have seen a blocking situation caused by session 81. Let’s have a closer lock on the related locks: Sessions 87 is requesting an exclusive row lock (KEY 0500d1d065e9) on index ID 1 of table BLK_TEST1. The blocking session 81 is holding an exclusive lock on exactly this row. Hereby we have proven that this works as designed. There are no page or table locks involved. It is not the lock granularity of SQL Server but the application logic, which caused the blocking situation.


session                        db ind
   id   object id  object name id id  type     mode  status description
 —— ———- ———– — — ——– —-  —— ————–
 81     .          .           5  .   METADATA Sch-S G      principal_id =
 81     .          .           5  .   METADATA Sch-S G      schema_id = 6
 81     2060495065 BLK_TEST1   5  1   KEY      X     G      (020068e8b274)
 81     2060495065 BLK_TEST1   5  1   KEY      X     G      (010086470766)
 81     2060495065 BLK_TEST1   5  1   KEY      X     G      (0400b4b7d951)
 81     2060495065 BLK_TEST1   5  1   KEY      X     G      (03000d8f0ecc)
 81     2060495065 BLK_TEST1   5  1   KEY      X     G      (0500d1d065e9)
 81     41011873   BLK_TEST3   5  1   KEY      X     G      (010086470766)
 86     .          .           5  .   METADATA Sch-S G      principal_id =
 86     .          .           5  .   METADATA Sch-S G      schema_id = 6
 86     41011873   BLK_TEST3   5  1   KEY      U     WAIT   (010086470766)
 87     .          .           5  .   METADATA Sch-S G      schema_id = 6
 87     .          .           5  .   METADATA Sch-S G      class = 0, maj
 87     .          .           5  .   METADATA Sch-S G      principal_id =
 87     .          .           5  .   RID      X     G      1:334232:0
 87     2060495065 BLK_TEST1   5  1   KEY      X     WAIT   (0500d1d065e9)
 87     2124495293 BLK_TEST2   5  1   PAGE     X     G      1:334232
 87     2124495293 BLK_TEST2   5  1   PAGE     X     G      4:333927
 87     2124495293 BLK_TEST2   5  .   OBJECT   Sch-M G
 87     5          sysrowsets  5  1   KEY      X     G      (0000ca0144f3)
 87     7          sysallocun  5  1   KEY      X     G      (0000c96b563e)
 88     .          .           5  .   METADATA Sch-S G      schema_id = 6
 88     .          .           5  .   METADATA Sch-S G      principal_id =
 88     2060495065 BLK_TEST1   5  1   KEY      X     WAIT   (0400b4b7d951)
 89     .          .           5  .   METADATA Sch-S G      schema_id = 6
 89     2124495293 BLK_TEST2   5  .   OBJECT   Sch-S WAIT
 95     .          .           5  .   METADATA Sch-S G      schema_id = 6
 95     2124495293 BLK_TEST2   5  .   OBJECT   Sch-S WAIT
 97     .          .           5  .   METADATA Sch-S G      schema_id = 6
 97     2124495293 BLK_TEST2   5  .   OBJECT   Sch-S WAIT
 101    .          .           5  .   METADATA Sch-S G      schema_id = 6
 101    2124495293 BLK_TEST2   5  .   OBJECT   Sch-S WAIT
 102    .          .           5  .   METADATA Sch-S G      schema_id = 6
 102    2124495293 BLK_TEST2   5  .   OBJECT   Sch-S WAIT

(11154 rows affected)


Session 87 is holding a schema modification lock (mode = “Sch-M”) on table BLK_TEST2. This lock mode is used when changing the table structure, for example when creating or dropping indexes. It is also used when truncating a table. Session 89 simply wants to select data from table BLK_TEST2 and therefore requests a schema stability lock (mode = “Sch-S”) for the duration of the execution. This lock only can be granted once the schema modification lock held by session 87 on the same table is released. Sessions 95, 97, 101 and 102 also request a schema stability lock on the same table. Logically they also wait for session 87. However, due to the internal monitoring structures of SQL Server they wait on session 89, which in turn waits on session 87. Anyway, schema stability locks can be granted to multiple sessions at the same time.


SQL Server statistics
HANGMAN adds at the end of its log file additional SQL Server statistics regarding schedulers, memory usage, cache hit ratio and the SQL Server Errorlog. The most interesting part is probably the I/O response time per file. In the attached example the response time is horribly bad. Currently HANGMAN calculates the response time as an average since SQL Server startup. There is no delta calculation on the last seconds as it is the case for the CPU usage in the SQL process list. This feature will be added in a future version of HANGMAN.


db   file ms per ms per database path
id   id   read   write
—- —- —— —— ——– ——————————
  1   1   12.84  58.00  master   C:\Program Files\Microsoft SQL
  1   2   16.82  25.00  master   C:\Program Files\Microsoft SQL
  2   1   36.39  200.12 tempdb   E:\TEMPDB\tempdb.mdf
  2   2   17.11  23.90  tempdb   E:\TEMPDB\templog.ldf
  3   1   13.30  24.50  model    C:\Program Files\Microsoft SQL
  3   2   20.86  34.38  model    C:\Program Files\Microsoft SQL
  4   1   14.41  32.66  msdb     C:\Program Files\Microsoft SQL
  4   2   18.13  10.78  msdb     C:\Program Files\Microsoft SQL
  5   1   129.47 136.71 SR3      E:\SR3DATA1\SR3DATA1.mdf
  5   2   35.50  32.20  SR3      E:\SR3LOG1\SR3LOG1.ldf
  5   3   136.23 159.89 SR3      E:\SR3DATA2\SR3DATA2.ndf
  5   4   144.60 151.73 SR3      E:\SR3DATA3\SR3DATA3.ndf
  5   5   154.84 149.97 SR3      E:\SR3DATA4\SR3DATA4.ndf


If you want to have a look at the complete HANGMAN log file of our example then just open the attached file HANGMAN__2008-10-07__06-15-37__650.log.


 


 

HANGMAN__2008-10-07__06-15-37__650.log