SAP DBACockpit and some related SQL Scripts (Part 3)

Well, let’s go to the last part of our series about SAP DBACockpit and associated SQL Server scripts.

Actual Running Requests

Another area which usually is investigated is to detect long running queries against the database. What can be from interests are questions like whether a requests is blocked in any way or experiences wait situations on some kind of resources and more important how the statement looks like. Especially in the case of SAP applications running against SQL Server, the interesting question as well is to trace these requests back to a specific SAP application server and the specific process the request is issued from. A query to get this information from would be:

–Query1

select r.session_id, s.host_name, s.program_name, s.host_process_id,

r.status, r.wait_time,wait_type,r.wait_resource,

substring(qt.text,(r.statement_start_offset/2) +1,

(case when r.statement_end_offset = -1

then len(convert(nvarchar(max), qt.text)) * 2

else r.statement_end_offset end -r.statement_start_offset)/2)

as stmt_executing,r.blocking_session_id, r.cpu_time,r.total_elapsed_time,r.reads,r.writes,

r.logical_reads, r.plan_handle

from sys.dm_exec_requests r

cross apply sys.dm_exec_sql_text(sql_handle) as qt, sys.dm_exec_sessions s

where r.session_id > 50 and r.session_id=s.session_id

order by r.session_id, s.host_name, s.program_name, r.status

 

This query will answer most of the questions above and will show the statement as well as wait or blocking states and the SAP process issuing the query. Additionally you’ll find the plan handle at the end of the row. This allows getting to the execution plan as described in the first part of the series. What are we expecting to see as results? In the column status, we will see quite a few requests which are in the status ‘suspended’ and some others (like Query1 above) as ‘running’. The state ‘suspended’ simply means that the request currently is not active because it is waiting on a resource. The resource can be an I/O for reading a page, it can be communication on the network, or it simply that other requests are using the CPUs and the request is in the list of the scheduler to become ‘running’ again. So no real worries to see status suspended here. The resource the request is waiting on is described in the column wait_type. Expectations are that one sees some resources listed like PAGEIOLATCH_SH indicating disk I/O activity or ASYNCH_NETWORK_IO indicating network communication. It is normal to see those resources. To differentiate the good from the bad in disk I/O activity, one would need to look at the column wait_time. The lower this time, the better since this time mainly reflects the performance of the I/O subsystem. The higher the number the slower the I/O subsystem seems to be. Well or the more data is read per I/O. Therefore one should not take the wait_time as a direct time in milliseconds an I/O needs to be executed. To get the real average time an I/O takes one should use Windows performance Monitor. But the number gives a good indication and direction in which direction to investigate. For ASYNC_NETWORK_IO, we usually see higher numbers happening quite often. These higher numbers indicate as well that larger amounts of data are being transferred from the server to the client. This is normal even in a SAP ERP environment and wait_times of a few seconds here and there can happen, however should happen with the majority of requests. Blocking Lock situations also will show up with a wait_type of LCK….. In such a case the wait_time gives the length the request got blocked. More information about the different wait types can be found in a paper my colleague Tom Davidson wrote

 

Monitoring requests waiting on locks

We hardly expect to see Shared Locks on SQL Server while used by SAP Applications since SAP applications are reading mostly in uncommitted isolation level. Hence most of the locks we get to see are locks which indicate data modifications in form of update, inserts or deletes. Also the lock granularity mostly experienced is row level locks either on the clustered index (mostly on the data level) or row locks on non-clustered index rows. Running Query2 as below will show such locks in case they exist:

–Query2:

SELECT w.session_id, w.blocking_session_id, w.wait_type, w.wait_duration_ms, w.resource_description, substring(qt.text,(r.statement_start_offset/2) +1, 
(case when r.statement_end_offset = -1  then len(convert(nvarchar(max), qt.text)) * 2 
else r.statement_end_offset end -r.statement_start_offset)/2)  as stmt_executing
FROM    sys.dm_os_waiting_tasks w, sys.dm_exec_requests r  cross apply sys.dm_exec_sql_text(sql_handle) as qt               
WHERE w.session_id > 49 and w.wait_type like ‘LCK%’ and w.session_id=r.session_id

The result of Query 2 could – in a more readable form since the row length is too long to present it nicely – look like:

Session_id:                         212        

Blocking Session_id:       639        

Wati_type:                         LCK_M_U           

Wait_time in ms:              1812      

Resource description:    keylock hobtid=72057594095730688 dbid=5 id=lock470e280 mode=U
                                                associatedObjectId=72057594095730688              

Statement:                         SELECT TOP 1 “MANDT” AS c ,”OBJNR” AS c ,”OBTYP” AS c ,”STSMA” AS c,”CHGKZ” AS c ,”CHGNR” AS c FROM “JSTO” WITH (UPDLOCK REPEATABLEREAD)  WHERE “MANDT” = @P000 AND “OBJNR” = @P001 /* R3:SAPLBSVU:954 T:JSTO */

The wait_type already is showing that we look at an Update lock which is requested by session 212. Session 639 seems to block this request. In the Resource description field you can get the fact that we are looking at a ‘keylock’ which is a row lock in the clustered index and in a most of the cases a row on the data level. You can also get the database ID and the object ID out of the resource description. The last column then gives you the statement which is getting blocked in execution at this point in time.

Query2 is nice in order to look at blocking locks situations which are rather simple, but for more complex situations and in order to work with SAP support, you should use a little set of queries and T-SQL procedures collected under the term ‘hangman’. Please check OSS notes #948633. Hangman gives even more data to analyze the situation, especially with intense lock chains, which can develop easily if longer locks are taken on critical resources as row in the number range table NRIV. Also SAP support uses the ‘hangman’ outputs frequently for investigating issues and hence is familiar with its output.

Wait Statistics

A lot of different resources where wait situations can happen in SQL Server are monitored. DMV  sys.dm_os_wait_stats shows where all those waits are accumulated. A good query to look at this data as a whole would be the following:

 

–Query 3:

select wait_type, wait_time_ms/waiting_tasks_count as ‘Avg Wait in ms’,

wait_time_ms/ signal_wait_time_ms ‘Wait to Signal Wait Ratio’,

waiting_tasks_count, wait_time_ms, max_wait_time_ms, signal_wait_time_ms

from sys.dm_os_wait_stats where waiting_tasks_count > 0 and signal_wait_time_ms >0

 

The list can be rather confusing and a bit overwhelming. An interesting wait event to look at would be WRITELOG. This event indicates the write performance to the transaction log of the database(s). The counter on this event is incremented when the transaction log buffer in SQL Server gets flushed until an acknowledge of this synchronous I/O comes back from the I/O subsystem. The wait_types PAGEIOLATCH_SH and PAGEIOLATCH_EX give indications on the performance of read I/O from the data files. Be aware that in SQL Server 2005 some issues around accumulating these numbers got improved. Therefore the numbers show up higher than usually experienced in SQL Server 2000. But the numbers now pretty much match what is measured with Windows Performance monitor.

Nevertheless in cases where you want to analyze the next 30 or 60min workload you would need to run the query several times and build deltas since the numbers are accumulated since the start of SQL Server. Or you might get the idea to reset the values as meanwhile described in Books Online. This is a bad idea since it destroys the efforts SAP put into their database monitor to give you exactly the ability to investigate certain time ranges or even online observe own defined time frames. SAP DBA Cockpit offers you to look at all the wait types listed. It offers to reset them all to 0 when you desire and then displays the counts and values since you pressed ‘Reset’ it. The difference is that SAP DBA Cockpit will not reset the values on the database, but will simply start to perform delta reporting from the time of your reset on in DBA Cockpit. Therefore strictly avoid resetting those values on the database with the command that I purposely don’t name here, but use DBA Cockpit in order to monitor these wait statistics over a specific time frame.

 

Another approach to analyze the performance of a system or investigate improvement potentials is to look where SQL Server is waiting on resources and then try to get those resource bottlenecks resolved. Since we meanwhile have 200+ wait_types one certainly can get overwhelmed by getting clear indications what area the bottleneck is in. Tom already indicated some categories in his whitepaper. SQL Server Performance Dashboard which is a free download also materialized a categorization of wait_types. The query batch below will do so as well. Not very sophisticated using some temporary tables, but it works in principle. It basically reads the values of the 200+ wait_types, groups it in categories and displays the average wait time and the percentage of wait time. Any wait types that indicate some sleeping activity due to low usage of the server are not being calculated.

 

–Query4

create table ##aggr_temp (wait_type nvarchar(60), waiting_tasks_count bigint, wait_time_ms bigint, max_wait_time_ms bigint, signal_wait_time_ms bigint,

category nvarchar(60))

set nocount on

declare @category nvarchar(60), @wait_type nvarchar(60), @waiting_tasks_count bigint, @wait_time_ms bigint, @max_wait_time_ms bigint, @signal_wait_time_ms bigint

declare c1 cursor for select * from sys.dm_os_wait_stats where waiting_tasks_count > 0

open c1

fetch c1 into @wait_type, @waiting_tasks_count, @wait_time_ms, @max_wait_time_ms, @signal_wait_time_ms

while @@fetch_status <>-1

begin

                select @category =

                                case

                                                when @wait_type like N’LCK_%’ then N’Locking’

                                                when @wait_type like N’SOS%’ or @wait_type in (N’THREADPOOL’, N’LOWFAIL_MEMMGR_QUEUE’) then N’SOS’

                                                when @wait_type like N’LATCH_%’ then N’Latching’

                                                when @wait_type in (N’SLEEP_BPOOL_FLUSH’) then N’Checkpoint’

                                                when @wait_type like N’BROKER_%’ then N’Service Broker or Database Mirroring Foundations’

                                                when @wait_type like N’BACKUP%’ or @wait_type in (N’ASYNC_DISKPOOL_LOCK’, N’REQUEST_DISPENSER_PAUSE’) then N’Backup or Restore’

                                                when @wait_type like N’PAGELATCH_%’ then N’Buffer Latch’

                                                when @wait_type like N’PAGEIOLATCH_%’ or @wait_type in (N’IO_COMPLETION’, ‘ASYNC_IO_COMPLETION’, N’CHKPT’, N’DISKIO_SUSPEND’, N’IMPPROV_IOWAIT’) then N’I/O Performance’

                                                when @wait_type like N’QUERY%’ or @wait_type like N’RESOURCE_%’ or @wait_type like N’SQLSORT_%’ or @wait_type like N’DEADLOCK_%’

                                                or @wait_type in (N’RESOURCE_SEMAPHORE’, N’RESOURCE_SEMAPHORE_MUTEX’, N’RESOURCE_SEMAPHORE_QUERY_COMPILE’, N’RESOURCE_SEMAPHORE_SMALL_QUERY’,

                        N’CMEMTHREAD’, N’SOS_RESERVEDMEMBLOCKLIST’, N’BAD_PAGE_PROCESS’, N’CURSOR’, N’CURSOR_ASNC’, N’EE_PMOLOCK’,

                                                N’EE_SPECPROC_MAP_INIT’, N’QRY_MEM_GRANT_INFO_MUTEX’, N’OLEDB’, N’SQLSORT_NORMMUTEX’, N’SQLSORT_SORTMUTEX’, N’VIEW_DEFINITION_MUTEX’) then N’Query Execution’

                                                when @wait_type in (N’CXPACKET’, N’EXCHANGE’, N’EXECSYNC’) then ‘Parallel Query Execution’

                                                when @wait_type = N’SOS_SCHEDULER_YIELD’ then N’Scheduler Yield’

                                                when @wait_type in (N’LOGMGR’, N’LOGBUFFER’, N’LOGMGR_RESERVE_APPEND’, N’LOGMGR_FLUSH’, N’WRITELOG’) then N’Logging’

                                                when @wait_type in (N’ASYNC_NETWORK_IO’, N’NET_WAITFOR_PACKET’) or @wait_type like N’SNI_%’ then N’Network IO’

                                                when @wait_type like N’CLR_%’ or @wait_type like N’SQLCLR%’ then N’CLR’

                                                when @wait_type like N’DTC_%’ then N’Distributed Transaction Controller’

                                                when @wait_type like N’DBMIRROR%’ or @wait_type = N’MIRROR_SEND_MESSAGE’ then N’Mirroring’

                                                when @wait_type like N’XACT%’ or @wait_type like N’TRAN_MARKLATCH_%’ or @wait_type like N’MSQL_XACT_%’ or @wait_type = N’TRANSACTION_MUTEX’ then N’Transaction’

                                                when @wait_type like N’SLEEP_%’ or @wait_type in(N’LAZYWRITER_SLEEP’, N’WAITFOR’, N’WAIT_FOR_RESULTS’ ) then N’Sleep’

                                                when @wait_type in (N’FCB_REPLICA_READ’, N’FCB_REPLICA_WRITE’, N’REPLICA_WRITES’) then N’Database Snapshot’

                                                when @wait_type like N’FT%’ or @wait_type in (N’MSSEARCH’) then N’Fulltext Search’