Added per-operator level performance stats for Query Processing


When you are troubleshooting query performance, metrics are never too much. With that in mind, based on customer feedback, but also our own CSS support, we have added runtime information to Showplan, and exposed a new Extended Event in SQL Server 2016 RC0. We are adding the new Showplan info and Extended Event in SQL Server 2014 Service Pack 2, and as usual deliver added value to in-market versions.

Regarding Showplan, take a test query on which we are doing a Clustered index Scan, and look the information available until SQL Server 2014:

<RunTimeInformation>
   <RunTimeCountersPerThread Thread="0" ActualRows="8001" ActualEndOfScans="1" ActualExecutions="1" />
</RunTimeInformation>

We can see the scan returned 8001 rows, but that doesn’t give you much insight regarding performance of the scan.

But below we can observe what kind of detail we can get from runtime stats in SQL Server 2016 RC0. These include the actual rows read, I/O Reads, CPU time, and elapsed time, all exposed per-thread:

<RunTimeInformation>
   <RunTimeCountersPerThread Thread="0" ActualRows="8001" ActualRowsRead="10000000" Batches="0" ActualEndOfScans="1" ActualExecutions="1" ActualExecutionMode="Row" ActualElapsedms="965" ActualCPUms="965" ActualScans="1" ActualLogicalReads="26073" ActualPhysicalReads="0" ActualReadAheads="0" ActualLobLogicalReads="0" ActualLobPhysicalReads="0" ActualLobReadAheads="0" />
</RunTimeInformation>

And looking at the same Index Scan running in parallel:

<RunTimeInformation>
   <RunTimeCountersPerThread Thread="6" ActualRows="0" ActualRowsRead="932085" Batches="0" ActualEndOfScans="1" ActualExecutions="1" ActualExecutionMode="Row" ActualElapsedms="160" ActualCPUms="158" ActualScans="1" ActualLogicalReads="2466" ActualPhysicalReads="0" ActualReadAheads="0" ActualLobLogicalReads="0" ActualLobPhysicalReads="0" ActualLobReadAheads="0" />
   <RunTimeCountersPerThread Thread="8" ActualRows="0" ActualRowsRead="886279" Batches="0" ActualEndOfScans="1" ActualExecutions="1" ActualExecutionMode="Row" ActualElapsedms="138" ActualCPUms="137" ActualScans="1" ActualLogicalReads="2341" ActualPhysicalReads="0" ActualReadAheads="0" ActualLobLogicalReads="0" ActualLobPhysicalReads="0" ActualLobReadAheads="0" />
   <RunTimeCountersPerThread Thread="10" ActualRows="0" ActualRowsRead="828520" Batches="0" ActualEndOfScans="1" ActualExecutions="1" ActualExecutionMode="Row" ActualElapsedms="121" ActualCPUms="120" ActualScans="1" ActualLogicalReads="2192" ActualPhysicalReads="0" ActualReadAheads="0" ActualLobLogicalReads="0" ActualLobPhysicalReads="0" ActualLobReadAheads="0" />
   <RunTimeCountersPerThread Thread="12" ActualRows="0" ActualRowsRead="932085" Batches="0" ActualEndOfScans="1" ActualExecutions="1" ActualExecutionMode="Row" ActualElapsedms="155" ActualCPUms="155" ActualScans="1" ActualLogicalReads="2466" ActualPhysicalReads="0" ActualReadAheads="0" ActualLobLogicalReads="0" ActualLobPhysicalReads="0" ActualLobReadAheads="0" />
   <RunTimeCountersPerThread Thread="11" ActualRows="0" ActualRowsRead="828520" Batches="0" ActualEndOfScans="1" ActualExecutions="1" ActualExecutionMode="Row" ActualElapsedms="133" ActualCPUms="132" ActualScans="1" ActualLogicalReads="2192" ActualPhysicalReads="0" ActualReadAheads="0" ActualLobLogicalReads="0" ActualLobPhysicalReads="0" ActualLobReadAheads="0" />
   <RunTimeCountersPerThread Thread="9" ActualRows="0" ActualRowsRead="724955" Batches="0" ActualEndOfScans="1" ActualExecutions="1" ActualExecutionMode="Row" ActualElapsedms="124" ActualCPUms="124" ActualScans="1" ActualLogicalReads="1918" ActualPhysicalReads="0" ActualReadAheads="0" ActualLobLogicalReads="0" ActualLobPhysicalReads="0" ActualLobReadAheads="0" />
   <RunTimeCountersPerThread Thread="1" ActualRows="0" ActualRowsRead="724955" Batches="0" ActualEndOfScans="1" ActualExecutions="1" ActualExecutionMode="Row" ActualElapsedms="121" ActualCPUms="120" ActualScans="1" ActualLogicalReads="1918" ActualPhysicalReads="0" ActualReadAheads="0" ActualLobLogicalReads="0" ActualLobPhysicalReads="0" ActualLobReadAheads="0" />
   <RunTimeCountersPerThread Thread="7" ActualRows="0" ActualRowsRead="414260" Batches="0" ActualEndOfScans="1" ActualExecutions="1" ActualExecutionMode="Row" ActualElapsedms="111" ActualCPUms="109" ActualScans="1" ActualLogicalReads="1096" ActualPhysicalReads="0" ActualReadAheads="0" ActualLobLogicalReads="0" ActualLobPhysicalReads="0" ActualLobReadAheads="0" />
   <RunTimeCountersPerThread Thread="5" ActualRows="0" ActualRowsRead="1035650" Batches="0" ActualEndOfScans="1" ActualExecutions="1" ActualExecutionMode="Row" ActualElapsedms="168" ActualCPUms="165" ActualScans="1" ActualLogicalReads="2740" ActualPhysicalReads="0" ActualReadAheads="0" ActualLobLogicalReads="0" ActualLobPhysicalReads="0" ActualLobReadAheads="0" />
   <RunTimeCountersPerThread Thread="4" ActualRows="8001" ActualRowsRead="932086" Batches="0" ActualEndOfScans="1" ActualExecutions="1" ActualExecutionMode="Row" ActualElapsedms="160" ActualCPUms="158" ActualScans="1" ActualLogicalReads="2466" ActualPhysicalReads="0" ActualReadAheads="0" ActualLobLogicalReads="0" ActualLobPhysicalReads="0" ActualLobReadAheads="0" />
   <RunTimeCountersPerThread Thread="2" ActualRows="0" ActualRowsRead="828520" Batches="0" ActualEndOfScans="1" ActualExecutions="1" ActualExecutionMode="Row" ActualElapsedms="156" ActualCPUms="130" ActualScans="1" ActualLogicalReads="2192" ActualPhysicalReads="0" ActualReadAheads="0" ActualLobLogicalReads="0" ActualLobPhysicalReads="0" ActualLobReadAheads="0" />
   <RunTimeCountersPerThread Thread="3" ActualRows="0" ActualRowsRead="932085" Batches="0" ActualEndOfScans="1" ActualExecutions="1" ActualExecutionMode="Row" ActualElapsedms="159" ActualCPUms="158" ActualScans="1" ActualLogicalReads="2466" ActualPhysicalReads="0" ActualReadAheads="0" ActualLobLogicalReads="0" ActualLobPhysicalReads="0" ActualLobReadAheads="0" />
   <RunTimeCountersPerThread Thread="0" ActualRows="0" Batches="0" ActualEndOfScans="0" ActualExecutions="0" ActualExecutionMode="Row" ActualElapsedms="0" ActualCPUms="0" ActualScans="1" ActualLogicalReads="2" ActualPhysicalReads="0" ActualReadAheads="0" ActualLobLogicalReads="0" ActualLobPhysicalReads="0" ActualLobReadAheads="0" />
</RunTimeInformation>

NOTE: ActualCPUms and ActualElapsedms will show cumulative values for the node and its children when executing in Row Mode, and will show just the single node values if executing in Batch Mode.

NOTE2: As of November 2016, SSMS now exposes these properties under Actual I/O Statistics in the scope of a specific operator.

We have also introduced a new Extended Event (query_thread_profile), allowing more insight on the performance of each node and thread:

QueryThreadProfileXE

A sample session using this event can be configured as in the below example:

CREATE EVENT SESSION [PerfStats_XE] ON SERVER
ADD EVENT sqlserver.query_thread_profile(
ACTION(sqlos.scheduler_id,sqlserver.database_id,sqlserver.is_system,sqlserver.plan_handle,sqlserver.query_hash_signed,sqlserver.query_plan_hash_signed,sqlserver.server_instance_name,sqlserver.session_id,sqlserver.session_nt_username,sqlserver.sql_text))
ADD TARGET package0.ring_buffer(SET max_memory=(25600))
WITH (MAX_MEMORY=4096 KB,EVENT_RETENTION_MODE=ALLOW_SINGLE_EVENT_LOSS,MAX_DISPATCH_LATENCY=30 SECONDS,MAX_EVENT_SIZE=0 KB,MEMORY_PARTITION_MODE=NONE,TRACK_CAUSALITY=OFF,STARTUP_STATE=OFF)
GO

Note: Showplan time scale is shown in milliseconds, while the xEvent shows microseconds for CPU and total time.

Pedro Lopes (@sqlpto) – Senior Program Manager

Comments (2)

  1. Jim Finnerty says:

    A TSQL example showing how to enable this XEvent would be helpful.

  2. Fair point. Added a sample to the post.

Skip to main content