Find Top N costly query plans in adhoc batches or modules...

I encountered a problem about a query not using a particular indexed view in SQL Server 2005. To investigate this issue, I figured that I would go about writing a query using the execution related dynamic dynamic management views to get the cost of the query that should have used the indexed view. This proved to be an interesting adventure that also revealed some inconsistencies in how we expose certain metadata in the DMVs. On the bright side, the final query showcases the powerful new relational, TSQL and XQuery features of SQL Server 2005.

 

Let's start with brief outline of the DMVs that I planned to use:

  • sys.dm_exec_query_stats - Returns performance statistics for cached query plans. This contains one row per query plan so if a stored procedure or batch contains two SELECT statements you may get two rows here
  • sys.dm_exec_sql_text - Returns the text of the sql statement based on the SQL handle
  • sys.dm_exec_query_plan - Returns the showplan in XML format for a batch or module based on the plan handle

A simple query to get the elapsed time counters and SQL text of the batch/module is shown below. This query uses the new APPLY operator to join with a table-valued function passing the sql_handle value for each row.

 

select qs.sql_handle, qs.execution_count

     , qs.total_elapsed_time, qs.last_elapsed_time

     , qs.min_elapsed_time, qs.max_elapsed_time

     , qs.total_clr_time, qs.last_clr_time

     , qs.min_clr_time, qs.max_clr_time
from sys.dm_exec_query_stats as qs
cross apply sys.dm_exec_sql_text(qs.sql_handle) as st

 

This query can be modified to return the actual statement for which the stats are returned using the offset columns in sys.dm_exec_query_stats. Please note that the statement_start_offset and statement_end_offset columns are zero based, represents number of bytes and the end offset is -1 for last statement in the batch. In addition the text value returned by sys.dm_exec_sql_text is nvarchar(max) instead of text as documented incorrectly in Books Online. The modified query that accounts for these quirks is shown below:

 

select qs.sql_handle, qs.execution_count

, qs.total_elapsed_time, qs.last_elapsed_time

, qs.min_elapsed_time, qs.max_elapsed_time

, qs.total_clr_time, qs.last_clr_time

, qs.min_clr_time, qs.max_clr_time
, substring(st.text, (qs.statement_start_offset/2)+1

                        , ((case qs.statement_end_offset
when -1 then datalength(st.text)
else qs.statement_end_offset
end - qs.statement_start_offset)/2) + 1) as statement_text
from sys.dm_exec_query_stats as qs
cross apply sys.dm_exec_sql_text(qs.sql_handle) as st

 

Next, a similar query that gets the elapsed time counters and the XML showplan output for the batch/module.

 

select qs.sql_handle, qs.execution_count

, qs.total_elapsed_time, qs.last_elapsed_time

, qs.min_elapsed_time, qs.max_elapsed_time

, qs.total_clr_time, qs.last_clr_time

, qs.min_clr_time, qs.max_clr_time
, qp.query_plan
from sys.dm_exec_query_stats as qs
cross apply sys.dm_exec_query_plan (qs.plan_handle) as qp

 

Note that as I mentioned above, sys.dm_exec_query_stats returns a row for each query plan in a batch/module. The showplan output on the other hand contains the entire plan in XML format with each statement identified by the StatementID attribute which is simply a counter starting from the first statement in a batch. For example, if a stored procedure contains two queries then there will be two entries in the XML plan - one with StatementID = 1 and another with StatementID = 2. If there are additional variable assignments statements or DDLs or other TSQL statements they are also numbered appropriately in the order of appearance.

 

Consider the example stored procedure below:

 

create procedure TestProc
as
begin
declare @spid int
set @spid = @@spid
select count(*) from master.sys.sysprocesses where spid = @spid
select count(*) from master.sys.syslockinfo where req_spid = @spid
end

 

The showplan output will contain 3 StmtSimple nodes (fragments shown below with StatementText, StatementType and StatementId attributes only):

 

  1. StatementText="create procedure testproc
as
begin
    declare @spid int
    set @spid = @@spid
   "
    StatementId="1"
    StatementType="ASSIGN"
  2. StatementText=" select count(*) from master.sys.sysprocesses where spid = @spid
   "
    StatementId="2"
    StatementType="SELECT"
  3. StatementText=" select count(*) from master.sys.syslockinfo where req_spid = @spid
"
    StatementId="3"
    StatementType="SELECT"

 

The sys.dm_exec_query_stats will contain two rows one for each query only. So when we join sys.dm_exec_query_stats with sys.dm_exec_query_plan you will get the same showplan for both rows and we need to aggregate information from the plan or identify the appropriate StmtSimple nodes to get the cost of each query. So I generate first artificial counters by sorting and counting the statment_start_offset value from sys.dm_exec_query_stats. The same is done for the StmtSimple nodes in the query plan by selecting those of type "SELECT" and sorting them by StatementId value. Now this relative sequence numbers can be used to join and get the costing attributes for each query.

 

The final query that contains the logic to match the query stats with the plan nodes is shown below:

 

declare @top_n_plans int;
set @top_n_plans = 25;
with XMLNAMESPACES ('https://schemas.microsoft.com/sqlserver/2004/07/showplan' as sql)
select top(@top_n_plans)
qst.text as sql_text, qp.query_plan
, qst.statement_id
, qst.statement_text as select_statement
, qps2.statement_optimization_level
, qps2.statement_optimization_early_abort_reason
, (
select sum(ro.SubTreeCost.value(<N'@EstimatedTotalSubtreeCost'>, 'float'))
from qp.query_plan.nodes(

               N'//sql:Batch/sql:Statements/sql:StmtSimple/sql:QueryPlan[1]/sql:RelOp[1]'

             ) as ro(SubTreeCost)
) as Totalcost
, qps2.statement_sub_tree_cost
, qst.creation_time, qst.last_execution_time, qst.execution_count
, qst.total_elapsed_time, qst.last_elapsed_time, qst.min_elapsed_time, qst.max_elapsed_time
, qst.total_worker_time, qst.last_worker_time, qst.min_worker_time, qst.max_worker_time
, qst.total_physical_reads, qst.last_physical_reads

     , qst.min_physical_reads, qst.max_physical_reads
, qst.total_logical_writes, qst.last_logical_writes

     , qst.min_logical_writes, qst.max_logical_writes
, qst.total_logical_reads, qst.last_logical_reads

     , qst.min_logical_reads, qst.max_logical_reads

     , qst.total_clr_time, qst.last_clr_time, qst.min_clr_time, qst.max_clr_time
, qst.sql_handle, qst.plan_handle
from (
select *
, substring(st.text, (qs.statement_start_offset/2)+1

                                , ((case qs.statement_end_offset
when -1 then datalength(st.text)
else qs.statement_end_offset
end - qs.statement_start_offset)/2) + 1) as statement_text
, ROW_NUMBER() OVER(PARTITION BY qs.plan_handle

                                     ORDER BY qs.statement_start_offset) as statement_id
from sys.dm_exec_query_stats as qs
cross apply sys.dm_exec_sql_text(qs.sql_handle) as st
) as qst
cross apply sys.dm_exec_query_plan (qst.plan_handle) as qp
cross apply (
-- Since sys.dm_exec_query_stats doesn't have statement id,

        -- we just sort the actual statement id from showplan for

        -- SELECT statements and join them with similar sequence number generated based
-- on the statement start offset in sys.dm_exec_query_stats.
-- This allows us to match the row from showplan with that of the query stats.
-- This is a problem for batches containing multiple SELECT statements

        -- and hence this solution.
select ROW_NUMBER() OVER(ORDER BY qps1.statement_id) as rel_statement_id
, qps1.statement_optimization_level, qps1.statement_sub_tree_cost

             , qps1.statement_optimization_early_abort_reason
from (
select sel.StmtSimple.value(<'@StatementId'>, 'int')
, sel.StmtSimple.value(<'@StatementSubTreeCost'>, 'float')
, sel.StmtSimple.value(<'@StatementOptmLevel'> , 'varchar(30)')
, sel.StmtSimple.value(<'@StatementOptmEarlyAbortReason'>, 'varchar(30)')
from qp.query_plan.nodes(

                       N'//sql:Batch/sql:Statements/sql:StmtSimple[@StatementType = "SELECT"]'

                     ) as sel(StmtSimple)
) as qps1(statement_id, statement_sub_tree_cost

                     , statement_optimization_level, statement_optimization_early_abort_reason)
) as qps2
where qps2.rel_statement_id = qst.statement_id
-- and qst.text like ... /* can be used to filter only particular statemetns */
order by Totalcost desc, qst.plan_handle, qst.statement_id;

 

Lastly, let me highlight the new relational, TSQL and Xquery features of SQL Server 2005 as used in the query.

 

  • TOP clause - Support for variables or expressions in the TOP clause
  • APPLY operator - Used to join table against table-valued function in the query
  • ROW_NUMBER analytic function - To generate sequence numbers based on a particular order and optionally partition the rows into different groups
  • Xquery nodes method - To get the matching nodes based on a XPath expression for each instance of a query plan
  • Xquery value method - To get a particular Xml attribute as a typed column
  • WITH XMLNAMESPACES - To use namespace prefix in XPath expressions easily in a query

 

So run the query and enjoy the output / richness of metadata that SQL Server 2005 exposes. The Books Online topics and the MSDN links of the various features used in the query is below for convenience sake:

 

--

 

Umachandar Jayachandran