Store Statistics XML in database tables using SQL Traces for further analysis.


Since SQL Server 2005, query plan as well as statistics of query execution can be captured  in XML format. Also, SQL Server 2005 has XQuery support to directly query XML document. By combining these two new features, users can analyze the query plans using queries.


 


However, in SQL Server, there is no easy way to capture the statitics XML into a table. Fortunately, there are SQL traces provided by SQL Server to capture the showplan XML and statistics XML information into trace files and loaded into tables.


 


Note: The only limitation is the 128 level of nesting levels supported by XML data type in SQL 2005. In that case, you have to write client code to parse the query plan, which going to be a very complex query plan.


 


Here is a small example using SQL traces to store the statistics XML and extract the estimated rows and actual rows


 


/*Using Traces to Capture Statistics XML*/


declare @trace_id int


declare @trace_file nvarchar(200)


select @trace_file = ‘c:\temp\test_stats_’ + cast(newid() as varchar(100))


 


— using trace table.


exec sp_trace_create @trace_id output,


      2,


      @tracefile=@trace_file


 


— capture statistics-xml, textdata, on


exec sp_trace_setevent @trace_id, 146, 1, 1


 


— start


exec sp_trace_setstatus @trace_id, 1


 


— test statement.


select * from sys.objects


 


— stop


exec sp_trace_setstatus @trace_id, 0


 


— close


exec sp_trace_setstatus @trace_id, 2


 


— load trace files into table


if object_id(‘temp_trc’) is not null


      drop table temp_trc


 


select *


into temp_trc


from fn_trace_gettable(@trace_file + ‘.trc’, default)


 


— look at the captured stats xml


declare @plan xml


 


select @plan=cast(textdata as xml)


from temp_trc


where eventclass = 146;


 


— collect the actual and also estimate stats.


with XMLNAMESPACES (‘http://schemas.microsoft.com/sqlserver/2004/07/showplan’ as sql)


select


      ro.relop.value(‘@NodeId’, ‘int’) NodeId,


      ro.relop.value(‘@PhysicalOp’, ‘nvarchar(200)’) PhysicalOp,


      ro.relop.value(‘@LogicalOp’, ‘nvarchar(200)’) LogicalOp,


      (ro.relop.value(‘@EstimateRows’, ‘float’)


            * (ro.relop.value(‘@EstimateRewinds’, ‘float’)


                  +  ro.relop.value(‘@EstimateRebinds’, ‘float’)


                  + 1.0)) EstimateRows,


      case


            when root_actual.ActualRows = 0


                  then null


            else root_actual.ActualRows


      end ActualRows,


      cast(ro.relop.exist(‘*/sql:RelOp’) as bit) IsNotLeaf


from @plan.nodes(‘//sql:RelOp’) as ro(relop)


      cross apply (


            select sum(rti.info.value(‘@ActualRows’, ‘float’)) ActualRows


            from ro.relop.nodes(‘sql:RunTimeInformation/sql:RunTimeCountersPerThread’) as rti(info)


      ) root_actual;


go


 


The output of the estimate rows and actual rows is given below:


 





























































NodeId


PhysicalOp


LogicalOp


EstimateRows


ActualRows


IsNotLeaf


0


Nested Loops


Left Outer Join


52


52


1


1


Nested Loops


Left Outer Join


52


52


1


2


Filter


Filter


52


52


1


3


Compute Scalar


Compute Scalar


52


NULL


1


4


Clustered Index Scan


Clustered Index Scan


52


52


0


13


Clustered Index Seek


Clustered Index Seek


52.000031


NULL


0


14


Clustered Index Seek


Clustered Index Seek


52


52


0

Comments (2)

  1. Fitzer says:

    Hi  Guys

    Anybody able to elaborate on how you view the histogram for multi column statistics.

    I have a query plan that has estimated Rows (2943) and Actual Rows (3099750). Out by a factor 1000.

  2. jbguidewire says:

    Is there any way to get this information via SQL from a JDBC application?  We already make extensive use of the DMVs and do some parsing of the query plan xml. It would really help us to be able to get the actual execution time statistics, even if it came at a price.

    When running against Oracle, we have a configurable option for whether or not to run with STATISTICS_LEVEL = ALL.  We run with STATISICS_LEVEL = ALL within our performance test harness so that we can capture the detailed row source statistics for the execution plans.  (At the end of our run, we query the V$ tables to get the information.)  That information has been invaluable for us.