Correlating XE query_hash and query_plan_hash to sys.dm_exec_query_stats in order to retrieve Execution plans for high Resource statements.

Extended events is a powerful feature that allows us to troubleshoot performance issues within SQL Server. One of the ways you can utilize extended events to aggregate statements is described in a blog post below by Bob Dorr SQL Server 2012: RML, XEvent Viewer and Distributed Replay

Let’s apply that methodology to a contrived scenario, and my top statement in this case is a statement that consumes a bunch of CPU and I need to tune it.  Note that the aggregation is done on query_hash. Bart has an excellent post on Query Fingerprints and Plan Fingerprints ( query_hash and query_plan_hash) and would recommend going through that post if you aren’t already familiar with the concepts as that is key to understanding the remainder of this post.

As we can see from the below aggregation, this statement is executed 107 times, but if I look at individual executions (note the highlighted ones), some executions take a “small” number of reads and negligible CPU. Some others take significantly more CPU and number of reads.

 

clip_image002

 

Being curious as to why this was the case, I wanted to further aggregate on the query_plan_hash to validate whether this statement had more than one distinct plan. So I changed the grouping to include the query_plan_hash. 

clip_image004

 

clip_image006

 

Once I did that and aggregate by CPU descending, I noticed that the same query_hash  had 2 different query_plan_hash values indicating this statement had 2 different plans at different times

clip_image008

 

Given this data, you can reason that when a particular plan is chosen, resource consumption is more than when the alternative is chosen. So we in a way have a “good plan” and “bad Plan” type of situation that could be due to parameter sniffing or a number of other reasons. Once again a good read is Query Fingerprints and Plan Fingerprints ( query_hash and query_plan_hash)

Now this brings more questions into play:

· Why do we have 2 plans?

· What was the difference between the 2 plans?

· Do I have to collect another set of data with plans in order to get that information?

Note that when collecting Extended Events, there are events such as Query Plans that are expensive to collect. Normally, in the lifecycle of an Extended Event, if an event is enabled, we collect the Columns (attributes) that are a part of the Extended Event itself, then apply any filters, and then execute any Actions. In the case of query plans, there are some performance implications of collecting them in a production environment.

See: https://www.sqlperformance.com/2013/03/sql-plan/showplan-impact

Given that information, my thought was to extract the estimated plans from  dm_exec_query_stats DMV. If your XE File has the sql_handle or the plan_handle captured for the statement level events, then it is pretty straightforward if you get the plan_handle from the Extended Event.

select c.query_plan,b.text,a.* from sys.dm_exec_query_stats a

cross apply sys.dm_exec_sql_text(sql_handle) b

cross apply sys.dm_exec_query_plan(plan_handle) c

where plan_handle = 0x0500120076A2250490C2D9780400000001000000000000000000000000000000000000000000000000000000

 

clip_image010

However the question is how do we know if this was the “good” or the “bad” plan given there are 2 plans?  The plan_handle won’t distinguish which of the 2 alternatives this was.  The key lies in the query_plan_hash. If you look at the Properties of the plan, you will see the QueryHash and the QueryPlanHash as below.

 

clip_image012

 

Looking at the query_plan_hash from the Extended Event session you can decipher which is the more expensive plan in general.

Good Query plan Hash: 17443710949243051033

Bad Query Plan Hash: 16026647133934493185

Unfortunately “currently” you cannot always directly correlate to the XEvent query_plan_hash  to the Execution Plan or sys.dm_exec_query_stats query_plan_hash given the data types are different. The query_hash and query_plan_hash in XEvent is a UINT64, and it is binary in sys.dm_exec_query_stats. Given that, you have to do some bit manipulation to turn off the most significant bit in order to compare the 2 since there isn’t a UINT64 data type in SQL Server. If the MSB (most significant bit) is not set, then you can convert easily. If it is set, though, you have to do bit manipulation to turn it off. Internally I have filed a bug filed on the same to ensure the data types match so as to not have to use the workaround below but if this a must have please file a connect item.

Before we dive into the TSQL script, I will try to spend a couple lines to explain that to folks who haven’t worked a lot with binary. For this example let us consider a Smallint data type in SQL Server. The possible range of values for smallint (2 bytes) is -32,768 to 32,767

Converting 2 bytes below to Binary, you notice in the second row, if the same 2 bytes are signed, the most significant bit (highlighted) is used to indicate if it is a positive or negative number. If it is set, it is negative, if not, it is positive.

                                                                                         

Binary ( 2* 8bits)

Hex ( F = 1111 bits)

Decimal ( Max)

Unsigned

1111 1111 1111 1111

FFFF

65535

Signed

1111 1111 1111 1111

FFFF

-32768

Signed

0111 1111 1111 1111

7FFF

32767

 

Given this table, you can see that to compare an unsigned to a signed value, we have to manipulate and turn off that most significant bit so that the values can be compared.

Now looking back at our XE values, the problem here is that XE query_hash is an unsigned INT 64 but we do not have a matching data type in SQL Server to convert the DMV query_hash into and hence the script below.

XEvent Query_plan_hash from XEvent: 17443710949243051033

Execution Plan QueryPlanHash: 0xF21482B006395C19

a. Single Query Hash comparison

--- For a single Query_hash comparison

declare @xeHash decimal(20,0)

declare @DMVNoMSBBitset decimal(20,0), @XENoMSBBitSet decimal(20,0)

declare @a as bigint

select @a = 0x7fffffffffffffff

-- Replace the value with the value of the XE query_plan_hash

-- (in this case it was 17443710949243051033)

-- Turn off most significant bit from sys.dm_exec_query_stats

-- oery_plan_hash by AND'ing with 0x7fffffffffffffff

-- This is the XE hash with no MSB set

set @xeHash = cast(17443710949243051033 as decimal(20,0))

-- Replace the value XE query_plan_hash (in this case it was 17443710949243051033)

select @DMVNoMSBBitset= cast((0xF21482B006395C19 & @a) as decimal(20,0))

-- This is the DMV or Query Plan query_hash with no MSB set.

-- This is basically subtracting the MSB --0x80000000`00000000 from DMV value

-- The value 9223372036854775808 contains--> 0x80000000`00000000 which is just the MSB

select @XENoMSBBitSet=cast( (@xeHash - cast(9223372036854775808 as decimal(20,0))) as decimal(20,0))

select @DMVNoMSBBitset as DMVHashNoMSB, @XENoMSBBitSet as XEHashNoMSB

if @DMVNoMSBBitset = @XENoMSBBitSet

select 'Hash is equal'

else

select 'Hash is not equal'

 

b. Query Hash comparison extended to DMV

We can modify this to directly query query sys.dm_exec_query_stats and see if this particular query_hash and /or query_plan_hash is currently in cache.

-- Turn off most significant bit by AND'ing with 0x7fffffffffffffff

declare @xeHash decimal(20,0)

-- Replace the value with the value of the XE query_plan_hash

set @xehash = cast(17443710949243051033 as decimal(20,0))

declare @a as bigint

declare @NoMSBBitset varbinary

select @a = 0x7fffffffffffffff

select @NoMSBBitset= cast((0xF21482B006395C19 & @a) as bigint)

 

-- Compare hash after turning off MSB on both places

-- The line 9223372036854775808 = 0x80000000`00000000 which subtracts MSB.

select b.text,c.query_plan,a.* from sys.dm_exec_query_stats a

cross apply sys.dm_exec_sql_text(a.sql_handle) b

cross apply sys.dm_exec_query_plan(a.plan_handle) c

where cast((cast(cast(query_plan_hash as varbinary) as bigint) & @a) as bigint) = cast( (@xeHash - cast(9223372036854775808 as decimal(20,0))) as decimal(20,0))

 

Note that while we are attempting to pull the “good” or “bad” plan from the cache, keep in mind that it won’t always be in cache. We can use ways to poll for its existence and dump it out if necessary, though as the hash is the signature and won’t change. We have already seen the “good” plan which was a SCAN, now we need to attempt to retrieve the “bad plan” when it comes into cache, but we are definitely aided by the fact that we know the query_hash and the query_plan_hash.

Pulling the Bad plan for the same query using the bad query Hash (when it exists in cache). You can further tweak this to poll and see whenever that query_hash has a query_plan_hash that is different than the known “good” query_plan_hash.

-- Turn off most significant bit by AND'ing with 0x7fffffffffffffff

-- 0x7fffffffffffffff represents the MSB turned off in max value for UNIT 64

declare @xeHash decimal(20,0)

-- Replace the value with the value of the XE query_plan_hash

set @xehash = cast(16026647133934493185 as decimal(20,0))

declare @a as bigint

declare @NoMSBBitset varbinary

select @a = 0x7fffffffffffffff

select @NoMSBBitset= cast((0xF21482B006395C19 & @a) as bigint)

-- Compare hash after turning off MSB on both places

-- The line 9223372036854775808 = 0x80000000`00000000 which subtracts MSB.

select b.text,c.query_plan,a.* from sys.dm_exec_query_stats a

cross apply sys.dm_exec_sql_text(a.sql_handle) b

cross apply sys.dm_exec_query_plan(a.plan_handle) c

where cast((cast(cast(query_plan_hash as varbinary) as bigint) & @a) as bigint) = cast( (@xeHash - cast(9223372036854775808 as decimal(20,0))) as decimal(20,0))

 

As you can see below, there is an estimation problem, estimated 56 rows but returned 49,000 resulting in a potentially suboptimal plan being chosen. The reason that was the case is a data skew in the base table. If you look at the “Compile” parameter in this case, the @country was compiled with the parameter ‘AU’ . In the prior snapshot of the plan earlier in the article which had the SCAN, the compile parameter was ‘US’

 

clip_image013

 

clip_image014

 

The intent of this post is NOT to go through explaining parameter sniffing as a whole, the focus in this post was helping to use Extended Events effectively and to be able to retrieve plans from the cache correlating the XE data so as to avoid the overhead of having to collect the Plan related extended events.

Denzil Ribeiro – Sr. Premier Field Engineer ( @DenzilRibeiro )