I was approached recently with the following request: “Hey Joseph, we think that this query is performing horribly. Here is the XML query plan. Can you tell us what is going on?”
My first reaction was to start looking at the query plan and see if I can find an “obvious” problem. But I know from experience that this is the wrong approach. A query plan is rarely the first place to start when troubleshooting performance issues. Thus I wanted to go back to basics and find out how slow this query actually was. For that I needed to look at SYS.DM_EXEC_QUERY_STATS. But after a quick look there I found hundreds of entries that contained similar or identical query text and my goal of narrowing down to one query looked unachievable.
Then, this idea occurred to me (and thus this blog post). Each XML plan stores the Query Plan Hash and Query Hash values. You can find those as the last attributes of the <StmtSimple > element. Here is an example
<StmtSimple … QueryHash="0x2C1B48BA086D059C" QueryPlanHash="0x195629D01FDDC68A">
If you want to learn more about Query Hash and Query Plan Hash, look at Bart’s excellent blog on the subject.
Now, with these two identifiers in place I was able to narrow down the query by looking at sys.dm_exec_query_stats DMV like this.
select * from sys.dm_exec_query_stats cross apply sys.dm_exec_sql_text (sql_handle)
where query_hash = 0x2C1B48BA086D059C and query_plan_hash = 0x195629D01FDDC68A
Now, I could see max_worker_time (max CPU on longest execution), max_elapsed_time (Duration), max_logical_reads, etc. Looking at the Duration and CPU I was able to determine that the query was running long due to waits: duration was much longer than CPU (see Duration and CPU Reveal a Profound Truth on this subject). Then I discovered the main wait type/bottleneck (see Waits, Waits Do Tell Me blog) and went on to eliminate that bottleneck and successfully resolve the problem.
The Query plan is rarely the right place to start and in fact in this case led us nowhere.