Tracking Down Missing Indexes in SQL Azure

One of the challenges of SQL Azure is that not all of the TSQL that you are used to using is supported yet. Since the underlying engine is plain ole’ SQL Server, the engine can understand the TSQL, but we just block its use because we haven’t yet made it work in the multi-tenant, multi-server environment that is SQL Azure. 

One of the classic missing index scripts can be seen in Bart Duncan’s classic post. For simplicity, I have reposted the TSQL below:

    1:  SELECT 
    2:    migs.avg_total_user_cost * (migs.avg_user_impact / 100.0) * (migs.user_seeks + migs.user_scans) AS improvement_measure, 
    3:    'CREATE INDEX [missing_index_' + CONVERT (varchar, mig.index_group_handle) + '_' + CONVERT (varchar, mid.index_handle) 
    4:    + '_' + LEFT (PARSENAME(mid.statement, 1), 32) + ']'
    5:    + ' ON ' + mid.statement 
    6:    + ' (' + ISNULL (mid.equality_columns,'') 
    7:      + CASE WHEN mid.equality_columns IS NOT NULL AND mid.inequality_columns IS NOT NULL THEN ',' ELSE '' END 
    8:      + ISNULL (mid.inequality_columns, '')
    9:    + ')' 
   10:    + ISNULL (' INCLUDE (' + mid.included_columns + ')', '') AS create_index_statement, 
   11:    migs.*, mid.database_id, mid.[object_id]
   12:  FROM sys.dm_db_missing_index_groups mig
   13:  INNER JOIN sys.dm_db_missing_index_group_stats migs ON migs.group_handle = mig.index_group_handle
   14:  INNER JOIN sys.dm_db_missing_index_details mid ON mig.index_handle = mid.index_handle
   15:  WHERE migs.avg_total_user_cost * (migs.avg_user_impact / 100.0) * (migs.user_seeks + migs.user_scans) > 10
   16:  ORDER BY migs.avg_total_user_cost * migs.avg_user_impact * (migs.user_seeks + migs.user_scans) DESC

Unfortunately, if you try to use this TSQL, you immediately run into the problem that none of the DMVs are supported in SQL Azure. So much for the easy way…

Since the DMVs are just ongoing collections of information that you can collect manually from dm_exec_query_stats, I decided to try to build this up manually. This led me to generate the following query:

 

    1:  SELECT top (50) cast(replace(cast(qp.query_plan as nvarchar(max)),'xmlns="https://schemas.microsoft.com/sqlserver/2004/07/showplan"','') as xml),
    2:  qp.query_plan.value('declare default element namespace "https://schemas.microsoft.com/sqlserver/2004/07/showplan"; (/ShowPlanXML/BatchSequence/Batch/Statements/StmtSimple/QueryPlan/MissingIndexes/MissingIndexGroup/@Impact)[1]' , 'decimal(18,4)') * execution_count AS TotalImpact
    3:  FROM sys.dm_exec_query_stats qs cross apply sys.dm_exec_sql_text(sql_handle) st cross apply sys.dm_exec_query_plan(plan_handle) qp 
    4:  WHERE qp.query_plan.exist('declare default element namespace "https://schemas.microsoft.com/sqlserver/2004/07/showplan";/ShowPlanXML/BatchSequence/Batch/Statements/StmtSimple/QueryPlan/MissingIndexes/MissingIndexGroup/MissingIndex[@Database!="m"]') = 1
    5:  ORDER BY TotalImpact DESC

This generates a list of ShowPlanXMLs ordered by “execution count * missing index group impact”. Now that we have a list of ShowPlans ordered by overall impact, we need to parse the ShowPlanXML to pull out the missing indexes. For those unfamiliar with the missing index information in ShowPlanXML data, here is an example:

<MissingIndexes><MissingIndexGroup Impact="98.6314"><MissingIndex Database="[BugCheck]" Schema="[dbo]" Table="[Watchlists]"><ColumnGroup Usage="EQUALITY"><Column Name="[ID]" ColumnId="1" /></ColumnGroup></MissingIndex></MissingIndexGroup></MissingIndexes>

As you can see, it contains all the information necessary to define the indexes the engine thinks are missing.

Now, for each ShowPlanXML row, we need to use XQuery to shred the MissingIndexes information into its key information. In a classic case of copying good work already done instead of spending time doing it myself, I found that the Performance Dashboard Reports already do this shredding in one of their reports, so I could copy it:

    1:      SELECT cast(index_node.query('concat(string((./@Database)[1]),".",string((./@Schema)[1]),".",string((./@Table)[1]))') as nvarchar(100)) as target_object_name
    2:      ,replace(convert(nvarchar(max), index_node.query('for $colgroup in ./ColumnGroup,$col in $colgroup/Column where $colgroup/@Usage = "EQUALITY" return string($col/@Name)')), '] [', '],[') as equality_columns
    3:      ,replace(convert(nvarchar(max), index_node.query('for $colgroup in ./ColumnGroup,$col in $colgroup/Column where $colgroup/@Usage = "INEQUALITY" return string($col/@Name)')), '] [', '],[') as inequality_columns
    4:      ,replace(convert(nvarchar(max), index_node.query('for $colgroup in .//ColumnGroup,$col in $colgroup/Column where $colgroup/@Usage = "INCLUDE"    return string($col/@Name)')), '] [', '],[') as included_columns 
    5:      from (select convert(xml, @query_plan) as xml_showplan) as t outer apply xml_showplan.nodes ('//MissingIndexes/MissingIndexGroup/MissingIndex') as missing_indexes(index_node)

By combining the above two queries with a cursor, I can stick each shredded missing index into a temporary table. Then, I can use the equality, inequality, and included columns from the temporary table to generate CREATE INDEX statements as follows:

    1:  select distinct 'Create NonClustered Index IX_' + substring(replace(replace(target_object_name,'[',''),']',''), 0, charindex('.',replace(replace(target_object_name,'[',''),']',''))) +' On ' + target_object_name + 
    2:  ' (' + IsNull(equality_columns,'') + 
    3:  CASE WHEN equality_columns IS Null And inequality_columns IS Null THEN ',' ELSE '' END + IsNull(inequality_columns, '') + ')' + 
    4:  CASE WHEN included_columns='' THEN
    5:  ';'
    6:  ELSE
    7:  ' Include (' + included_columns + ');'
    8:  END
    9:  from #results

DISCLAIMER: As with all automated INDEX suggestion scripts, you need take a look at the CREATE INDEX statements suggested and decide if they make sense for you before you run out and apply them to your production instance!!

One important thing to point out is that even though I was designing this script for SQL Azure, it works just fine against an on-premise instance of SQL Server. Since SQL Azure supports a subset of the overall SQL Server functionality, you will almost always find that a solution for SQL Azure works just fine against SQL Server. Lastly, this functionality has been added to the CSS SQL Azure Diagnostics Tool (CSAD) so that you don’t have to worry about running this manually if you don’t want to.

For completeness, here is the TSQL statement in its entirety:

 create table #results (target_object_name nvarchar(100), equality_columns nvarchar(100), inequality_columns nvarchar(100), included_columns nvarchar(100))
  
 declare @query_plan as xml
 declare @totalimpact as float
  
 declare querycursor CURSOR FAST_FORWARD FOR
 SELECT top (50) cast(replace(cast(qp.query_plan as nvarchar(max)),'xmlns="https://schemas.microsoft.com/sqlserver/2004/07/showplan"','') as xml),
 qp.query_plan.value('declare default element namespace "https://schemas.microsoft.com/sqlserver/2004/07/showplan"; (/ShowPlanXML/BatchSequence/Batch/Statements/StmtSimple/QueryPlan/MissingIndexes/MissingIndexGroup/@Impact)[1]' , 'decimal(18,4)') * execution_count AS TotalImpact
 FROM sys.dm_exec_query_stats qs cross apply sys.dm_exec_sql_text(sql_handle) st cross apply sys.dm_exec_query_plan(plan_handle) qp 
 WHERE qp.query_plan.exist('declare default element namespace "https://schemas.microsoft.com/sqlserver/2004/07/showplan";/ShowPlanXML/BatchSequence/Batch/Statements/StmtSimple/QueryPlan/MissingIndexes/MissingIndexGroup/MissingIndex[@Database!="m"]') = 1
 ORDER BY TotalImpact DESC
  
 OPEN querycursor
 FETCH NEXT FROM querycursor
 INTO @query_plan, @totalimpact  --need to remove the namespace
  
 WHILE @@FETCH_STATUS=0
 BEGIN
  
     insert into #results (target_object_name, equality_columns, inequality_columns, included_columns)
     SELECT cast(index_node.query('concat(string((./@Database)[1]),".",string((./@Schema)[1]),".",string((./@Table)[1]))') as nvarchar(100)) as target_object_name
     ,replace(convert(nvarchar(max), index_node.query('for $colgroup in ./ColumnGroup,$col in $colgroup/Column where $colgroup/@Usage = "EQUALITY" return string($col/@Name)')), '] [', '],[') as equality_columns
     ,replace(convert(nvarchar(max), index_node.query('for $colgroup in ./ColumnGroup,$col in $colgroup/Column where $colgroup/@Usage = "INEQUALITY" return string($col/@Name)')), '] [', '],[') as inequality_columns
     ,replace(convert(nvarchar(max), index_node.query('for $colgroup in .//ColumnGroup,$col in $colgroup/Column where $colgroup/@Usage = "INCLUDE"    return string($col/@Name)')), '] [', '],[') as included_columns 
     from (select convert(xml, @query_plan) as xml_showplan) as t outer apply xml_showplan.nodes ('//MissingIndexes/MissingIndexGroup/MissingIndex') as missing_indexes(index_node)
     
     FETCH NEXT FROM querycursor
     INTO @query_plan, @totalimpact
     
 END
  
 CLOSE querycursor
 DEALLOCATE querycursor
  
 select distinct 'Create NonClustered Index IX_' + substring(replace(replace(target_object_name,'[',''),']',''), 0, charindex('.',replace(replace(target_object_name,'[',''),']',''))) +' On ' + target_object_name + 
 ' (' + IsNull(equality_columns,'') + 
 CASE WHEN equality_columns IS Null And inequality_columns IS Null THEN ',' ELSE '' END + IsNull(inequality_columns, '') + ')' + 
 CASE WHEN included_columns='' THEN
 ';'
 ELSE
 ' Include (' + included_columns + ');'
 END
 from #results
  
 drop table #results