SQL Server Data Collector: Nightly purge can leave orphaned rows


The Bug Description: The MDW database has a nightly purge job that removes old data unfortunately two tables are not purged by that job: notable_query_plans and notable_query_text.  This leaves orphaned rows in a Data Collector MDW.


Are you Impacted? You will know you are impacted by this problem if you have run the built-in MDW collection sets for several months, and find that the database keeps growing and that one table in particular — [snapshots].[notable_query_plans] is using most of the space.


The Long Term Fix: This bug will be fixed in CU5 for SQL 2008


The Work Around:  If you’re affected by this process you can use the script below as a short-term workaround.  This will purge the notable_query_plans and notable_query_text tables by looking for plans that are no longer referenced by any of the rows in the [snapshots].[query_stats] table.  It may take a very long time to run for the first execution (from a few hours up to a day), depending on the number of orphaned rows that need to be cleaned up.  After the first run has caught up on all of the deferred cleanup work, subsequent executions shouldn’t take as long.


    — Purge snapshots.notable_query_plan table


DECLARE @delete_batch_size bigint;


    DECLARE @rows_affected int;


    SET @delete_batch_size = 500;


    SET @rows_affected = @delete_batch_size;


    WHILE (@rows_affected = @delete_batch_size)


    BEGIN


        DELETE TOP (@delete_batch_size) snapshots.notable_query_plan


        FROM snapshots.notable_query_plan AS qp


        WHERE NOT EXISTS (


            SELECT *


            FROM snapshots.query_stats AS qs


            WHERE qs.[sql_handle] = qp.[sql_handle] AND qs.plan_handle = qp.plan_handle


                AND qs.plan_generation_num = qp.plan_generation_num


                AND qs.statement_start_offset = qp.statement_start_offset


                AND qs.statement_end_offset = qp.statement_end_offset


                AND qs.creation_time = qp.creation_time);


        SET @rows_affected = @@ROWCOUNT;


        RAISERROR (‘Deleted %d orphaned rows from snapshots.notable_query_plan’, 0, 1, @rows_affected) WITH NOWAIT;


    END;


    — Purge snapshots.notable_query_text table


    SET @rows_affected = @delete_batch_size;


    WHILE (@rows_affected = @delete_batch_size)


    BEGIN


        DELETE TOP (@delete_batch_size) snapshots.notable_query_text


        FROM snapshots.notable_query_text AS qt


        WHERE NOT EXISTS (


            SELECT *


            FROM snapshots.query_stats AS qs


            WHERE qs.[sql_handle] = qt.[sql_handle]);


        SET @rows_affected = @@ROWCOUNT;


        RAISERROR (‘Deleted %d orphaned rows from snapshots.notable_query_text’, 0, 1, @rows_affected) WITH NOWAIT;


    END;

Comments (3)

  1. TonySag says:

    I implemented this work around and is definitely working for me, except for the fact that it was running too slow. It was taking more than one hour every day

    After looking at the execution plan I found that an index was missing:

    USE [MDW]

    GO

    CREATE NONCLUSTERED INDEX [IDX_query_stats_handle]

    ON [snapshots].[query_stats] ([sql_handle])

    GO

  2. Patrick Buskey says:

    I actually found that creating this index worked best for both of the delete orphaned records queries:

    CREATE NONCLUSTERED INDEX [idx_query_stats_sqlhandle2] ON [snapshots].[query_stats]

    ([sql_handle] ASC, [plan_handle] ASC, [plan_generation_num] ASC, [statement_start_offset] ASC, [statement_end_offset] ASC, [creation_time] ASC)

    WITH (FILLFACTOR = 75, PAD_INDEX  = OFF, STATISTICS_NORECOMPUTE  = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS  = ON, ALLOW_PAGE_LOCKS  = ON) ON [PRIMARY]