page contents

Statistical oddities when SQL returns more rows after archiving data


 

I was recently doing a performance review for a customer when I ran across something very interesting.  The customer is using IDMF to archive data so I know that occasionally the record counts in a few inventory tables goes down.  In reviewing data in DynamicsPerf, I was reviewing all the SQL queries with multiple plans:

SELECT TOP 100 PERCENT DATABASE_NAME,
       QUERY_HASH, (SELECT SUM(EXECUTION_COUNT) FROM QUERY_STATS_CURR_VW QV WHERE QV.QUERY_HASH = A.QUERY_HASH) AS EXEC_COUNT,
       (SELECT SQL_TEXT
        FROM   QUERY_TEXT QT
        WHERE  QT.QUERY_HASH = A.QUERY_HASH)      AS SQL_TEXT,
       COUNT(QUERY_PLAN_HASH)                     AS NO_OF_PLANS,
       (SELECT  MIN(AVG_TIME_ms)
        FROM   QUERY_STATS QS1
        WHERE  QS1.DATABASE_NAME = A.DATABASE_NAME
               AND QS1.QUERY_HASH = A.QUERY_HASH) AS MIN_AVG_TIME,
       (SELECT  max(AVG_TIME_ms)
        FROM   QUERY_STATS QS1
        WHERE  QS1.DATABASE_NAME = A.DATABASE_NAME
               AND QS1.QUERY_HASH = A.QUERY_HASH) AS MAX_AVG_TIME,
       STUFF ((SELECT DISTINCT ', '
                               + CONVERT(VARCHAR(64), QUERY_PLAN_HASH, 1)
                               + ' time(ms)= '
                               + CAST((SELECT cast (MAX(AVG_TIME_ms) as varchar(20) )+ ' , executions = ' + cast( sum(execution_count) as varchar(20) ) FROM QUERY_STATS QS2 WHERE QS2.DATABASE_NAME = QV1.DATABASE_NAME AND QS2.QUERY_HASH = QV1.QUERY_HASH AND QS2.QUERY_PLAN_HASH = QV1.QUERY_PLAN_HASH) AS VARCHAR(50))
               FROM   QUERY_STATS_VW QV1
               WHERE  QV1.QUERY_HASH = A.QUERY_HASH
                      AND QV1.DATABASE_NAME = A.DATABASE_NAME
               FOR xml path('')), 1, 1, '''')     AS QUERY_PLAN_HASH
FROM   (SELECT DISTINCT DATABASE_NAME,
                        QUERY_HASH,
                        QUERY_PLAN_HASH
        FROM   QUERY_STATS_VW QV) AS A

GROUP  BY DATABASE_NAME,
          QUERY_HASH
HAVING COUNT(QUERY_PLAN_HASH) > 1
ORDER  BY 6 DESC

 

A specific query plan for INVENTSUM was showing multiple plans.  The funny part is that the newer plan was estimating more rows being returned then the older plan.  This seems odd because well, they are archiving data so there are less rows in the table. 

The before plan:

image

The after plan:

image

So, how can SQL estimate more rows being returned after we removed data from the tables?

 

Believe it or not there is a pretty reasonable explanation.  The answer is Business Process.  It’s number one on the list of factors that impact performance. (Slide in the Dynperf Solution)  There are many necessary changes to how data is processed in Dynamics AX for business needs.  In this example,  an old company’s records had been removed which had a lot of data.  Once that data was removed then SQL’s database statistics say that we are going to get more rows back per company. 

There are many business decisions like this that can impact statistics on SQL Server and thus your performance.  One really good example that I have that will cause parameter sniffing issues are locations in INVENTDIM.  Typically, in a warehouse, certain items belong in certain locations within the warehouse.  But, everyone needs to track inventory so they may setup a WIP location within the warehouse as things are being picked or packed as an example.  This means that WIP location has an inventory record for every inventory item you have in your database.  So, if a query runs with this location as the parameter when SQL compiles the statement, the plan will be entirely different then if one of your normal locations.   Another good example of this is that generic receiving location that every item was put in before it was inspected and put away. 

I’ve got one more example for you where something like this might occur.  Let’s say for 5 years you were averaging maybe 2-3 lines per Sales Order but 2 years ago you added new products to the business.  So, now you are averaging 20 lines per Sales Order.  If we archive some data out, the statistics will change and SQL will see that it’s going to get many more SALESLINE records per Sales Order and may change the query plan around.

 

What all of this means is that you should not only do performance tuning when you import lots of new data, say a new company, but also after you archive data out of the database.  Both events will cause significant change in database statistics and quite possibly your performance.

 

Rod “Hotrod” Hansen


Skip to main content