I have seen a number of (good) arguments that some issues with Dynamics NAV on SQL2005 are caused by a bad query optimizer in SQL2005, and that the SQL teams need to fix this. After all, some issues we see on SQL2005 did not exist in SQL2000. This blog tries to argue that SQL2005 is maybe more clever than SQL2000, and tries to give some input into the discussions about whether SQL2000 is more clever than SQL2005, and whether there is anything to fix in SQL2005 regarding this specific issue.
The scenario belows is the situation where the reuse of a cached query plan causes a Dynamics NAV client to hang while it is browsing forms.
Here is a repro scenario which will show why Dynamics NAV ends up causing a clustered index scan on SQL2005, while the same scenario on SQL2000 did not cause any such scans. It is based on a W1 5.00 demo database, and it requires a Solution Developer's license to run it. Run the steps on a SQL2000 and SQL2005 database and you will see where the differences are between these two platforms:
1. Create a new G/L Account, No 1105
2. Create 50.000 new records in table 17. For this purpose, it doesn't matter if you post these entries or just create a codeunit to insert the records.
3. Run table 17 from Object Designer, and change the G/L Account No. to 1105 for the first 3, and the last 3 entries
4. On SQL Server, update statistics on this table:
update statistics [CRONUS International Ltd_$G_L Entry]
5. Run Dynamics NAV with Maximized forms.
6. In Dynamics NAV, go to "Chart of Accounts" and drill down on the new account 1105 and you should see 6 entries. Make sure to place the cursor on the first entry. Then close the drill-down to go back to the "Chart of Accounts".
7. On SQL Server, run DBCC FREEPROCCACHE. This will clear out any cached query plans.
8. Start a profiler trace - include the following events (on top of the default ones)
On SQL2005: Performance:Showplan Text, on SQL2000: Performance:Execution Plan
9. In Navision, drill down on account 1105. Then move the cursor with arrow-down, until you get to the last entry. Then move back up to the top again with arrow-up.
10. Stop the profiler trace.
On SQL2005, you should see one of the last entries causing a relatively large number of reads. In my tests 2079 reads. This is the offending query. The same query on SQL2000 causes much fewer reads. In my tests 126 reads.
The query looks like this:
SELECT * FROM "W1500"."dbo"."CRONUS International Ltd_$G_L Entry" WHERE (("G_L Account No_"=@P1)) AND "G_L Account No_"=@P2 AND "Posting Date"=@P3 AND "Entry No_"<@P4 ORDER BY "G_L Account No_" DESC,"Posting Date" DESC,"Entry No_" DESC ',@p3 output,@p4 output,@p5 output,N'@P1 varchar(20),@P2 varchar(20),@P3 datetime,@P4 int','1108','1108',''2007-12-31 00:00:00:000'',52761
Notice that the last parameter value is 52761. So the part of the query to focus on here, in fact reads:
WHERE "Entry No_" < 52761
Then take a look at the execution plan. SQL 2005 uses the index [CRONUS International Ltd_$G_L Entry$0], which is the clustered index ("Entry No_"). SQL2000 uses the index [CRONUS International Ltd_$G_L Entry].[$1], which is the index which begins with "G_L Account No_". So based on this query it is not strange that SQL2005's plan is causing many more reads that SQL2000's plan.
Here is an important point to make: Neither SQL2000 or SQL2005 compiled the query plan for this query. You can see by the presense of SP:CacheHit events in the profiler trace, that the plan was taken from the plan cache. So in order to find out why the two versions of SQL makes different plans, we need to go to the place where the plan was made.
Go to the SP:CacheHit event and look at the data. Then go backwards in the trace until you find the SP:CacheInsert event with the same data. This is the place where the query plan was made. The query in this place looks like this:
SELECT * FROM "W1500"."dbo"."CRONUS International Ltd_$G_L Entry" WHERE (("G_L Account No_"=@P1)) AND "G_L Account No_"=@P2 AND "Posting Date"=@P3 AND "Entry No_"<@P4 ORDER BY "G_L Account No_" DESC,"Posting Date" DESC,"Entry No_" DESC ',@p3 output,@p4 output,@p5 output,N'@P1 varchar(20),@P2 varchar(20),@P3 datetime,@P4 int','1108','1108',''2006-12-31 23:59:59:000'',1
This time, the last parameter value is 1 (not 52761)! So this time, the part of the query to focus on is:
WHERE "Entry No_" < 1
Remember that "Entry No_" is also the clustered index.
So here is the question: What is the best possible query plan for this query? And I think the answer is easy for this scenario: Use the clustered index to "scan" this one record! The number of Reads in the trace should also confirm this. In my tests, SQL2005 did 21 reads. SQL2000 did 245 Reads.
So in this case, SQL2005 makes a better plan than SQL2000!
The way that query plans are cached and reused has not changed between SQL2000 and 2005. The following points are valid for both versions:
1. When a query plan is designed, SQL will take the parameter values into consideration (In this example, whether the last parameter is 1 or 52761). This is also called parameter sniffing.
2. When a query plan is reused from cache, the parameter values are NOT taken into consideration. The Query that the plan is valid for is converted into a hash-value. SQL simply looks in the plan cache if a plan exists for that hash-value, and then reuses the plan if there is. If SQL also had to revalidate the plan against the current parameter values, then this would to some extend negate the whole purpose of reusing cached plans (performance).
3. SQL's query optimizer does not have any kind of risk-assessment when it designs a query plan. There are no mechanisms in place to consider "If I put this plan into cache, and it was reused with other parameters, what is the potential damage?"
These behaviours are fundamental to current and previous version of SQL, and most likely to future versions as well.
So, for this scenario we can see that:
- When the plan was made, SQL2005 made the most optimized plan.
- The behaviour of caching plans and reusing them are the same on both SQL2000 and SQL2005.
Without going into too many details here about how to troubleshoot a situation like this, there are various ways to handle it. The main methods for Dynamics NAV are:
- Index hints:
In this situation, if the query had included an index hint on the $1 index ("G_L Account No_"), then SQL2005 would not have chosen the clustered index as it did. The behaviour would have been like on SQL2000, and the problem query (2079 reads) would not have happened. For more details about Index Hinting in Dynamics NAV, check thhe blog "Index Hinting in Platform Update for Microsoft Dynamics NAV 4.0 SP3 KB940718".
- Recompile hints
Adding a Recompile hint to a query is a way to tell SQL Server to make a new query plan, and not take one from cache. In this way you may get query plans that are better optimized for the current parameter values, but it also adds an overhead to SQL Server because making a new query plan always takes longer than re-using a cached one.