To set up this scenario, run the script below:

USE tempdb

GO

IF OBJECT_ID ('test1') IS NOT NULL DROP TABLE test1

GO

CREATE TABLE test1 (c1 tinyint, c2 smallint)

DECLARE @x int

DECLARE @msg varchar(1000)

SET @x = 1

SET NOCOUNT ON

BEGIN TRAN

WHILE (@x <= 1000000)

BEGIN

INSERT INTO test1 (c1, c2) VALUES (@x % 255, CASE WHEN @x % 1000 = 500 THEN 1000 ELSE @x % 1000 END)

IF @x % 5000 = 0

BEGIN

COMMIT TRAN

BEGIN TRAN

SET @msg = 'Inserted ' + CONVERT(varchar(20), @x) + ' rows ...'

RAISERROR (@msg, 0, 1) WITH NOWAIT

END

SET @x = @x + 1

END

WHILE (@@TRANCOUNT > 0) COMMIT TRAN

SET NOCOUNT OFF

GO

CREATE INDEX idx1 ON test1 (c2)

UPDATE STATISTICS test1 WITH FULLSCAN

DBCC SHRINKDATABASE (db2)

-- DBCC SHOW_STATISTICS ('test1', 'idx1')GO

The hypothetical scenario here is that your users complain that the first of the two queries below runs very slowly. You find that if you force SQL to use index [idx1] with an index hint, the query executes much more quickly. But why should you change your app to compensate for what seems to be a SQL Server bug? ðŸ™‚

USE db2

GO

SET STATISTICS PROFILE ON

SET STATISTICS TIME ON

-- Query #1 (slow)

SELECT c1, c2 FROM test1 WHERE c2 = 500

-- Query #2 (fast)

SELECT c1, c2 FROM test1 WITH (INDEX = idx1) WHERE c2 = 500SET STATISTICS TIME OFF

SET STATISTICS PROFILE OFFGO

To me, this suggests two questions: 1) Why isn't the fast plan chosen by default? 2) Are there any possible solutions that don't require modifying code to supply an index hint?

Despite appearances, this actually isn't a bug. Here's the explanation:

This table demonstrates uneven data distribution. The [c2] column holds values ranging from 0 to 999. There are exactly 1000 rows with the same [c2] value for each of the integers between 0 and 999. However, the value 500 is an anomaly; there are 0 rows with this value. Run the following query to show the number of rows with each [c2] value and you'll see that there are none with [c2]=500:

SELECT c2, COUNT(*) AS cnt FROM test1 GROUP BY c2

c2 cnt

------ -----------

...

498 1000

499 1000

501 1000

502 1000

...

When the server builds a histogram for the statistics on a column, it attempts to intelligently select range endpoints so that a given step of the histogram represents a range of values with similar density. If you run DBCC SHOW_STATISTICS on the index you can see the histogram:

-- DBCC SHOW_STATISTICS ('test1', 'idx1')

RANGE_HI_KEY RANGE_ROWS EQ_ROWS DISTINCT_RANGE_ROWS AVG_RANGE_ROWS

------------ ---------- ------- ------------------- --------

0 0.0 1000.0 0 0.0

499 498000.0 1000.0 498 1000.0

503 2000.0 1000.0 2 1000.0

1000 496000.0 1000.0 496 1000.0

RANGE_HI_KEY is the value that sets the upper bound for the range of values represented in each histogram step. The lower bound is RANGE_HI_KEY+1 for the preceding step. For example, consider this row:

499 498000.0 1000.0 498 1000.0

This indicates that there are 498000 rows (RANGE_ROWS) with a value between 1 and 498, inclusive. There are 1000 rows with the exact value of 499 (EQ_ROWS). AVG_RANGE_ROWS tells us that the typical value that falls within the range 1-498 shows up in 1000 rows. Now consider the next step in the histogram:

RANGE_HI_KEY RANGE_ROWS EQ_ROWS DISTINCT_RANGE_ROWS AVG_RANGE_ROWS

------------ ---------- ------- ------------------- --------

503 2000.0 1000.0 2 1000.0

This indicates that there are 2000 rows (RANGE_ROWS) with a value between 500 and 502, inclusive; and there are 1000 rows (EQ_ROWS) with the exact value of 503. The typical c2 value in the range 500-502 shows up average of 1000 times in the table. Recall what we know about the actual number of rows in this range:

c2 cnt

------ -----------

...

499 1000

501 1000

502 1000

503 1000

...

This histogram step describes its range of c2 values exactly. What the histogram doesn't tell us, however, is exactly how many times a particular value in the 500-502 range appears. Remember that each histogram step only summarizes the values within a given range; it doesn't tell you exactly how many of each particular value there are (that is, unless the table's domain is small enough for every distinct value to have its own step in the histogram). SQL knows it needs to search for rows where [c2]=500, so it locates this step in the statistics' histogram, finds that the typical value in this range shows up in 1000 rows, and uses this as its rowcount estimate.

This is a key point: if a value being searched for falls in the middle of a histogram step, the AVG_RANGE_ROWS for that step is used to estimate the number of matches that will be found. SQL is always optimistic and assumes that the value it is searching for is actually one of the range rows. In this case that assumption is incorrect. The QP ends up overestimating the number of rows that will be returned and choosing a table scan when an index seek would have been more efficient.

So if this isn't a bug, what could you do to fix the problem? One solution would be to make the nonclustered index a covering index. Alternately, you could make [idx1] a clustered index. And, if you're using SQL 2005, you could use a plan guide with a USE PLAN hint to force the fast plan without changing the query text. All of these solutions should cause SQL to select the fast index seek plan. The bad cardinality estimate will still be there, but the perf problem won't.

(Update: A blog post from Jack Li describing a similar problem can be found here.)

Bart,

Thanks for the article however there is a question:

Even if the QP ends up overestimating the # of rows in this case, why is there a reason it should go for a table sacn anyways. I see this is a large table and if there exists a column index (which it does in this case) shouldn’t it use it anyways? I could understand it choosing a table scan if the resultset was going to be considerably huge as compared the the table.

Rgds.

Someone,

The idea that an index seek is always better than a table scan is a common misconception. In fact, this is only true if the index is a covering index. In the case of a non-covering index, sometimes an index seek will be faster, and sometimes a table scan will be faster. The reason boils down to a simple fact: with today’s disk drives, sequential I/O is much, much faster than random I/O. Each bookmark lookup requires a repositioning of the disk head, which costs on the order of 5-10ms. A bookmark lookup (read: disk head movement) is required for roughly every qualifying row in the case of a seek against a non-covering index; in contrast, a table scan can be satisfied with (relatively) quick sequential I/O requests, most of which should be able to be satisfied without waiting to reposition the drive head.

For a little more information on this topc, check out:

http://blogs.msdn.com/queryoptteam/archive/2006/04/12/575241.aspx

If this isn’t clear, let me know.

HTH,

Bart

Bart,

I struggle with queries on tables similar what you described here. Therefore I found this blog quite interesting. I also read your chapter in Ken Hendersons "Practical Troubleshooting" book.

From that article I learned that for stored procedures you can avoid parameter sniffing by using local variables. You get an query plan which only takes into account the column density but not the parameter values.

This kind of ‘stable’ plan would be the solution to the kind of query issues I have. But my application doesn’t use stored procedures. Is there any other way getting this kind of plan with other programming technique like sp_executesql, prepared queries, or ad hoc queries?

A typical example where I would like to use a ‘stable’ plan are "ascending columns", where you often find values in the table which are not yet in the statistics.

For these values the optimizer will choose the wrong access path. This means using a RECOMPILE hint will not help in this case. A OPTIMIZE FOR hint would be fine, but it requires that I know a ‘good’ value at the time of development (which I don’t). I would need something like a OPTIMIZE FOR ANY hint telling the optimizer that I need a ‘stable’ plan.

Rgds.

Alex,

One way would be to use the local variable switch-a-roo trick in your ad hoc queries. For example, rather than sending the query:

EXEC sp_executesql N’

select * from mytbl where column = @p1

‘, N’@p1 int’, @p1 = 123

you could send:

EXEC sp_executesql N’

declare @v1 int — new line

set @v1 = @p1 — new line

select * from mytbl where column = @v1 — note, @v1 instead of @p1

‘, N’@p1 int’, @p1 = 123

There are probably also some ways to do this by embedding your param within a constant expression that SQL isn’t smart enough to fold. To me this seems even less elegant than the local variable thing, though; I guess an opinion about which of these is uglier depends on your personal sense of aesthetics. Here’s one example that takes advantage of the fact that SQL doesn’t try to determine the output of scalar functions at compile time:

create function RetVal (@p1 int) returns int AS begin return @p1 end

GO

/* SQL will use avg. col. density for CE here because it’s not quite smart enough to

** realize at compile time that dbo.RetVal (123) will always return 123. */

select * from mytbl WHERE column = dbo.RetVal (123)

Having said all that, I still think that the best overall approach to most instances of this problem is probably OPTIMIZE FOR (maybe not for your specific requirements, but in the general case). You don’t necessarily have to know the value to include in the hint at development time since you can apply such a hint via plan guide without changing the query text or the application. Of course, that assumes that you have a good DBA managing the system with both the time and the expertise to identify problem queries and craft plan guides to solve them. I definitely appreciate that this can’t be assumed in all cases, especially if you’re an ISV selling your solution to a lot of end customers that will have wildly varying SQL DBA skills.

Hope this helps some!

Bart

Thanks a lot, it did help!

Trying both suggested methods on AdventureWorks.Sales.Customers I found an interesting difference between both methods:

While the former ‘local variables’ method creates the same nice execution plans as when executing the normal statement, the later method using a scalar function has a draw back: the optimizer is more dumb and creates an ugly plan. The plan does not use the predicate (TerritoryID= @l_TerritoryID) in the clustered index seek, but uses a filter operator afterwards.

So it is not only a question of aesthetics but also of performance.

The statements I used are:

EXEC sp_executesql N’

DECLARE @l_TerritoryID int,

SET @l_TerritoryID = @TerritoryID

TerritoryID = @l_TerritoryID’

, @CustomerID = 25000, @TerritoryID = 2

SELECT TerritoryID, CustomerID, CustomerType

WHERE CustomerID > dbo.RetVal (25000)

AND TerritoryID = dbo.RetVal (2)

Regards,

Alex

Alex,

Glad one of the solutions worked for you. FYI regarding another possible path (since you mentioned "ascending columns"), check out http://blogs.msdn.com/ianjo/archive/2006/04/24/582227.aspx. If you are on SP1 or RTM, though, you may also want to apply SP2 to get the fix for the issue described in http://support.microsoft.com/kb/922063 before enabling those trace flags.

Also, the plans for the two queries you provided are actually more or less identical and should have similar perf characteristics. If you look closely at the "Clustered Index Seek" operator for the seemingly "faster" plan you’ll see that it has a residual WHERE that performs the filter on [TerritoryID]. This is functionally equivalent to the other plan — the only difference is that that plan performs the same in-memory filter operation via an explicit Filter operator.

HTH,

Bart

Bart,

actually I read the KB about ascending columns before and used the same wording. These trace flags described there are a littlebit mysterious to me, and the situation where they apply are somewhat special (covering index with the ascending column at the first position).

With the two different queries and their respective plans: both do the filter in memory, but it seem to me that the explicit Filter operator comsumes more CPU then the residual WHERE in the "Clustered Index Seek" operator does. (At least according to the measurements on my AdventureWorks database usint the SET STATISTICS options.)

Regards,

Alex

Yep, you’re right about the relative costs — the first query is a bit cheaper than the second. There is a little bit of CPU overhead for each row that has to be shuttled between two operators. (That cost is the very reason that the Seek operators were enhanced to be able to perform a residual WHERE as a secondary operation following the seek.) My point was just that the plans are more alike than they are different; both, for example, do a seek that returns about 4500 rows, then both apply a filter to those 4500 rows to filter the final resultset down to 2 rows. The SET STATISTICS PROFILE ON output for the first query could be a bit misleading because it says that the seek returned 2 rows. In a sense, that isn’t really true — it masks the fact that the seek only filtered the resultset down to 4500 rows, just like the second query, and required a filter as a subsequent operation to trim the resultset down to 2 rows.

Thanks,

Bart

Bart ,

What does Distinct_range_rows means …

therefore what would this mean

Histogram Steps

RANGE_HI_KEY RANGE_ROWS EQ_ROWS DISTINCT_RANGE_ROWS AVG_RANGE_ROWS

1 0 1 0 1

78084 78081 1 78081 1

78085 0 1 0 1

A statistics histogram summarizes the data within a single column. DISTINCT_RANGE_ROWS is the number of distinct values (note:

values, not rows) within a particular histogram step. (The count excludes the upper-most value of the range b/c the presence or absence of a row with the range's upper-most value is given explicitly by EQ_ROWS). Given your example:RANGE_HI_KEY RANGE_ROWS EQ_ROWS DISTINCT_RANGE_ROWS AVG_RANGE_ROWS

———————————————————————————————

1 0 1 0 1

78084 78081 1 78081 1

78085 0 1 0 1

Taking each row in turn, starting with the first row, this means:

1. The range of column values from -infinity to 1 has no rows with a column value less than 1, and exactly one row with a column value equal to 1.

2. The range of column values from 2 to 78084 has 78081 rows (from RANGE_ROWS) with column values less than the value 78084. This set of 78081 rows contains 78081 distinct column values (from DISTINCT_RANGE_ROWS). Note that RANGE_ROWS == DISTINCT_RANGE_ROWS, which indicates that the row values are unique. There is exactly one row equal to the range's HI_KEY column value of 78084.

3. The range from 78085 to 78085 has exactly one row equal to 78085.

You may note that DISTINCT_RANGE_ROWS tells you something about the uniqueness of the rows represented by each histogram step. If you calculate RANGE_ROWS / DISTINCT_RANGE_ROWS, you'll get the average number of "duplicate" rows for each distinct column value in the range. Put another way, DISTINCT_RANGE_ROWS allows you to calculate the range's selectivity or density.