Ordering guarantees in SQL Server…


Ordering guarantees of queries in various context is a common source of confusion. For example, a common workaround to make the results from querying a view ordered is to introduce TOP 100 PERCENT and ORDER BY in the view definition. But this however does not guarantee order in the actual results sent to the client since the query optimizer will re-order operations to find more efficient query plans. Note that even though this topic applies to SQL Server 2005 most of the rules are valid for SQL Server 2000 too.

 

Here are the scenarios that guarantee ordering:


  1. If you have an ORDER BY in the top-most SELECT block in a query, the presentation order of the results honor that ORDER BY request
  2. If you have a TOP in the same SELECT block as an ORDER BY, any TOP computation is performed with respect to that ORDER BY. For example, if there is a TOP 5 and ORDER BY clause then SQL Server picks the TOP 5 rows within a given sort.  Note that this does not guarantee that subsequent operations will somehow retain the sort order of a previous operation. The query optimizer re-orders operations to find more efficient query plans
  3. Cursors over queries containing ORDER BY in the top-most scope will navigate in that order
  4. INSERT queries that use SELECT with ORDER BY to populate rows guarantees how identity values are computed but not the order in which the rows are inserted
  5. SQL Server 2005 supports a number of new “sequence functions” like RANK(), ROW_NUMBER() that can be performed in a given order using a OVER clause with ORDER BY
  6. For backwards compatibility reasons, SQL Server provides support for assignments of type SELECT @p = @p + 1 … ORDER BY at the top-most scope.


Conor Cunningham

Query Optimizer Development Lead

Comments (18)

  1. Scott Thornburg says:

    Could you elaborate on statement #4. Are the identity values created in the sequence established by the ORDER BY? If a clustered index exists on the identity column, does this also then determine the order of insertion?

  2. sqletips says:

    Yes, the identity values will be generated in the sequence established by the ORDER BY. If a clustered index exists on the identity column, then the values will be in the logical order of the index keys. This still doesn’t guarantee physical order of insertion. Index maintenance is a different step and that could also be done in parallel for example. So you could end up generating the identity values based on ORDER BY clause and then feeding those rows to the clustered index insert operator which will perform the maintenance task. You can see this in the query plan. You should really NOT think about physical operations or order but instead think of a table as a unordered set of rows. The index can be used to sort rows in logical manner (using ORDER BY clause) efficiently.



    Umachandar

  3. Wesley B says:

    Hi,

    First of all, I know for a guaranteed order you have to apply order by to your select but out of curiosity I have a question.

    I have a table with a combined nonunique clustered index on an int field and a smalldatetime field (reporting purposes). When I select from the table (without order by) with locking the records are returned in the order of the clustered index key. However when running the query in a READUNCOMMITTED isolation level the records are returned in a different order. Although the order stays the same throughout different runs it is not ordered by the clustered index key. Issuing a DBCC DBREINDEX gives me the correct order when the database exists of 1 file, when I use multiple files for the database the order seems to be random again.

    The question is: how does READUNCOMMITTED affect the way that records are being returned?

    I guess it has something to do with multiple threads and the order of the data on disk but I’d like to know for sure.

    TIA

    PS. My blog has a test script

  4. sqletips says:

    You have to think of a table as an unordered set of rows. The results of a SELECT statement should also be considered the same. The only way to guarantee any order is to include an ORDER BY in the outer-most query. This will ensure that the client or consumer receives the rows in the specified order. This behavior is also according to ANSI standard where there are even stricter rules on using ORDER BY in a query.

    Having said this, the reason you are seeing different order for rows returned by the queries has to do with the execution plan. You can check the showplan output for example to see if the query without READUNCOMMITTED hint is performing an ORDERED scan using the clustered index. For example, you can traverse the clustered index using the index key order or IAM pages. And the scan using IAM pages returns pages in different order than the index scan. There are also other factors like parallelism, partitions, indexed views, and merry-go-round scans (single scan of a table is shared by different connections and typically seen on larger tables) which can affect how rows are read/returned. Hope this clarifies



    Umachandar

  5. Dis4ea says:

    Umachandar,

    Thanks for a explanation so far.

    But I’m still having difficulty understanding why exactly uncommitted isolation uses another method to reach the rows.

    Herewith the queryplans for both queries:

    Query plan with locking:

    |–Clustered Index Scan(OBJECT:([RSStaging183].[dbo].[tblClustered].[ixID]))

    Query plan without locking:

    |–Clustered Index Scan(OBJECT:([RSStaging183].[dbo].[tblClustered].[ixID]))

    As you can see the query plan produces the same output.

    If you wish there is a simple testscript on my blog to simulate the behaviour.

  6. sqletips says:

    I looked at your repro script and the reason is as I said the access methods for scanning a clustered index could be different. You can see this if you do the following:

    1. Increase the number of rows in the table to say 100K or 200K

    2. Run SET STATISTICS IO ON

    3. Run queries below:

    select * from tblClustered with(readuncommitted);

    select * from tblClustered with(readuncommitted) order by Id, MyDate;

    select * from tblClustered;

    select * from tblClustered order by Id, MyDate;

    You will notice that the io counters are different for the first SELECT which is an indirect indication of how the table scan is being performed. It is not conclusive but that is best we can do now using documented commands. Unfortunately some of this information is not exposed in the showplan output. I will post more details once I talk to the developer who owns this piece of code. :-)



    Umachandar

  7. Dis4ea says:

    Hehe, great Umachandar :-)

    I’ll check out your findings too.

    Thanks for all the effort already.

  8. SD says:

    Conor:-

    How about providing script example for scenariowise for 1-5 using any common dB like northwind or adventureworks That would for sure make this entry more worthful.While i had gone thru i had to minutely looking for the meaning as i was looking for the oredr of rows retuend in nested queries.

  9. ORDER BY has never been technically ‘allowed’ in a view definition. For

    example, the following code:CREATE…

  10. Questar says:

    Considering the example code below, are you saying in scenario 4 that correspondence between the two columns is not guarenteed?  If so, it seems to contradict this KB article…

    http://support.microsoft.com/default.aspx?scid=kb;en-us;273586

    CREATE TABLE #X (X int)

    CREATE TABLE #Y (Y int, N int IDENTITY(1,1))

    INSERT #X (X) VALUES (1)

    INSERT #X (X) VALUES (2)

    INSERT #X (X) VALUES (3)

    INSERT #Y (Y) SELECT X FROM #X ORDER BY X

    SELECT * FROM #Y ORDER BY Y

    DROP TABLE #X

    DROP TABLE #Y

  11. sqletips says:

    Can you expand your question little bit? The KB article does not contradict scenario #4. It says that identity values will be generated based on the ORDER BY clause in INSERT..SELECT. And to retrieve rows in a specific order from any table or query you have to always use SELECT with ORDER BY clause.

    Thanks

    Umachandar

  12. Questar says:

    Again in the context of the example code…

    Scenario #4 does not mention retrieving rows from table #Y.  It seems to say that identity values will be generated in a 1,2,3 order (so what other order would make any sense), but the rows retrieved from table #X could be inserted in 2,3,1 order.  Are you saying that the resulting rows in table #Y do not necessarily have the same value in the two columns, or are you simply saying that the rows of table #Y are not retrieved in any particular order without an ORDER BY clause?  If it’s the latter, why massively confuse the issue by involving the identity functionality?

  13. sqletips says:

    The other possible plan for identity generation is to perform it before the SORT operator(s). This happens when you do SELECT…INTO for example and the SELECT statement contains an ORDER BY clause. And another option is that the identity value can be generated at the time of insertion into the table at which point the order of the rows could be different. It is important to understand that these are set operations. Also, consider example below:

    CREATE TABLE #X (X int)

    CREATE TABLE #Y (Y int, N int IDENTITY(1,1), G uniqueidentifier DEFAULT(NEWSEQUENTIALID()))

    INSERT #X (X) VALUES (1)

    INSERT #X (X) VALUES (2)

    INSERT #X (X) VALUES (3)

    INSERT #Y (Y) SELECT X FROM #X ORDER BY X

    SELECT * FROM #Y ORDER BY Y

    SELECT * FROM #Y ORDER BY G

    DROP TABLE #X

    DROP TABLE #Y

    There is a default in the table that generates sequential GUIDs. But the values depend on which order the rows were inserted into the table. And this may or may not match that of the ORDER BY clause in the SELECT statement. So the final two SELECT statements need not necessarily produce the same results. As you can see, in this example the ORDER BY clause in the INSERT…SELECT only controls when the identity values on the table are generated not for the defaults. If you consider more complex schema elements that have say defaults with CLR UDFs that persist generates some sequential value the only way to control order is to insert the rows one by one.

    Hope this makes sense now why the emphasis is on identity value generation with INSERT…SELECT ORDER BY. It is because there are other cases where the ORDER BY doesn’t affect anything like computation of defaults for example.



    Umachandar

  14. Questar says:

    Thanks for the clarification.  Now that I understand the point of scenario #4 I see that it confirms the KB article.  I think scenario #4 would have been much clearer if it had mentioned column defaults as being different from identity values.

  15. Hi Conor,

    Here’s a question that’s triggered by a post in Allen White’s blog (http://www.sqljunkies.com/WebLog/marathonsqlguy/archive/2006/09/19/23557.aspx).

    CREATE TABLE #X (X int)

    CREATE TABLE #Y (Y int, N int IDENTITY(1,1))

    CREATE TABLE #Y2 (Y int, N int, N2 int IDENTITY(1,1))

    INSERT #X (X) VALUES (1)

    INSERT #X (X) VALUES (2)

    INSERT #X (X) VALUES (3)

    INSERT #Y (Y)

    OUTPUT INSERTED.Y, INSERTED.N INTO #Y2(Y,N)

    SELECT X FROM #X ORDER BY X

    SELECT * FROM #Y ORDER BY Y

    SELECT * FROM #Y2 ORDER BY Y

    DROP TABLE #X

    DROP TABLE #Y

    DROP TABLE #Y2

    The IDENTITY values for #Y will obey the specified ORDER BY, as stated in scenario #4. The order in which rows actually get inserted in #Y is not guaranteed, though. So what will happen to #Y2? Are the IDENTITY values for this table calculated as rows are inserted, in the order in which they are inserted (i.e. not guaranteed)? Or is the IDENTITY column in the target table of an OUTPUT clause also included in the guarantee as outline above?

    Best regards,

    Hugo Kornelis

  16. sqletips says:

    Hugo, here is the reply from Conor. Hope this clarifies your question. The jist of it is that the identity value generation order is guaranteed only on the INSERT INTO table. So any other assumption made on the order is risky and can lead to unpredictable behavior.

    >>

    There are no physical ordering guarantees on any insert operations in SQL 2005.  

    For example, consider the case when either/both of these tables are partitioned.  Even when they are not, there is nothing in the code guaranteeing any physical insertion order of the rows to the heap or any secondary index.

    In your example, the order by _is_ guaranteed to influence the order in which identity values are assigned to the identity function for the first table.  However, it does not guarantee any insertion order once those rows are generated .  It does not provide guarantees on the second table at all.  If such guarantees are needed, that’s a feature request for us to add syntax.

    >>



    Umachandar

  17. Umachandar, Conor,

    Thanks for the clarification. The answer is as I expected.

    Best regards,

    Hugo Kornelis