SYSK 35: Performance comparison of WITH and TABLE variable queries


As many of you know, SQL 2005 now has a common table expression (CTE) WITH.  You can think of it as a temporary (non-persistent) view defined within the scope of an executing statement (see SQL help topic ms-help://MS.SQLCC.v9/MS.SQLSVR.v9.en/tsqlref9/html/27cfb819-3e8d-4274-8bbe-cbbe4d9c2e23.htm for more information).


I’m sure many of you can point to a number of benefits of CTE.  The question for me was – if I can do the same with a table variable and CTE, what’s better in terms of performance? 


I’ve created a simple table with four columns – identity primary key, group id, data id, and data value.  Populated it with the data, and ran the queries (see below) that returned values of the data column when data id = 4 or 17 for a randomly selected group id.


The answer may surprise you…  My tests show that the WITH statement is about 2 times slower!   On a 1 GHz/1Mb server populated with few thousands of rows, WITH statement took approx 1.2 sec vs. 0.6 sec using the table variable.  Same query with 2.5 million rows took a whopping 49 min 59 sec using WITH statement and 23 min 33 sec using Table variable.


Query using table variables (the faster way):



DECLARE @Temp TABLE ([pk_id] [bigint] NOT NULL,
[GroupID] [bigint] NOT NULL,
[DataId] [int] NOT NULL,
 [DataValue] [varchar](256) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL)


INSERT INTO @Temp
SELECT pk_id, GroupID, DataId, DataValue
FROM Table1
WHERE GroupID = @GroupID


SELECT
t1.DataValue Param4, t2.DataValue Param17
FROM @Temp t1
INNER JOIN @Temp t2 ON t1.GroupID = t2.GroupID
WHERE t1.DataId = 4 AND t2.DataId = 17


Query using WITH statement:



WITH ScenarioData AS
 (
      SELECT * FROM Table1 WHERE GroupID = @GroupID
)


SELECT
t1.DataValue Param4, t2.DataValue Param17
FROM ScenarioData t1
INNER JOIN ScenarioData t2 ON  t1.GroupID = t2.GroupID
WHERE t1.DataId = 4 AND  t2.DataId = 17


Comments (7)

  1. Gabe says:

    How does that compare with embedded SELECTs?

    SELECT

    t1.DataValue Param4, t2.DataValue Param17

    FROM (

    SELECT * FROM Table1 WHERE GroupID = @GroupID

    ) t1

    INNER JOIN (

    SELECT * FROM Table1 WHERE GroupID = @GroupID

    ) t2 ON t1.GroupID = t2.GroupID

    WHERE t1.DataId = 4 AND t2.DataId = 17

  2. BigJimInDC says:

    And how does the speed of this version compare to the others?

    ————

    SELECT

    t1.DataValue Param4,

    t2.DataValue Param17

    FROM (SELECT * FROM Table1 WHERE GroupID = @GroupID) t1

    INNER JOIN (SELECT * FROM Table1 WHERE GroupID = @GroupID) t2

    ON t1.GroupID = t2.GroupID

    WHERE t1.DataId = 4 AND t2.DataId = 17

    ————

    Or how about this one?

    ————

    SELECT

    t1.DataValue Param4,

    t2.DataValue Param17

    FROM Table1 t1

    INNER JOIN Table1 t2

    ON t1.GroupID = t2.GroupID

    AND t1.GroupID = @GroupID

    AND t2.GroupID = @GroupID

    WHERE t1.DataId = 4 AND t2.DataId = 17

    ————

    Unless I’m wrong, both will produce the same results.

    Anyway, my point is, for the sake of simplicity and clarity, the WITH statement is the clear winner and something that I’ve desired for years. IMHO, SQL2K5 should simply inline the WITH just like a C compiler would inline a function marked with the "inline" keyword. The given C function exists for the sake of clarity, while the inline keyword allows for technically faster execution of the compiled code. SQL2K5 could, and should, operate in the same manner.

  3. John Ingres says:

    I would call that a perf optimizer bug and it should be reported. Nice thing is…seem easy to reproduce.

    I suppose you took the cache into account?

    Could you postthe DDL for table1?

    Have you looked at the query plans?

    The optimizer may be simply choosing a bad plan because of bad statistics on table1 or bad indexes. If that is the case, the test would not be significant.

  4. irenak says:

    Response to John Ingres:

    1. My thinking on caching is that it would happen in both cases, so I ignored it for the sake of this test.

    2. Here is the table definition

    CREATE TABLE [dbo].[Table_1](

    [pk_id] [bigint] IDENTITY(1,1) NOT NULL,

    [GroupID] [bigint] NOT NULL,

    [DataId] [int] NOT NULL,

    [DataValue] [varchar](256) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL,

    CONSTRAINT [PK_Table_1] PRIMARY KEY CLUSTERED

    (

    [pk_id] ASC

    )WITH (IGNORE_DUP_KEY = OFF) ON [PRIMARY]

    ) ON [PRIMARY]

    CREATE NONCLUSTERED INDEX [IX_Table1] ON [dbo].[Table1]

    (

    [GroupId] ASC

    )WITH (SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, IGNORE_DUP_KEY = OFF, ONLINE = OFF) ON [PRIMARY]

    3. Yes, it’s certainly possible the optimizer is choosing a bad plan…

  5. irenak says:

    All good suggestions. The point of the original post was to point out possible performance difference. I suggest doing performance testing for your specific tables/queries.

  6. irenak says:

    I’d expect even worse performance…

  7. milo says:

    Table variables appear to be inefficient. I wonder how SQL2005 handles them internally. Have you done a comparison between CTEs and Temp Tables?  When I look at the execution plan for CTEs, it shows the CTE doing a scan of the source table twice.  For a Temp Table, it only scans the source table once.

    For the self join scenario, Temp Tables may still be the way to go.