Conor vs. MERGE and PRIMARY KEY collisions

Article
02/04/2009

MERGE is a new operation added in SQL Server 2008. It has all sorts of knobs, bells, and whistles. The primary value of this feature is to collapse multiple query statements into one query statement. Overall, this avoids the “overhead” necessary to run a query a few times and can be a big benefit, especially when running many small queries that do things like “if exists then update else insert”.

I received a natural question this week on the behavior of MERGE with respect to locks and transactions. The expressed expectation was that since MERGE was a single statement, it should be atomic transactionally and it should not be possible to get duplicate primary key errors. This expectation is not really correct – ultimately, the locking semantics for UPDATES, DELETES,INSERTS, and MERGE are more complex and require some knowledge of locking and query plans to fully understand.

Here’s the script. The initial part is created once and the second part could be run multiple times.

(running in default readcommitted, btw)

Setup:

DROP TABLE FOO

CREATE TABLE FOO (id INT PRIMARY KEY NOT NULL, name nvarchar(20) NOT NULL)

DELETE FROM FOO

INSERT INTO FOO VALUES(0, 'base')

Runtime portion:

DELETE FROM @fooChanges

INSERT INTO @fooChanges VALUES((SELECT MAX(id)+1 FROM FOO), 'induction')

MERGE FOO AS f USING @fooChanges as fc

ON ( fc.id = f.id )

WHEN MATCHED THEN

UPDATE SET name = fc.name

WHEN NOT MATCHED BY TARGET THEN

INSERT (id, name)

VALUES (fc.id, fc.name);

Effectively, MERGE is doing complex stuff under the covers for you. It is performing a join between the original table (FOO) and the table to be MERGEed (@fooChanges). It is an outer join, and the matching rows correspond to the case when the two rows match and the “not matched” rows in the join correspond to the “not matched” case in the query statement. The not matched rows have NULL on the non-matching side, and this is eventually converted into an action to INSERT instead of UPDATE in the stream of operations to be performed against the table FOO.

Here’s where this is happening in the query plan:

For non-MERGE UPDATEs and DELETEs, the common practice within most engines is to have a special “U” (Update) lock that is taken on the source rowset and then promoted to an “X” (exclusive) lock when the Update operation is actually performed later in the query tree. (Way back in the days, the read part would take an “S” (shared) lock that would not prevent other consumers from reading the row. Ultimately, this could cause all updates to fail in high transaction environments because two update queries could both get the S lock and not be able to promote to the X lock (since S locks can only be promoted if nobody else holds another S lock). Ultimately, this was solved by adding a U lock that prevented other UPDATEs but allowed other SELECTs.

In the case of MERGE WHEN NOT MATCHED, we actually don’t have a row yet, so there is no place on which to attach a lock (unless you use SERIALIZABLE isolation, which would take key range locks). So, no U lock means that there is no prevention against duplicate inserts (or PRIMARY KEY duplicate violations, in this case).

So, ultimately, the MERGE does not prevent key collisions for WHERE NOT MATCHED INSERTs.

Happy Querying!

Conor Cunningham

Conor vs. MERGE and PRIMARY KEY collisions

Additional resources