Maintaining Unique Indexes

Consider the following schema:

CREATE TABLE T (PK INT PRIMARY KEY, A INT, B INT)
CREATE INDEX TA ON T(A)
CREATE UNIQUE INDEX TB ON T(B)

INSERT T VALUES (0, 0, 0)
INSERT T VALUES (1, 1, 1)

Now suppose we run the following update statement:

UPDATE T SET A = 1 - A

This update statement affects the clustered index and the non-clustered index TA.  The plan is pretty much what you might expect:

  |--Clustered Index Update(OBJECT:([T].[PK__T__15502E78]), OBJECT:([T].[TA]), SET:([T].[A] = [Expr1003]))
       |--Compute Scalar(DEFINE:([Expr1016]=[Expr1016]))
            |--Compute Scalar(DEFINE:([Expr1016]=CASE WHEN [Expr1004] THEN (1) ELSE (0) END))
                 |--Compute Scalar(DEFINE:([Expr1003]=(1)-[T].[A], [Expr1004]=CASE WHEN [T].[A] = ((1)-[T].[A]) THEN (1) ELSE (0) END))
                      |--Top(ROWCOUNT est 0)
                           |--Clustered Index Scan(OBJECT:([T].[PK__T__15502E78]))

This is a typical narrow update plan.  The single clustered index update operator maintains both the clustered index and the non-clustered index TA.  The plan includes compute scalars that detect whether each row of the non-clustered index really needs to be updated.  I wrote about plans just like this one in this post.

Now suppose we run the same update statement but this time we modify column B:

UPDATE T SET B = 1 - B

Suddenly the plan is much more complex:

Rows   Executes
2      1        |--Index Update(OBJECT:([T].[TB]), SET:([PK1022] = [T].[PK],[B1023] = [T].[B]))
2      1             |--Collapse(GROUP BY:([T].[B]))
4      1                  |--Sort(ORDER BY:([T].[B] ASC, [Act1021] ASC))
4      1                       |--Filter(WHERE:(NOT [Expr1019]))
4      1                            |--Split
2      1                                 |--Clustered Index Update(OBJECT:([T].[PK__T__15502E78]), SET:([T].[B] = [Expr1003]))
2      1                                      |--Compute Scalar(DEFINE:([Expr1019]=[Expr1019]))
0      0                                           |--Compute Scalar(DEFINE:([Expr1019]=CASE WHEN [Expr1004] THEN (1) ELSE (0) END))
0      0                                                |--Compute Scalar(DEFINE:([Expr1003]=(1)-[T].[B], [Expr1004]=CASE WHEN [T].[B] = ((1)-[T].[B]) THEN (1) ELSE (0) END))
2      1                                                     |--Top(ROWCOUNT est 0)
2      1                                                          |--Clustered Index Scan(OBJECT:([T].[PK__T__15502E78]))

What's going on?  This time we updated a unique index.  The SQL Server storage engine enforces the uniqueness of the index at all times.  It does not permit the query processor to insert any duplicate rows into the index at any time.  The query processor, on the other hand, must ensure that an update statement succeeds so long as it does not leave behind any uniqueness violations upon its completion.  Based on this rule, the above update statement must succeed.

Let's consider what would happen if SQL Server simply tried to update the non-clustered index TB using the same simple plan that it used when we updated column A.   To execute this plan the server must scan the rows in the table and update them one at a time.  Suppose the server chose to update the row with PK=0 first.  In that case, it would attempt to change the value of B from 0 to 1.  Unfortunately, there is already a row in the index with B=1.  The storage engine would enforce the uniqueness of the index and the update would fail.  The update would similarly fail if the server chose to update the row with PK=1 first.  It would seem that there is no way to execute this update!

Fortunately, SQL Server has a solution.  The basic idea is simple.  Instead of updating the key columns of a unique index which can cause "false" uniqueness violations, the query processor reorganizes the update such that the non-key columns are updated instead.  This reorganization is implemented by the split, sort, and collapse operators.  Let's take a closer look at how this works.

In our example, we begin with two rows to update:

PK

B_old

B_new

0

0

1

1

1

0

The split operator transforms the updates into deletes followed by inserts:

Action

PK

B

Delete

0

0

Insert

0

1

Delete

1

1

Insert

1

0

Notice that as indicated by the above STATISTICS PROFILE output, we now have 4 rows.

The sort operator reorders the inserts and deletes by the key columns of the non-clustered index (in this case by column B).  If there is a delete and an insert that share the same key value, the delete sorts before the insert.  The results of the sort are:

Action

PK

B

Delete

0

0

Insert

1

0

Delete

1

1

Insert

0

1

The collapse operator combines adjacent delete and insert pairs that share the same key value into a single update:

Action

PK_old

PK_new

B

Update

0

1

0

Update

1

0

1

In this example, the 4 rows collapse back into 2 rows leaving only updates.  Notice that the updates no longer change column B (which could lead to a false uniqueness violation) but rather change column PK which is not a unique key of index TB and which cannot fail due to a uniqueness violation.  Note also that in general it is not necessarily the case that all deletes and inserts collapse into updates.  The results of the collapse operator may include any combination of inserts, updates, and deletes.

Finally, the index update operator executes the update operations output by the collapse operator.

Keep in mind that just because a query plan includes split, sort, and collapse operators does not mean that it may not result in an actual uniqueness violation.  It simply ensures that there will be no false uniqueness violations.  Moreover, SQL Server generates plans with split, sort, and collapse operators anywhere that there is a risk of a false uniqueness violation.  There may or may not be an actual uniqueness violation.  For example, the following update statement which (with the current data set) would not result in a false uniqueness violation generates a nearly identical plan:

UPDATE T SET B = B + 10

On the other hand, the following update statements can only update a single row.  SQL Server is clever enough to recognize that these statements cannot generate false uniqueness violations and generates simpler plans without the split, sort, and collapse operators:

UPDATE T SET B = B + 10 WHERE PK = 0
UPDATE TOP (1) T SET B = B + 10