I gave a talk to the Austin-area PASS group last night on B-Tree indexing in SQL Server, and I received a question about whether to make clustered indexes UNIQUE or not (assuming the data is unique on the key columns). The expectation was that this should perform better when the index was UNIQUE.
Let’s back up a minute and see things from the perspective of the Query Processor. Someone comes in and creates an index on col1 of table1. They don’t make it UNIQUE but they put data in the column that is unique. Then you compile the query “SELECT DISTINCT col1 from Table1”. The QP generates a plan that remains valid even if other data is inserted while that plan is in the procedure cache. So, it can’t assume anything about the uniqueness of that column because you could come in and insert a value that is non-unique. If, on the other hand, you created a UNIQUE constraint/index, then things are very different! Since the generated plan is valid as long as the schema definition for the table remains the same, the QP knows that the definition of the table precludes any duplicate values from being inserted. So, it can generate a plan that doesn’t bother to do any grouping (DISTINCT is implemented using the a group by operator in SQL). Go ahead and try it – you’ll see the plan difference.
In my view, this is really more of a question about your data model – if you know something is distinct, you can get the system to enforce this rule for you and make sure that your application behaves correctly. So, when defining your tables, you should have a holistic view of how the tables relate to each other so that you can identify where you should or should not have constraints (or indexes – however you want to define them).
One case where I see customers sometimes explicitly avoiding the definition of constraints is in data warehouses. If your data is read-only and you already “know” that your data is unique, you may want to skip the space overhead or time overhead of having indexes to enforce these constraints. Now, you still have the issue with the Optimizer not being able to do fancy query rewrites based on uniqueness, but your average data warehouse query is often expensive and rarely do these kinds of rewrites apply in any meaningful way that would impact query performance (other factors dominate).