Indexes Supporting Foreign Keys

Most developers building an OLTP system know that you should declare and enforce relationships between tables using Foreign Keys.  But what physical database structures should you implement to support those relationships? 

Every Foreign Key is a relationship from one or more columns on the table on which the Foreign Key is declared to a Key on the target table.  That’s why it’s called a “Foreign” key:  the columns refer to a Key on some other table.  In SQL Server, the target columns can be the Primary Key of the target table or a Unique Constraint.  In all cases there is a unique index on the target table.  But should you index the Foreign Key columns as well?

That would have some benefits:

  • Reduced cost for DELETE and UPDATE of the target key on the referenced Table
  • Reduced cost for lookups by the FK columns and joins between the two tables

And some costs:

  • Increased cost for all CRUD operations on the FK table
  • Increased database size
  • Developer/DBA time

That’s all well and makes me inclined to index the foreign keys, but IMO it’s not enough to support a proactive best-practice of supporting Foreign Keys with indexes because:

  • These are merely performance considerations
  • There are performance costs as as well as benefits
  • The performance impact will not always be material
  • If you properly test and monitor your workload you will discover any really helpful indexes anyway

However there is one more more benefit to indexing your foreign keys, and it’s not really a performance benefit.  Indexing the foreign keys reduces the scope and duration of locks required to perform DELETE (and key UPDATE) on the referenced table, and can prevent deadlocks.  I first learned about this for Oracle, where I learned almost everything I know about Oracle, from the incomparable Tom Kyte.  You can read about this here.  This applies equally in SQL Server.  Unindexed foreign keys reduce concurrency and make your database more prone to deadlocks.

I think the bottom line is that supporting your Foreign Keys with indexes is required to ensure correct behavior of your application if you perform frequent deletes on the referenced table.  It is also required if you update the target key on the referenced table.  You really shouldn’t be updating your keys, but even if you don’t some applications will include the key columns in the column update list in an UPDATE statement, which actually is updating the keys. 

So what do we mean fora Foreign Key to be supported by an index?

Definition:  A Foreign Key F defined on n columns a table T is supported by an index I on T just in case the first n columns of I are all foreign key columns.

It doesn’t matter what order the Foreign Key columns appear in in the index, so long as they comprise the leading “edge” of the index.  An important special case for this is where a table’s Primary Key supports a Foreign Key.  For Parent/Child relationships, Child tables are often modeled with a compound Primary Key with the Parent Primary Key as the leading edge of the Child Key.  Such a Primary Key also supports the Foreign Key, and so you only need a single index on the table.  For instance in AdventureWorks2008, the Primary Key of SalesOrderDetail supports the ForeignKey referencing SalesOrderHeader:

 CREATE TABLE Sales.SalesOrderDetail(
    SalesOrderID int NOT NULL,
    SalesOrderDetailID int IDENTITY(1,1) NOT NULL,
    . . .    
  CONSTRAINT PK_SalesOrderDetail_SalesOrderID_SalesOrderDetailID 
   PRIMARY KEY (SalesOrderID, SalesOrderDetailID),
  CONSTRAINT FK_SalesOrderDetail_SalesOrderHeader_SalesOrderID 
   FOREIGN KEY(SalesOrderID) REFERENCES Sales.SalesOrderHeader (SalesOrderID)
) 

In addition to supporting the Foreign Key without requiring  a secondary index on SalesOrderDetail, this pattern has the added benefit of placing all the SalesOrderDetail rows for a SalesOrderHeader onto a contiguous set of pages.  IMO this makes it vastly superior to a pattern you see sometimes where the child table will have only a single Primary Key column:

 CREATE TABLE Sales.SalesOrderDetail(
    SalesOrderID int NOT NULL,
    SalesOrderDetailID int IDENTITY(1,1) NOT NULL,
    . . .   
  CONSTRAINT PK_SalesOrderDetail_SalesOrderID_SalesOrderDetailID 
   PRIMARY KEY (SalesOrderDetailID),
  CONSTRAINT FK_SalesOrderDetail_SalesOrderHeader_SalesOrderID 
   FOREIGN KEY(SalesOrderID) REFERENCES Sales.SalesOrderHeader (SalesOrderID)
) 

Since this would require a non-clustered index on SalesOrderDetail.SalesOrderID to support the Foreign Key, and the SalesOrderDetail rows for an order might be spread across separate pages. 

Anyway, how do you identify what Foreign Keys in your database are not supported by indexes, and which ones need to be?  I would start by looking at all Foreign Keys that are not supported by an index, and evaluate each one to see how often deletes are performed on the referenced table, and how many rows are the the Foreign Key table.  The more often you delete from the referenced table, and the more rows in the Foreign Key table, the more you need to support the Foreign Key with an index.

Here’s a query that will find all the Foreign Keys without a supporting index, sort them by size of the Foreign Key table, and indicate how many rows have been deleted from the referenced table recently. 

 

   with fkc as
  (
  select 
    fk.name fk_name, 
    fkc.*, 
    COUNT(*) over (partition by fkc.constraint_object_id) fk_column_count
  from sys.foreign_key_columns fkc
  join sys.foreign_keys fk
    on fk.object_id = fkc.constraint_object_id
  ), ic as
  (
  select 
    object_name(i.object_id) table_name, 
    i.name index_name, 
    ic.*, 
    COUNT(*) over (partition by ic.object_id, ic.index_id) ix_column_count
  from sys.index_columns ic
  join sys.indexes i
    on i.index_id = ic.index_id
   and i.object_id = ic.object_id 
  where i.type in (1,2)  --clustered and nonclustered
  and i.is_disabled = 0
  and i.is_hypothetical = 0 --order by table_name
  ), matched_columns as
  (
  select 
    ic.object_id, 
    ic.table_name,
    ic.index_id, 
    ic.index_name,
    fkc.constraint_object_id,
    fkc.fk_name,
    ic.column_id, 
    ic.key_ordinal, 
    fkc.fk_column_count, 
    COUNT(*) over (partition by ic.object_id, ic.index_id) supported_column_count --count the supported columns
  from fkc
  join ic 
    on fkc.parent_object_id = ic.object_id 
   and ic.ix_column_count >= fkc.fk_column_count --exclude indexes with too few columns
   and ic.key_ordinal <= fkc.fk_column_count --include only as many columns as the FK has
   and ic.column_id = fkc.parent_column_id  --match the index columns to the FK columns
  ), supported_fks as
  (
    select distinct 
      object_id, 
      table_name, 
      constraint_object_id, 
      fk_name, 
      index_name supporting_index, 
      index_id supporting_index_id
    from matched_columns
    where supported_column_count = fk_column_count
  ), istats as
  (
    select object_id,
           SUM(leaf_delete_count + leaf_ghost_count) deletes
    from sys.dm_db_index_operational_stats(db_id(),null,null,null) istats
    where index_id = (select MIN(index_id) 
                      from sys.indexes i
                      where i.object_id = istats.object_id 
                      and (i.is_primary_key = 1 or i.is_unique_constraint = 1) )
    group by object_id
  )
  select 
    schema_name(schema_id) fk_table_schema,
    object_name(parent_object_id) fk_table,
    object_name(fk.referenced_object_id) referenced_table,    
    name constraint_name, 
    coalesce(istats.deletes,0) referenced_table_delete_count,
    (select SUM(rows) from sys.partitions where object_id = fk.parent_object_id and index_id in (0,1)) fk_table_rows
  from sys.foreign_keys fk  
  left join istats 
    on fk.referenced_object_id = istats.object_id 
  where fk.object_id not in (select constraint_object_id from supported_fks)
  order by fk_table_rows desc
  
  

David

dbrowne_at_microsoft