Why is there a 128-byte limit on the primary key for tables with a Primary XML Index?

One error that you may run into when attempting to create a Primary XML Index is the following:

Msg 6336, Level 16, State 201, Line 1
Maximum size of primary index of table 'StorageTable' is 900 bytes. CREATE XML INDEX requires that such size should be limited to 128 bytes

It is pretty self-explanatory on what you have to do (reduce the total size of the key columns of the base table), however what it does not tell you is why.  The reason is because SQL Server 2005 has a 900-byte limit for key columns of any index.  When we create the Primary XML Index (essentially a "hidden" table which pre-shreds the XML data so it doesn't have to be done at run time), the key columns in the XML index are a superset of the primary keys in the base table.  In other words, we add some key columns to the XML index that we need in order to provide efficient lookups on the index.  So, now you are probably asking yourself "well, what extra keys do they need?"  Well, I'll show you.

To start with, create a table which a clustered primary key and an xml column:

 create table StorageTable2 (
 storageKey varbinary(128) primary key, 
 storageValue xml not null
   )

Now, create the Primary XML Index on it:

 create primary xml index i_xml on StorageTable2(storageValue)

Ok, now we have all of the objects we need to discover what keys are added.  We can now query the system catalog to figure out what the key columns are on the clustered index which is our XML index:

 select convert(varchar(5), c.name) as [column_name], 
      convert(varchar(12), t.name) as [type],
     c.max_length as [length],
       ic.key_ordinal
from  sys.internal_tables it
join  sys.indexes i on (i.object_id = it.object_id)
join   sys.index_columns ic on (it.object_id = ic.object_id and i.index_id = ic.index_id)
join  sys.columns c on (c.object_id = it.object_id and c.column_id = ic.column_id)
join    sys.types t on (c.system_type_id = t.system_type_id)
where   it.parent_id = object_id('StorageTable2')
order by ic.key_ordinal asc
 column_name type         length key_ordinal
----------- ------------ ------ -----------
pk1         varbinary    128    1
id          varbinary    900    2

I know what you are thinking.  "But you just said the sum of the length of the key columns has to be less than or equal to 900 bytes!  That looks like 1028 bytes!"  You know what, you are precisely correct.  For various historical reasons that I am not going to get into right now, the length of the "id" column (think of this guy as a unique identifier for each node in an XML instance which also provides document ordering) is constrained within all of the XML runtime code to actually be smaller than 900 bytes.

That isn't the whole story though, there are other nonclustered indices that you can create on the primary XML index.  We call each of these a "Secondary XML Index" and they come in 3 different varieties.  We can create each of these guys on the primary XML index that we have already created:

 create xml index i_xml_path on StorageTable2(storageValue)
 using xml index i_xml
   for path
create xml index i_xml_value on StorageTable2(storageValue)
 using xml index i_xml
   for value
create xml index i_xml_prop on StorageTable2(storageValue)
 using xml index i_xml
   for property

And we can now query for all of the key columns of these indexes:

 select convert(varchar(12), i.name) as [index_name],
       convert(varchar(5), c.name) as [column_name], 
      convert(varchar(9), t.name) as [type],
      c.max_length as [length],
       ic.key_ordinal      
from    sys.internal_tables it
join  sys.indexes i on (i.object_id = it.object_id)
join   sys.index_columns ic on (it.object_id = ic.object_id and i.index_id = ic.index_id)
join  sys.columns c on (c.object_id = it.object_id and c.column_id = ic.column_id)
join    sys.types t on (c.system_type_id = t.system_type_id)
where   it.parent_id = object_id('StorageTable2')
order by [index_name], ic.key_ordinal asc
 index_name   column_name type         length key_ordinal
------------ ----------- ------------ ------ -----------
i_xml        pk1         varbinary    128    1
i_xml        id          varbinary    900    2
i_xml_path   hid         varchar      900    1
i_xml_path   value       sql_variant  8016   2
i_xml_prop   pk1         varbinary    128    1
i_xml_prop   hid         varchar      900    2
i_xml_prop   value       sql_variant  8016   3
i_xml_value  value       sql_variant  8016   1
i_xml_value  hid         varchar      900    2

Again, you will see that the actual key column length as reported by metadata seem to be much larger than our 900 byte limit.  As with the "id" column, many of these columns in the primary xml index are actually constrained by runtime code not to overflow their respective sizes, which are actually much smaller than reported by metadata (for example, we only store 128 bytes in the "value" column).  The actual size limitations for all of these columns are actually quite carefully chosen to allow a wide variety of XML documents (for example, "hid" and "id" can both be larger for more complex documents) while at the same time allowing reasonable sizes for the primary key of the base table.  So if you ever hit this error message, now you will know our precise reasoning: The indexes that we create internally contain additional key columns which are utilized to provide an efficient XQuery implemenation.