XML Data Type Limitations

I'd like to take some time today to explain some of the seemingly arbitrary limits placed on the XML data type, specifically those related to ID/IDREF validation, complex XML Schema types, the depth limit for XML data, and the enigmatic "XSD schema too complex" error.

ID/IDREF Validation
If your typed XML document has attributes of type xs:ID and/or xs:IDREF, SQL Server will enforce referential integrity on these attributes: Within a given document, no two attributes of type xs:ID may have the same value, and all attributes of type xs:IDREF must have the same value as some attribute of type xs:ID.

This requires the validator to remember which ID values it's seen. Since the validator does only one pass, it must also remember any IDREF values for which it has not yet encountered a corresponding ID. It is thus possible to construct an XML document which requires an arbitrary amount of memory to validate correctly. In order to prevent denial of service attacks, we capped the amount of memory available for this purpose at one megabyte. If you try to validate a document which exceeds this limit, validation will fail with error 6969:

ID/IDREF validation consumed too much memory. Try reducing the number of ID and IDREF attributes. Rearranging the file so that elements with IDREF attributes appear after the elements which they reference may also be helpful.

There's no simple way to describe the precise conditions necessary to produce this error, but the relevant factors are the number of ID and forward-referencing IDREF attributes and the lengths of their values. The cap is the same for the 32-bit and 64-bit versions of SQL Server, so there are some documents which will validate on the 32-bit version but fail to validate on the 64-bit version due to the larger pointer size.

Complex XML Schema Types
When submitting a schema to be added to an XML Schema Collection, you may see message 6998:

Type or content model '[TypeName]' is too complicated. It may be necessary to reduce the number of enumerations or the size of the content model.

When a type is needed for validation, the validator loads its definition from metadata and compiles it into a format suitable for quick validation. In order to prevent any one type from using too much memory, SQL Server caps the size of a compiled type at one megabyte. SQL Server compiles all types and performs this check when the schema is imported in order to avoid accepting types which exceed the limit.

As with the ID/IDREF limit, there's no simple way to describe precisely the conditions necessary to exceed this limit. Having a large number of attributes, a content model with many particles (xs:sequence, xs:choice, xs:all, xs:element, or xs:any), or many enumeration facets are the most likely causes. Note that the properties inherited from the base type or imported via xs:group or xs:attributeGroup references are expanded in the compiled type definition, so it's possible for a type to exceed the limit just by adding a few attributes to its base type, if the base type is near the limit.

The types of child elements, however, do not contribute to the limit. For example, you should have no problem defining a type whose content model contains several child elements, each of which has a different type whose compiled representation is 500K. If you find yourself running up against this limit, it may be helpful to split the type's properties between two or more sub-types.

"Schema Too Complex"
When adding a schema to an XML Schema Collection, you may occasionally run into error 2362:

XSD schema too complex.

This is somewhat misleading; what it actually means is that SQL Server is running low on stack space. We rely heavily on recursion for parsing and semantic validation of XML Schema documents, and in rare (and usually intentionally pathological) cases, this presents a danger of stack overflow, which would kill the process and crash the server. To prevent this, we check the remaining stack space at recursion points and abort the transaction if it's low enough to cause concern.

If you encounter this error and your schema is not intentionally pathological, you may be able to make some semantically insignificant changes that will allow SQL Server to process your schema. The most common causes of recursion are nesting and forward references. If you have several anonymous types nested in the Russian-doll style, it may help to unnest them and move the local element or type definitions up to the global level. Additionally, it may help to rearrange schema components to eliminate forward references--that is, try to make sure that component definitions precede their references in document order.

XML Depth Limit
Finally, SQL Server limits the depth of any XML instance, typed or untyped, to 128 levels. Conversion of a string with deeper nesting to XML will fail with error 6335:

XML datatype instance has too many levels of nested nodes. Maximum allowed depth is 128 levels.

We impose this limit in order to guarantee that we will be able to create an XML index for any XML column. In SQL Server, The primary key of an XML index consists of the primary key of the base table and the ordpath. The maximum length for an index key in SQL Server is 900 bytes, so the combined length of the base table's primary key and the ordpath must be 900 bytes or less. We decided to impose a limit of 128 bytes on the primary key of the base table, leaving 772 bytes for the ordpath. Based on the properties of ordpath, we decided that 128 levels would be a good upper limit to ensure that the ordpath never exceeds the maximum size.

We'd like to know if any of these limitations are causing problems, so please leave a comment if you've run into any of these. That the depth limit causes problems for SHOWPLAN_XML is a known issue, but we're not aware of practical problems with any of the other limitations.