Fixing non-deterministic schemas

If you’ve been importing XML schemas into SQL Server 2005, you might have encountered this error message:

 

XML Validation: XML instances of the content model of type or model group '…' can be validated in multiple ways and are not supported.

 

The reason for this message is that the schema you tried to import into an XML schema collection contains a non-deterministic content model. Non-determinism in schemas is allowed by the W3C’s XSD specs (as long as the Unique Particle Attribution rule isn’t broken) but not by SQL Server 2005’s implementation of the standard.

 

Let’s look at a simple example:

 

      <xsd:choice minOccurs="0" maxOccurs="unbounded">

            <xsd:element name="a" minOccurs="0" maxOccurs="unbounded"/>

            <xsd:element name="b" minOccurs="0" maxOccurs="unbounded"/>

      </xsd:choice>

 

If the XML document you’re validating contains five ‘a’ elements in a row, how should that be interpreted? Did we go through the choice particle five times, picking a single ‘a’ each time? Or did we go through the choice only once and pick all five at once? The same instance can be validated in several different ways.
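
To put a number on that ambiguity, here is a small Python sketch. It is not part of SQL Server or of any XSD tooling, and `count_derivations` is a name made up for this illustration. It counts the ways a run of identical ‘a’ elements can be split across iterations of the choice, assuming every iteration consumes at least one element (allowing empty iterations would make the count unbounded).

```python
def count_derivations(run_length):
    """Count the ways to split `run_length` consecutive <a/> elements
    across iterations of the choice, one non-empty group per iteration."""
    ways = [0] * (run_length + 1)
    ways[0] = 1  # an empty run can be consumed in exactly one way
    for consumed in range(1, run_length + 1):
        # The last iteration takes `size` elements; the rest were consumed earlier.
        ways[consumed] = sum(ways[consumed - size] for size in range(1, consumed + 1))
    return ways[run_length]


print(count_derivations(5))  # 16
```

Even under that simplifying assumption, five elements already give sixteen distinct ways to validate the same instance.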

 

Fortunately, this kind of content model is easy to fix. We can drop the occurrence constraints from the individual elements and keep them only on the xsd:choice particle, like this:

 

      <xsd:choice minOccurs="0" maxOccurs="unbounded">

            <xsd:element name="a" />

            <xsd:element name="b" />

      </xsd:choice>

 

The set of valid instances is exactly the same, but this time the content model is deterministic.
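
One way to convince yourself of this is a brute-force comparison. The sketch below is only an approximation of schema validation: it models each content model as a regular expression over the element names (my own encoding, not something SQL Server produces) and compares the two on every short sequence of ‘a’, ‘b’ and an unrelated ‘c’.

```python
import re
from itertools import product

# Regex stand-ins for the two content models (an assumption of this sketch).
ORIGINAL = re.compile(r"(?:a*|b*)*")   # repeated choice between a* and b*
REWRITTEN = re.compile(r"(?:a|b)*")    # repeated choice between one a and one b


def sequences(alphabet, max_len):
    """Yield every sequence over `alphabet` with length 0..max_len."""
    for length in range(max_len + 1):
        for combo in product(alphabet, repeat=length):
            yield "".join(combo)


def models_agree(max_len=6):
    """True if both models accept exactly the same sequences up to max_len."""
    return all(
        bool(ORIGINAL.fullmatch(s)) == bool(REWRITTEN.fullmatch(s))
        for s in sequences("abc", max_len)
    )


print(models_agree())
```

The check only covers short sequences, so it is a sanity test rather than a proof, but it matches the reasoning above: any mix of ‘a’ and ‘b’ elements is accepted by both models, and anything else is rejected by both.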

 

There are, however, cases where it is impossible to remove the non-determinism while keeping the set of valid XML instances exactly the same. In those cases, the schema designer’s goal should be to come up with a deterministic content model that validates a superset of the XML instances valid according to the original schema.

 

For example, a content model like this one

 

<xsd:sequence minOccurs="1" maxOccurs="2">

            <xsd:element name="a" minOccurs="2" maxOccurs="4"/>

            <xsd:element name="b" minOccurs="0" maxOccurs="2"/>

</xsd:sequence>

 

could be replaced with this one

 

<xsd:choice minOccurs="2" maxOccurs="12">

            <xsd:element name="a"/>

            <xsd:element name="b"/>

</xsd:choice>

 

All the instances valid against the original content model are also valid against this one: the original allows between 2 and 12 elements in total, each of them an <a/> or a <b/>.
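
The superset claim can be checked by brute force. In this Python sketch the two content models are encoded as regular expressions over the element names; that encoding is my own assumption for illustration, not how SQL Server validates.

```python
import re
from itertools import product

# Regex stand-ins for the two content models (an assumption of this sketch).
ORIGINAL = re.compile(r"(?:a{2,4}b{0,2}){1,2}")  # sequence{1,2} of a{2,4}, b{0,2}
REPLACEMENT = re.compile(r"[ab]{2,12}")          # choice{2,12} of a | b


def sequences(min_len, max_len):
    """Yield every sequence of a's and b's with length min_len..max_len."""
    for length in range(min_len, max_len + 1):
        for combo in product("ab", repeat=length):
            yield "".join(combo)


# Lengths 0..13 cover everything either model can accept, plus one step beyond.
candidates = list(sequences(0, 13))
old_valid = {s for s in candidates if ORIGINAL.fullmatch(s)}
new_valid = {s for s in candidates if REPLACEMENT.fullmatch(s)}

print(old_valid <= new_valid)         # True: nothing valid before is rejected now
print("bb" in new_valid - old_valid)  # True: but some new instances slip through
```

The set difference `new_valid - old_valid` is exactly the group of instances that the looser model lets in.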

The problem is that a lot of instances that were invalid against the old content model are successfully validated against this one (for example, an instance that contains <b/> elements but no <a/> elements). CHECK constraints can reduce the number of such instances. For example, in this case we could reject all instances that contain no <a/> elements by checking that the return value of .value('count(/root[1]/a)', 'int') is greater than zero.
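
To show what that extra check filters out, here is a Python stand-in for the count expression. It is only an analogy: in SQL Server the test would live in a CHECK constraint (as far as I know, with the xml method call wrapped in a user-defined function), and `has_at_least_one_a` is a hypothetical helper name. The root element name comes from the expression above, and no namespaces are assumed.

```python
import xml.etree.ElementTree as ET


def has_at_least_one_a(xml_text):
    """Mirror count(/root[1]/a) > 0: count the <a> children of the root element."""
    root = ET.fromstring(xml_text)
    if root.tag != "root":
        return False  # the path /root[1] selects nothing, so the count is 0
    return len(root.findall("a")) > 0


print(has_at_least_one_a("<root><b/><b/></root>"))      # False
print(has_at_least_one_a("<root><a/><a/><b/></root>"))  # True
```

The first document passes the relaxed content model but fails the check, which is the kind of instance the constraint is meant to catch.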

 

In my own experience, content models such as this last one are few and far between. Most of the time, non-determinism issues are easily fixed.