Dealing with the Limitations of XSD Content Models

There was recently a question on an internal mailing list about how to model RSS 2.0 with XML Schema. The short answer is that it can’t be done. The problem lies with the content model of the <channel> element, which consists of three elements which must appear exactly once, several elements which may optionally appear once, and one element, <item>, which may appear any number of times. Additionally, the items may appear in any order.

You can’t model this with <xs:all>, because its children may not have maxOccurs > 1. And you can’t model it with <xs:sequence>, because that requires the elements to appear in a particular order. The best solution would have been to take the limitations of XML Schema into account when designing the RSS specification. For example, the specification might have required the children to appear in a specific order. This does make authoring more difficult, although since RSS files are usually not authored manually, that may be acceptable. Another alternative would be to place all the <item> elements under a wrapper element, perhaps called <items>:

<xs:all>
  <xs:element name="required1"/>
  <xs:element name="required2"/>
  <xs:element name="optional1" minOccurs="0"/>
  <xs:element name="optional2" minOccurs="0"/>
  <xs:element name="items">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="item" minOccurs="0" maxOccurs="unbounded"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:all>
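
For comparison, the fixed-order alternative mentioned above could be sketched roughly as follows. It reuses the same placeholder element names and, like the fragment above, would sit inside an <xs:complexType>. It is only an illustration of the idea, not a proposal for how RSS itself should be ordered.

```xml
<!-- Fixed order: required elements first, then optional ones, then the items -->
<xs:sequence>
  <xs:element name="required1"/>
  <xs:element name="required2"/>
  <xs:element name="optional1" minOccurs="0"/>
  <xs:element name="optional2" minOccurs="0"/>
  <xs:element name="item" minOccurs="0" maxOccurs="unbounded"/>
</xs:sequence>
```

This keeps <item> directly under its parent with no wrapper, at the cost of dictating the order in which authors must write the children.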

Unfortunately, when dealing with an existing specification like RSS, we usually don’t have the luxury of making breaking changes, so we’ll have to find a way to work around these problems. Arguably the best solution is to underspecify the content model in the schema by allowing more liberal occurrence constraints, and then to perform additional checks at the application level. For example:

<xs:choice maxOccurs="unbounded">
  <xs:element name="required"/>
  <xs:element name="optional"/>
  <xs:element name="item"/>
</xs:choice>

The content model above will allow one or more of any of the above elements in any order. You would then have to write additional application code to verify that exactly one of each required element and no more than one of each optional element appears.
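
To make the gap concrete, here is a hypothetical instance fragment that the content model above accepts even though it breaks the intended cardinality rules. The <channel> parent is an assumption made for illustration, since the fragment above does not name its enclosing element.

```xml
<channel>
  <item/>
  <required/>
  <optional/>
  <item/>
  <!-- Second occurrence of a required element: the schema accepts it,
       so the application-level check has to reject it. -->
  <required/>
</channel>
```

An application-level check would count the occurrences of each child name and compare them against the limits the schema leaves unstated.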

 

But if for some reason you absolutely must validate RSS, or some other format with a similar issue, using only XML Schema, there’s a way to do it. The basic idea is to put everything in an unbounded <xs:choice> (as in the example above), and then use identity constraints (<xs:unique> and <xs:key>) to enforce cardinality limits. Use <xs:key> for required elements and <xs:unique> for optional elements: a key requires its field to be present, a unique constraint allows it to be absent, and in both cases a field that matches more than one node is an error.

 

Identity constraints won’t work without a field, so you also need to add a fixed attribute to each element whose cardinality you want to constrain. This attribute need not be present in the actual instance documents, since the schema processor will assume the existence of fixed attributes if they’re not explicitly specified.

 

There are some drawbacks to this approach:

  1. SQL Server doesn’t support identity constraints in XML Schema Collections, so you’d have to validate it using UDFs or with client-side code (System.Xml does support them).
  2. The error messages may be a bit confusing, since you’re using identity constraints in a non-standard way.
  3. You can get 0-1, 1-1, and 0-unbounded cardinalities this way. I’m not aware of a way to get 1-unbounded.
  4. The post-schema validation infoset will contain the fixed attribute.

The XSD below illustrates this technique:

<xs:schema xmlns:x="https://ns" targetNamespace="https://ns" xmlns:xs="http://www.w3.org/2001/XMLSchema" elementFormDefault="qualified">

 

<xs:element name="root">
  <xs:complexType>
    <xs:choice maxOccurs="unbounded">

      <!-- Must appear exactly once (enforced by reqkey below) -->
      <xs:element name="required">
        <xs:complexType>
          <xs:simpleContent>
            <xs:extension base="xs:string">
              <xs:attribute name="required" fixed="required"/>
            </xs:extension>
          </xs:simpleContent>
        </xs:complexType>
      </xs:element>

      <!-- May appear at most once (enforced by optunique below) -->
      <xs:element name="optional">
        <xs:complexType>
          <xs:simpleContent>
            <xs:extension base="xs:string">
              <xs:attribute name="optional" fixed="optional"/>
            </xs:extension>
          </xs:simpleContent>
        </xs:complexType>
      </xs:element>

      <!-- May appear any number of times -->
      <xs:element name="many"/>

    </xs:choice>
  </xs:complexType>

  <!-- The field must match exactly one node, so x:required must occur exactly once -->
  <xs:key name="reqkey">
    <xs:selector xpath="."/>
    <xs:field xpath="x:required/@required"/>
  </xs:key>

  <!-- The field may be absent but must not match twice, so x:optional occurs at most once -->
  <xs:unique name="optunique">
    <xs:selector xpath="."/>
    <xs:field xpath="x:optional/@optional"/>
  </xs:unique>

</xs:element>

</xs:schema>
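
As a rough illustration of how this behaves, here are two sample instance documents. The element content is made up, and the outcomes described are what a processor that implements identity constraints should report under the reasoning above, not captured validator output. The first document should validate: the children appear in arbitrary order, and the fixed attributes are omitted because the schema processor supplies them.

```xml
<root xmlns="https://ns">
  <many/>
  <optional>at most one of these</optional>
  <many/>
  <required>exactly one of these</required>
</root>
```

The second document should be rejected, because the field of the reqkey constraint matches two nodes. Removing both <required> elements should fail as well, since a key field may not be absent.

```xml
<root xmlns="https://ns">
  <required>first</required>
  <many/>
  <!-- Second <required>: x:required/@required now selects two nodes -->
  <required>second</required>
</root>
```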

I mention this technique more as a curiosity than as a way of recommending it. In most cases, I think the solutions mentioned above (better schema design or additional checks in the application code) would be preferable. Nevertheless, the possibility is there for those who feel a need to take advantage of it.