UPA in plain English (by Priya Lakshminarayanan)

Unique Particle Attribution, or UPA as it is known among schema-savvy XML geeks, is one of many rules in the XML Schema spec that confound most schema authors. If you ever see the following error message while compiling a schema, you have hit a UPA problem.

Multiple definition of element 'foo' causes the content model to become ambiguous. A content model must be formed such that during validation of an element information item sequence, the particle contained directly, indirectly or implicitly therein with which to attempt to validate each item in the sequence in turn can be uniquely determined without examining the content or attributes of that item, and without any information about the items in the remainder of the sequence.

 

This blog entry is an attempt to explain how this rule is applied and present some examples of content models that violate UPA with the hope that if you ever hit this error, you will be able to understand it and resolve the issue.

 

Consider the following schema and the corresponding XML snippet:

<xs:schema xmlns:tns="https://tempuri.org"
           targetNamespace="https://tempuri.org"
           xmlns:xs="http://www.w3.org/2001/XMLSchema"
           elementFormDefault="qualified">
  <xs:element name="Customer">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="FirstName" type="xs:string" />
        <xs:element name="LastName" type="tns:LastNameType"/>
        <xs:any namespace="##any" processContents="lax" minOccurs="0"/>
      </xs:sequence>
      <xs:attribute name="CustID" type="xs:positiveInteger"
                    use="required" />
    </xs:complexType>
  </xs:element>
  <xs:simpleType name="LastNameType">
    <xs:restriction base="xs:string">
      <xs:maxLength value="20"/>
    </xs:restriction>
  </xs:simpleType>
</xs:schema>

The schema author has added an xs:any wildcard at the end of the content model to keep the schema extensible. Here is an instance that is valid against it:

<Customer xmlns="https://tempuri.org" CustID="1">
  <FirstName>Harry</FirstName>
  <LastName>Potter</LastName>
  <CustomerInfo>xyz</CustomerInfo>
</Customer>

 

The following table shows the corresponding particles in the schema that map to the element names during the process of validation.

XML element                       | Schema particle (or declaration)
----------------------------------+---------------------------------------------------------------
{https://tempuri.org}Customer     | <xs:element name="Customer"> ... </xs:element>
{https://tempuri.org}FirstName    | <xs:element name="FirstName" type="xs:string" />
{https://tempuri.org}LastName     | <xs:element name="LastName" type="tns:LastNameType"/>
{https://tempuri.org}CustomerInfo | <xs:any namespace="##any" processContents="lax" minOccurs="0"/>

 

As you can see from the table, all element names in the instance uniquely map to a corresponding particle in the schema. In other words, the validator can attribute a unique particle to each element without any ambiguity.

Now consider a slightly modified version of the same schema above where the schema author decides that the LastName element can be optional as well:

<xs:element name="LastName" type="tns:LastNameType" minOccurs="0" />

 

Reconstructing our particle attribution table for the same XML instance gives:

 

XML element                       | Schema particle (or declaration)
----------------------------------+---------------------------------------------------------------
{https://tempuri.org}Customer     | <xs:element name="Customer"> ... </xs:element>
{https://tempuri.org}FirstName    | <xs:element name="FirstName" type="xs:string" />
{https://tempuri.org}LastName     | <xs:element name="LastName" type="tns:LastNameType" minOccurs="0"/>
                                  |   OR
                                  | <xs:any namespace="##any" processContents="lax" minOccurs="0"/>
{https://tempuri.org}CustomerInfo | <xs:any namespace="##any" processContents="lax" minOccurs="0"/>

 

We can see from the table that the "LastName" element can potentially match the xs:any wildcard in the schema as well. Making the LastName element optional has the side effect of allowing the LastName element in the instance to match against either the element declaration or the wildcard.

 

Can I Look Ahead?

Of course, if our validator were to look ahead in the stream and see that a <CustomerInfo> element follows and will match the xs:any, it could decide to match the <LastName> element to the element declaration. Every element in the instance would then be attributed to a unique particle in the schema, and things would be golden. Unfortunately, this does not solve our problem. To understand why, let's look at the definition of the UPA rule in the XML Schema specification:

 

A content model must be formed such that during validation of an element information item sequence, the particle component contained directly, indirectly or implicitly therein with which to attempt to validate each item in the sequence in turn can be uniquely determined without examining the content or attributes of that item, and without any information about the items in the remainder of the sequence.

 

We can see that our second schema violates the UPA rule since we cannot determine a unique particle for the <LastName> element without looking at the remaining elements which follow <LastName>.
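One possible way out, sketched here under the assumption that extension elements are allowed to live in a namespace of their own, is to narrow the wildcard from ##any to ##other. The wildcard then only accepts elements from namespaces other than the target namespace, so it can never compete with the LastName declaration. This is just one option and not necessarily the right one for every schema.

```xml
<xs:schema xmlns:tns="https://tempuri.org"
           targetNamespace="https://tempuri.org"
           xmlns:xs="http://www.w3.org/2001/XMLSchema"
           elementFormDefault="qualified">
  <xs:element name="Customer">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="FirstName" type="xs:string"/>
        <!-- LastName stays optional, as in the modified schema -->
        <xs:element name="LastName" type="tns:LastNameType" minOccurs="0"/>
        <!-- ##other excludes the target namespace, so this wildcard
             cannot match a tns:LastName element -->
        <xs:any namespace="##other" processContents="lax" minOccurs="0"/>
      </xs:sequence>
      <xs:attribute name="CustID" type="xs:positiveInteger" use="required"/>
    </xs:complexType>
  </xs:element>
  <xs:simpleType name="LastNameType">
    <xs:restriction base="xs:string">
      <xs:maxLength value="20"/>
    </xs:restriction>
  </xs:simpleType>
</xs:schema>
```

The trade-off is that the <CustomerInfo> element from the instance above is no longer accepted as written, because it sits in the target namespace. It would have to move to a separate extension namespace (for example a hypothetical https://example.org/ext) to match the wildcard.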

 

Content Models that violate UPA

A content model violates the UPA constraint if it has two particles that overlap, i.e. they can both validate the same element in the XML instance (overlap is defined at https://www.w3.org/TR/xmlschema-1/#non-ambig), and either:

· Both particles belong to a choice or all group

OR

· Both particles validate adjacent elements and the first particle has its minOccurs less than its maxOccurs.
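
For contrast, here is a small sketch of a content model whose two particles overlap but which still satisfies UPA, because neither condition applies: the particles sit in a sequence, and the first one has minOccurs equal to maxOccurs. The type name SatisfiesUPAType is made up for illustration.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           targetNamespace="https://tempuri.org"
           elementFormDefault="qualified">
  <xs:complexType name="SatisfiesUPAType">
    <xs:sequence>
      <!-- minOccurs and maxOccurs both default to 1:
           exactly one b is consumed by this particle -->
      <xs:element name="b" type="xs:string"/>
      <!-- Overlaps with b, but is only reached once b has been matched -->
      <xs:any namespace="##targetNamespace" processContents="lax"/>
    </xs:sequence>
  </xs:complexType>
</xs:schema>
```

The first child is always attributed to the b declaration and the second child to the wildcard, even when the second child is also named b, so the validator never has to choose.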

 

The following examples illustrate these conditions. They are grouped by the three ways in which, according to the spec, two particles can overlap:

 

  1. They are both element declaration particles whose declarations have the same {name} and {target namespace}

   1.1
   <xs:complexType name="ViolatesUPAType">
     <xs:sequence>
       <xs:element name="a" type="xs:string"/>
       <xs:element name="b" type="xs:string" minOccurs="0"/>
       <xs:element name="b" type="xs:string" fixed="xyz"/>
     </xs:sequence>
   </xs:complexType>

   1.2
   <xs:complexType name="ViolatesUPAType">
     <xs:choice>
       <xs:element name="b" type="xs:string"/>
       <xs:element name="b" type="xs:string" fixed="xyz"/>
     </xs:choice>
   </xs:complexType>

 

  2. They are both wildcards, and the intensional intersection of their {namespace constraint}s as defined in Attribute Wildcard Intersection (§3.10.6 in the spec) is not the empty set

   2.1
   <xs:complexType name="ViolatesUPAType">
     <xs:choice>
       <xs:any namespace="a b c" processContents="lax"/>
       <xs:any namespace="b e f" processContents="lax"/>
     </xs:choice>
   </xs:complexType>

   2.2
   <xs:complexType name="ViolatesUPAType">
     <xs:sequence>
       <xs:any namespace="##other" processContents="lax" minOccurs="2" maxOccurs="4"/>
       <xs:any namespace="##any" processContents="lax"/>
     </xs:sequence>
   </xs:complexType>
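
By contrast, a choice between two wildcards should be fine when their namespace lists have nothing in common. The sketch below reuses the placeholder namespace names from 2.1 and simply drops b from the second list; the type name SatisfiesUPAType is made up for illustration.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           targetNamespace="https://tempuri.org">
  <xs:complexType name="SatisfiesUPAType">
    <xs:choice>
      <!-- The two namespace lists are disjoint, so their intersection
           is empty and no element can match both wildcards -->
      <xs:any namespace="a b c" processContents="lax"/>
      <xs:any namespace="e f" processContents="lax"/>
    </xs:choice>
  </xs:complexType>
</xs:schema>
```

The namespace of the incoming element alone tells the validator which wildcard to use.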

  3. One is a wildcard and the other is an element declaration whose targetNamespace is allowed by the namespace attribute of the wildcard.

   3.1
   <xs:complexType name="ViolatesUPAType">
     <xs:sequence>
       <xs:element name="b" type="xs:string" minOccurs="0" form="qualified"/>
       <xs:any namespace="##targetNamespace" processContents="lax"/>
     </xs:sequence>
   </xs:complexType>

   3.2
   <xs:complexType name="ViolatesUPAType">
     <xs:sequence>
       <xs:element ref="importNS:importedElement" minOccurs="0"/>
       <xs:any namespace="##other" processContents="lax" minOccurs="0" maxOccurs="unbounded"/>
     </xs:sequence>
   </xs:complexType>

Substitution Groups & UPA

The presence of substitution group head elements in a content model can, in some cases, lead to a violation of the UPA constraint. Schema processors that resolve a substitution group to a choice between the head element and its member elements will report an error for the following cases, whereas other schema processors will not.

 

  1. The particles that overlap are both element declaration particles, one of whose {name} and {target namespace} are the same as those of an element declaration in the other's substitution group

   <xs:complexType>
     <xs:sequence>
       <xs:element ref="head" minOccurs="0"/>
       <xs:element ref="member"/>
     </xs:sequence>
   </xs:complexType>
   <xs:element name="head"/>
   <xs:element name="member" substitutionGroup="head"/>

  2. One is a wildcard and the other is an element that is the head of a substitution group, and the {target namespace} of any member of its substitution group is valid with respect to the {namespace constraint} of the wildcard

     

   Imported schema:
   <xs:schema targetNamespace="https://import" xmlns:xs="http://www.w3.org/2001/XMLSchema" xmlns:tns="https://import">
     <xs:complexType name="MayViolateUPAType">
       <xs:sequence>
         <xs:element ref="tns:head" minOccurs="0"/>
         <xs:any namespace="##other" processContents="lax"/>
       </xs:sequence>
     </xs:complexType>
     <xs:element name="head"/>
   </xs:schema>

   Main schema:
   <xs:schema targetNamespace="https://main" xmlns:xs="http://www.w3.org/2001/XMLSchema" xmlns:imp="https://import">
     <xs:import namespace="https://import" schemaLocation="import.xsd"/>
     <xs:element name="member" substitutionGroup="imp:head"/>
   </xs:schema>

 

When compiling the main schema with the import, some schema processors might throw a UPA error since the namespace of element "member" is allowed by the wildcard and the reference to the "head" element expands to a choice of head and member.

 

In the .NET Framework 2.0, the XmlSchemaSet class does not treat the content models above as UPA violations, whereas the obsolete XmlSchemaCollection class does report them as errors.