Mixed content (part 1 of many)

I started working on this two days ago. I quickly realized that when I’m done I’ll most likely have material for multiple posts. My original plan was to finish the article in the next few days, and then publish it as several posts over time. However I eventually decided to start posting now, while the work is still in progress, and let any feedback or questions you might have guide me.

Description

Mixed content is one of the great but often overlooked features of xml schema. A mixed complex type makes it legal for text nodes to appear between the elements that are normally expected by the validator.

In order for mixed content to be allowed the <xsd:complexType> node of the type's declaration must contain an attribute named "mixed" with a value of "true".

For a more complete explanation you can start with the Section of the W3C's xsd primer that deals with mixed content at https://www.w3.org/TR/xmlschema-0/#mixedContent.

In this post we will examine SQL Server 2005's capabilities and behavior when dealing with mixed content.

Example

As an example, we’ll deal with letters. Letters are ideal candidates for mixed content because they can contain both structured and unstructured information. The sender's and recipient(s) addresses for example constitute structured data for which we can design content models. The body of a letter itself is free-flow text but within it we can identify certain valuable pieces of information such as dates or email addresses using XML elements.

Here is a sample schema for letters.

CREATE XML SCHEMA COLLECTION SC_Letter AS '

<xsd:schema xmlns:xsd="https://www.w3.org/2001/XMLSchema"

targetNamespace="urn:letter"

elementFormDefault="qualified"

xmlns="urn:letter">

      <xsd:element name="letter">

            <xsd:complexType mixed="true">

                  <xsd:sequence>

                        <xsd:element name="date" type="xsd:date"/>

                        <xsd:element name="sender" type="addressType"/>

                        <xsd:sequence minOccurs="1" maxOccurs="unbounded">

                              <xsd:element name="recipient" type="addressType"/>

                        </xsd:sequence>

                        <xsd:any processContents="strict" namespace="urn:letter:tags" minOccurs="0" maxOccurs="unbounded"/>

                  </xsd:sequence>

            </xsd:complexType>

      </xsd:element>

      <xsd:complexType name="addressType">

            <xsd:sequence>

                        <xsd:element name="Title" type="xsd:string" minOccurs="0"/>

                        <xsd:element name="FirstName" type="xsd:string" minOccurs="0"/>

                        <xsd:element name="LastName" type="xsd:string"/>

                        <xsd:element name="Line1" type="xsd:string"/>

                        <xsd:element name="Line2" type="xsd:string" minOccurs="0"/>

                        <xsd:element name="City" type="xsd:string"/>

                        <xsd:element name="ZIPCode" type="xsd:string" minOccurs="0"/>

                        <xsd:choice>

                              <xsd:element name="State" type="xsd:string"/>

                              <xsd:element name="Country" type="xsd:string"/>

                        </xsd:choice>

            </xsd:sequence>

      </xsd:complexType>

</xsd:schema>

<xsd:schema xmlns:xsd="https://www.w3.org/2001/XMLSchema"

targetNamespace="urn:letter:tags"

elementFormDefault="qualified" xmlns="urn:letter:tags">

      <xsd:element name="person" type="xsd:string"/>

      <xsd:element name="date" type="xsd:date"/>

      <xsd:element name="title" type="xsd:string"/>

      <xsd:element name="URI" type="xsd:anyURI"/>

      <xsd:element name="email" type="emailType"/>

      <xsd:simpleType name="emailType">

            <xsd:restriction base="xsd:string">

                  <xsd:pattern value="[\w\-\.]+@([\w\-]+\.)+[\w\-]{2,4}"/>

            </xsd:restriction>

      </xsd:simpleType>

           

</xsd:schema>'

go

Now let’s create a table with a typed XML column

CREATE TABLE T (iCol int primary key, xmlCol XML(SC_Letter))

go

And finally, let's insert a sample XML instance

INSERT INTO T VALUES (1, '

<l:letter xmlns:l="urn:letter" xmlns:t="urn:letter:tags">

      <l:date>2006-10-15-08:00</l:date>

      <l:sender>

            <l:FirstName>James</l:FirstName>

            <l:LastName>Kirk</l:LastName>

            <l:Line1>

1234 Elm Street

</l:Line1>

            <l:City>Redmond</l:City>

            <l:ZIPCode>98052</l:ZIPCode>

            <l:State>WA</l:State>

      </l:sender>

      <l:recipient>

            <l:FirstName>Jean-Luc</l:FirstName>

            <l:LastName>Picard</l:LastName>

            <l:Line1>13 rue des Lilas</l:Line1>

            <l:City>Ronchin</l:City>

            <l:ZIPCode>59555</l:ZIPCode>

            <l:Country>France</l:Country>

      </l:recipient>

Dear Sir,

      I received your letter dated <t:date>2006-10-10+00:00</t:date> this morning. It would be my pleasure to attend the annual symposium <t:title>XML database systems in class-B Federation Spacecrafts</t:title>.

Please contact my assistant <t:person>Mr. Spock</t:person> at <t:email>spock@space-federation.org</t:email> to make all necessary arrangements.

Sincerely,

James T. Kirk

</l:letter>')

go

As you can see the 'letter" element contains interleaved element and text children. This is legal because the type was declared with mixed=''true".

It is important to understand that this property is not recursive. In our example it would be illegal to add text nodes under the "recipient" element since its type is not mixed. You can for example try the following INSERT query.

INSERT INTO T VALUES (0, '

<l:letter xmlns:l="urn:letter" xmlns:t="urn:letter:tags">

      <l:date>2006-10-15-08:00</l:date>

      <l:sender>

            <l:FirstName>James</l:FirstName>

This is a text node

            <l:LastName>Kirk</l:LastName>

            <l:Line1>

1234 Elm Street

</l:Line1>

            <l:City>Redmond</l:City>

            <l:ZIPCode>98052</l:ZIPCode>

            <l:State>WA</l:State>

      </l:sender>

</l:letter>')

go

There is a text node under the “sender” element. Since “sender” is of type “addressType” and that type wasn’t defined as mixed, its presence violates the schema and the query predictably fails with the following error message:

XML Validation: Text node is not allowed at this location, the type was defined with element only content or with simple content. Location: /*:letter[1]/*:sender[1]

 

Next time we’ll look at text nodes, how the validator handles them, and how we can query them…

-
Disclaimer:
This posting is provided “AS IS” with no warranties, and confers no rights.
Use of included script samples are subject to the terms specified at https://www.microsoft.com/info/cpyright.htm.