The Holy Grail in XML<->Object Mapping Technologies

03/28/2004

I was reading a post by Rory Blyth where he points to Steve Maine's explanation of the benefits of Prothon (an object-oriented programming language without classes). He writes:

One quote from Steve's post that has me thinking a bit, though, is the following:

The inherent extensibility and open content model of XML makes coming up with a statically typed representation that fully expresses all possible instance documents impossible. Thus, it would be cool if the object representation could expand itself to add new properties as it parsed the incoming stream.

I can see how this would be cool in a "Hey, that's cool" sense, but I don't see how it would help me at work. I fully admit that I might just be stupid, but I'm honestly having a hard time seeing the benefit. Right now, I'm grabbing XML in the traditional fashion of providing the name of the node that I want as a string key, and it seems to be working just fine.

For XML<->object mapping technologies, the ability to dynamically add properties to a class solves a specific problem: it lets developers program in a strongly typed manner against parts of an XML document that are not explicitly described in the document's schema.

This may not be obvious, so here is an example that illustrates the point. David Orchard of BEA wrote a schema for the ATOM 0.3 syndication format; below is the fragment of that schema that describes ATOM entries:

<xs:complexType name="entryType">
  <xs:sequence>
   <xs:element name="title" type="xs:string"/>
   <xs:element name="link" type="atom:linkType"/>
   <xs:element name="author" type="atom:personType" minOccurs="0"/>
   <xs:element name="contributor" type="atom:personType" minOccurs="0" maxOccurs="unbounded"/>
   <xs:element name="id" type="xs:string"/>
   <xs:element name="issued" type="atom:iso8601dateTime"/>
   <xs:element name="modified" type="atom:iso8601dateTime"/>
   <xs:element name="created" type="atom:iso8601dateTime" minOccurs="0"/>
   <xs:element name="summary" type="atom:contentType" minOccurs="0"/>
   <xs:element name="content" type="atom:contentType" minOccurs="0" maxOccurs="unbounded"/>
   <xs:any namespace="##other" processContents="lax" minOccurs="0" maxOccurs="unbounded"/>
  </xs:sequence>
  <xs:attribute ref="xml:lang" use="optional"/>
  <xs:anyAttribute/>
</xs:complexType>

The above schema fragment produces the following C# class when the .NET Framework's XSD.exe tool is run with the ATOM 0.3 schema as input.

using System.Xml.Serialization;

/// <remarks/>
[XmlTypeAttribute(Namespace="http://purl.org/atom/ns#")]
public class entryType {

    /// <remarks/>
    public string title;

    /// <remarks/>
    public linkType link;

    /// <remarks/>
    public personType author;

    /// <remarks/>
    [XmlElementAttribute("contributor")]
    public personType[] contributor;

    /// <remarks/>
    public string id;

    /// <remarks/>
    public string issued;

    /// <remarks/>
    public string modified;

    /// <remarks/>
    public string created;

    /// <remarks/>
    public contentType summary;

    /// <remarks/>
    [XmlElementAttribute("content")]
    public contentType[] content;

    /// <remarks/>
    [XmlAnyElementAttribute()]
    public System.Xml.XmlElement[] Any;

    /// <remarks/>
    [XmlAttributeAttribute(Namespace="http://www.w3.org/XML/1998/namespace")]
    public string lang;

    /// <remarks/>
    [XmlAnyAttributeAttribute()]
    public System.Xml.XmlAttribute[] AnyAttr;
}

As a side note, David Orchard's ATOM 0.3 schema is invalid as published: it refers to an undefined type, authorType, so I had to remove that reference before the schema would validate.
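To make the role of the wildcard concrete, here is a minimal sketch that relies on the standard XmlSerializer handling of [XmlAnyElement]. MiniEntry is a hypothetical, cut-down stand-in for the generated entryType that keeps only title and the Any field, and the sample entry document is made up. Deserializing it leaves the dc:subject element in Any rather than in a named field, which is exactly where the extension data ends up with the generated class.

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.Serialization;

// Hypothetical, cut-down stand-in for the generated entryType:
// only the title field and the xs:any wildcard are kept.
[XmlRoot("entry", Namespace = "http://purl.org/atom/ns#")]
public class MiniEntry
{
    [XmlElement(Namespace = "http://purl.org/atom/ns#")]
    public string title;

    // Child elements that match no named member are collected here.
    [XmlAnyElement]
    public XmlElement[] Any;
}

public static class WildcardDemo
{
    public static MiniEntry Parse(string xml)
    {
        var serializer = new XmlSerializer(typeof(MiniEntry));
        using (var reader = new StringReader(xml))
        {
            return (MiniEntry)serializer.Deserialize(reader);
        }
    }

    public static void Main()
    {
        // Illustrative instance document; the values are made up.
        string xml =
            "<entry xmlns='http://purl.org/atom/ns#'" +
            " xmlns:dc='http://purl.org/dc/elements/1.1/'>" +
            "<title>Example</title>" +
            "<dc:subject>XML</dc:subject>" +
            "</entry>";

        MiniEntry entry = Parse(xml);
        Console.WriteLine(entry.title);               // mapped to a typed field
        Console.WriteLine(entry.Any[0].LocalName);    // extension element, untyped
        Console.WriteLine(entry.Any[0].NamespaceURI);
    }
}
```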

The generated Any and AnyAttr fields, which capture the elements and attributes matched by the xs:any and xs:anyAttribute wildcards, show the problem that the ability to dynamically add fields to a class would solve. When programming against an ATOM feed with the entryType class above, as soon as you encounter an extension element you have to fall back to XML processing instead of using strongly typed constructs. For example, consider Mark Pilgrim's RSS feed, which has dc:subject elements that are not described in the ATOM 0.3 schema but are allowed because of the xs:any wildcard. Watch how this complicates the following code, which prints the title, issued date and subject of each entry.

foreach(entryType entry in feed.Entries){

  Console.WriteLine("Title: " + entry.title);
  Console.WriteLine("Issued: " + entry.issued);

  string subject = null;

  //find the dc:subject among the wildcard-captured extension elements
  if(entry.Any != null){
    foreach(XmlElement elem in entry.Any){
      if(elem.LocalName == "subject" &&
         elem.NamespaceURI == "http://purl.org/dc/elements/1.1/"){
        subject = elem.InnerText;
        break;
      }
    }
  }

  Console.WriteLine("Subject: " + subject);
}
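One way to keep that lookup out of every loop is a small helper. This is a sketch only: ExtensionElements and FindText are hypothetical names, not part of the XSD.exe output or the .NET Framework. It performs the same namespace-and-local-name match as the loop above and guards against the Any field being null.

```csharp
using System;
using System.Xml;

// Hypothetical helper, not part of XSD.exe output or the .NET Framework.
public static class ExtensionElements
{
    // Returns the text of the first wildcard-captured element whose namespace
    // URI and local name both match, or null if there is no such element.
    public static string FindText(XmlElement[] any, string namespaceUri, string localName)
    {
        // The Any field may be null, so guard against it.
        if (any == null) return null;

        foreach (XmlElement elem in any)
        {
            // Match on the expanded name; the prefix ("dc") is irrelevant.
            if (elem.LocalName == localName && elem.NamespaceURI == namespaceUri)
                return elem.InnerText;
        }
        return null;
    }

    public static void Main()
    {
        const string DcNs = "http://purl.org/dc/elements/1.1/";
        var doc = new XmlDocument();
        doc.LoadXml("<x xmlns:dc='" + DcNs + "'><dc:subject>XML</dc:subject></x>");
        XmlElement subject = (XmlElement)doc.DocumentElement.FirstChild;

        Console.WriteLine(FindText(new[] { subject }, DcNs, "subject"));
        Console.WriteLine(FindText(null, DcNs, "subject") == null);
    }
}
```

With it, the inner loop in the code above shrinks to a single call such as `ExtensionElements.FindText(entry.Any, "http://purl.org/dc/elements/1.1/", "subject")`, but the code is still matching element names as strings rather than using a typed member, which is the point being made here.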

As you can see, one minute you are programming against statically and strongly typed C# constructs and the next you are back to checking the names of XML elements and programming against the DOM. If there were infrastructure that enabled one to dynamically add properties to classes, then it is conceivable that even though the ATOM 0.3 schema doesn't define the dc:subject element, one would still be able to program against it in a strongly typed manner through the generated classes. So one could write code like:

foreach(entryType entry in feed.Entries){

  Console.WriteLine("Title: " + entry.title);
  Console.WriteLine("Issued: " + entry.issued);
  // subject: the hypothetical property added dynamically for dc:subject
  Console.WriteLine("Subject: " + entry.subject);
}

Of course, there are still impedance mismatches to resolve, such as how to reflect the namespace names of elements in the object model and how to distinguish attributes from elements, but having the capabilities Steve Maine describes in his original post would improve the XML<->object mapping technologies that exist today.
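As a rough approximation only, the dynamic keyword and System.Dynamic.ExpandoObject, which arrived later in C# 4.0, let an object grow properties while an element is read. This sketch is not the strongly typed mapping envisioned here: member access on a dynamic value is resolved at run time, so a misspelled property still compiles and only fails when executed. DynamicEntry and FromXml are hypothetical names, and the code deliberately drops namespace names, which illustrates the namespace impedance mismatch described above.

```csharp
using System;
using System.Collections.Generic;
using System.Dynamic;
using System.Xml;

// Hypothetical sketch: an object whose properties are added while parsing.
public static class DynamicEntry
{
    // Each child element becomes a property named after its local name.
    // Namespaces are dropped, so atom:title and dc:title would collide, and a
    // repeated element simply overwrites the earlier value (last one wins).
    public static dynamic FromXml(XmlElement entryElement)
    {
        IDictionary<string, object> props = new ExpandoObject();
        foreach (XmlNode node in entryElement.ChildNodes)
        {
            if (node is XmlElement child)
            {
                props[child.LocalName] = child.InnerText;
            }
        }
        return props;
    }

    public static void Main()
    {
        var doc = new XmlDocument();
        doc.LoadXml(
            "<entry xmlns='http://purl.org/atom/ns#'" +
            " xmlns:dc='http://purl.org/dc/elements/1.1/'>" +
            "<title>Example</title>" +
            "<issued>2004-03-28T00:00:00Z</issued>" +
            "<dc:subject>XML</dc:subject>" +
            "</entry>");

        dynamic entry = FromXml(doc.DocumentElement);
        Console.WriteLine("Title: " + entry.title);
        Console.WriteLine("Issued: " + entry.issued);
        // Resolved at run time; no compile-time check that "subject" exists.
        Console.WriteLine("Subject: " + entry.subject);
    }
}
```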
