The Holy Grail in XML<->Object Mapping Technologies


I was reading a post by Rory Blyth where he points to Steve Maine's explanation of the benefits of Prothon (an object-oriented programming language without classes). He writes:



One quote from Steve's post that has me thinking a bit, though, is the following:



The inherent extensibility and open content model of XML makes coming up with a statically typed representation that fully expresses all possible instance documents impossible. Thus, it would be cool if the object representation could expand itself to add new properties as it parsed the incoming stream.


I can see how this would be cool in a "Hey, that's cool" sense, but I don't see how it would help me at work. I fully admit that I might just be stupid, but I'm honestly having a hard time seeing the benefit. Right now, I'm grabbing XML in the traditional fashion of providing the name of the node that I want as a string key, and it seems to be working just fine.


In XML<->object mapping technologies, being able to dynamically add properties to a class solves a specific problem: it lets developers program against parts of an XML document in a strongly typed manner even when those parts are not explicitly described in the schema for the document.


This may not be obvious, so I'll provide an example that illustrates the point. David Orchard of BEA wrote a schema for the ATOM 0.3 syndication format. Below is the fragment of the schema that describes ATOM entries:



 <xs:complexType name="entryType">
  <xs:sequence>
   <xs:element name="title" type="xs:string"/>
   <xs:element name="link" type="atom:linkType"/>
   <xs:element name="author" type="atom:personType" minOccurs="0"/>
   <xs:element name="contributor" type="atom:personType" minOccurs="0" maxOccurs="unbounded"/>
   <xs:element name="id" type="xs:string"/>
   <xs:element name="issued" type="atom:iso8601dateTime"/>
   <xs:element name="modified" type="atom:iso8601dateTime"/>
   <xs:element name="created" type="atom:iso8601dateTime" minOccurs="0"/>
   <xs:element name="summary" type="atom:contentType" minOccurs="0"/>
   <xs:element name="content" type="atom:contentType" minOccurs="0" maxOccurs="unbounded"/>
   <xs:any namespace="##other" processContents="lax" minOccurs="0" maxOccurs="unbounded"/>
  </xs:sequence>
  <xs:attribute ref="xml:lang" use="optional"/>
  <xs:anyAttribute/>
 </xs:complexType> 


The above schema fragment produces the following C# class when the .NET Framework's XSD.exe tool is run with the ATOM 0.3 schema as input.



/// <remarks/>
[XmlTypeAttribute(Namespace="http://purl.org/atom/ns#")]
public class entryType {
   
    /// <remarks/>
    public string title;
   
    /// <remarks/>
    public linkType link;
   
    /// <remarks/>
    public personType author;
   
    /// <remarks/>
    [XmlElementAttribute("contributor")]
    public personType[] contributor;
   
    /// <remarks/>
    public string id;
   
    /// <remarks/>
    public string issued;
   
    /// <remarks/>
    public string modified;
   
    /// <remarks/>
    public string created;
   
    /// <remarks/>
    public contentType summary;
   
    /// <remarks/>
    [XmlElementAttribute("content")]
    public contentType[] content;
   
    /// <remarks/>
    [XmlAnyElementAttribute()]
    public System.Xml.XmlElement[] Any;
   
    /// <remarks/>
    [XmlAttributeAttribute(Namespace="http://www.w3.org/XML/1998/namespace")]
    public string lang;
   
    /// <remarks/>
    [XmlAnyAttributeAttribute()]
    public System.Xml.XmlAttribute[] AnyAttr;

}


As a side note, David Orchard's ATOM 0.3 schema is invalid because it refers to an undefined authorType, so I had to remove that reference to get the schema to validate.


The generated Any and AnyAttr fields, which are mapped from the xs:any and xs:anyAttribute wildcards, show the problem that the ability to dynamically add fields to a class would solve. When programming against an ATOM feed using the above entryType class, as soon as you encounter an extension element you have to fall back to XML processing instead of programming with strongly typed constructs. For example, consider Mark Pilgrim's ATOM feed, which has dc:subject elements that are not described in the ATOM 0.3 schema but are allowed by its xs:any wildcard. Watch how this complicates the following code, which prints the title, issued date and subject of each entry.



foreach(entryType entry in feed.Entries){

  Console.WriteLine("Title: " + entry.title);
  Console.WriteLine("Issued: " + entry.issued);

  string subject = null;

  //find the dc:subject
  foreach(XmlElement elem in entry.Any){
   if(elem.LocalName.Equals("subject") &&
      elem.NamespaceURI.Equals("http://purl.org/dc/elements/1.1/")){
     subject = elem.InnerText;
     break;
   }
  }

  Console.WriteLine("Subject: " + subject);

}
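
One way to contain the noise is to move the lookup into a helper. The sketch below assumes a hypothetical ExtensionElements.FindText method (it is not part of the .NET Framework or of the XSD.exe output) and builds a made-up sample entry by hand in place of a real feed. It tidies the call site, but the element name and namespace are still strings that are only checked at run time, so the underlying problem remains.

```csharp
using System;
using System.Collections.Generic;
using System.Xml;

public static class ExtensionElements
{
    public const string DublinCore = "http://purl.org/dc/elements/1.1/";

    // Hypothetical helper: returns the text of the first wildcard element
    // with the given local name and namespace, or null when there is none.
    public static string FindText(XmlElement[] any, string localName, string namespaceUri)
    {
        if (any == null) return null; // no extension elements to search

        foreach (XmlElement elem in any)
        {
            if (elem.LocalName == localName && elem.NamespaceURI == namespaceUri)
                return elem.InnerText;
        }
        return null;
    }
}

public static class Program
{
    public static void Main()
    {
        // Made-up sample standing in for the entryType.Any array.
        var doc = new XmlDocument();
        doc.LoadXml(
            "<entry xmlns:dc='http://purl.org/dc/elements/1.1/'>" +
            "<dc:subject>XML</dc:subject></entry>");

        var any = new List<XmlElement>();
        foreach (XmlNode node in doc.DocumentElement.ChildNodes)
        {
            XmlElement elem = node as XmlElement;
            if (elem != null) any.Add(elem);
        }

        string subject = ExtensionElements.FindText(
            any.ToArray(), "subject", ExtensionElements.DublinCore);
        Console.WriteLine("Subject: " + subject);
    }
}
```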


As you can see, one minute you are programming against statically and strongly typed C# constructs and the next you are back to checking the names of XML elements and programming against the DOM. If there were infrastructure that enabled one to dynamically add properties to classes, then it is conceivable that, even though the ATOM 0.3 schema doesn't define the dc:subject element, one would still be able to program against it in a strongly typed manner in the generated classes. So one could write code like:




foreach(entryType entry in feed.Entries){

  Console.WriteLine("Title: " + entry.title);
  Console.WriteLine("Issued: " + entry.issued);
  Console.WriteLine("Subject: " + entry.subject);

}
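
Something close to this can be sketched in later versions of C# (4.0 and up) with the dynamic keyword and System.Dynamic.DynamicObject. The ExtensibleEntry wrapper below is hypothetical, not something XSD.exe generates, and the sample entry is made up for illustration. Declared fields such as title are resolved first; names that are not declared fall through to TryGetMember, which looks them up among the wildcard elements. Note that the binding happens at run time, so a misspelled member name fails when the code executes rather than when it compiles.

```csharp
using System;
using System.Collections.Generic;
using System.Dynamic;
using System.Xml;

// Hypothetical wrapper: exposes wildcard (xs:any) elements as late-bound members.
public class ExtensibleEntry : DynamicObject
{
    public string title;
    public string issued;

    private readonly Dictionary<string, string> extensions =
        new Dictionary<string, string>();

    public ExtensibleEntry(string title, string issued, IEnumerable<XmlElement> any)
    {
        this.title = title;
        this.issued = issued;

        foreach (XmlElement elem in any)
        {
            // Keyed on local name only, so the namespace is lost and the
            // first element with a given name wins.
            if (!extensions.ContainsKey(elem.LocalName))
                extensions.Add(elem.LocalName, elem.InnerText);
        }
    }

    // Invoked only for member names that are not declared on the class.
    public override bool TryGetMember(GetMemberBinder binder, out object result)
    {
        string text;
        bool found = extensions.TryGetValue(binder.Name, out text);
        result = text;
        return found;
    }
}

public static class Program
{
    public static void Main()
    {
        // Made-up sample standing in for a deserialized entry.
        var doc = new XmlDocument();
        doc.LoadXml(
            "<entry xmlns:dc='http://purl.org/dc/elements/1.1/'>" +
            "<dc:subject>XML</dc:subject></entry>");

        var any = new List<XmlElement>();
        foreach (XmlNode node in doc.DocumentElement.ChildNodes)
        {
            XmlElement elem = node as XmlElement;
            if (elem != null) any.Add(elem);
        }

        dynamic entry = new ExtensibleEntry("Sample title", "2004-01-01T00:00:00Z", any);

        Console.WriteLine("Title: " + entry.title);
        Console.WriteLine("Issued: " + entry.issued);
        Console.WriteLine("Subject: " + entry.subject);
    }
}
```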


Of course, there are still impedance mismatches to resolve, such as how to reflect the namespace names of elements or how to distinguish attributes from elements in the model. Even so, having the capabilities Steve Maine describes in his original post would improve the XML<->Object mapping technologies that exist today.


Comments (3)
  1. Matt Warren says:

    As soon as you add the ability to ‘add’ properties to types at runtime, you lose the ability to interact with these properties as strongly typed/strongly bound things. This cannot be solved by the invention of a language. It is a fundamental schism between the two representations. Still, if the runtime allowed extraneous data to be attached to an object instance you might be able to fall back to an API model of interacting with the object that would include this information. If nothing else, your strongly typed objects could at least hold on to this data so that it may be persisted out, back into the void from which it came!

  2. Matt,

    It seems you are assuming that this should be built against a statically typed language. One could build such a system using a dynamic yet strongly typed language. Doug Purdy prototyped some ideas in this regard at http://www.douglasp.com/2003/05/13.html#a288

  3. Erik Johnson says:

    Rather than build the type from an instance document, a schema could be declared. Perhaps the inner store could be an XPathDocument. Data could be serialized as XML pretty fast using streams (later on as a binary infoset). Schema validation could also be an option. With apologies to the generics syntax, here’s a snippet. BTW, the schema types also would flow to the WSDL produced by the SOAP processor.

    [WebMethod]
    void UpdateCustomer(SchemaBasedType<customer.xsd> doc)
    {
      string cName = doc.CustomerName;
      SchemaBasedType<contact.xsd> contact = doc.Contacts.Create("NewID");
      contact.Name = "Frank";
      contact.Phone = "934-393-8756";

      // Now use some cool data access service
      // (maybe even based on the ObjectSpaces map
      // files w/o the slow O/R overhead) to
      // save to a DB
      DataAccessSvc.ApplyChanges(doc);
    }
