The Impedance Mismatch between W3C XML Schema and the CLR

Daniel Cazzulino is writing about the W3C XML Schema type system <-> CLR type system mapping and has an informal poll at the bottom of his article, where he writes:

We all agree that many concepts in WXS don’t map to anything existing in OO languages, such as derivation by restriction, content-ordering (i.e. sequence vs choice), etc. However, in the light of the tools the .NET Framework makes available to map XML to objects, we usually have to analyze WXS (used to define the structure of that very XML instance to be mapped) and its relation with our classes.
In this light, I’m conducting a survey about developer’s view on the relation of the XSD type system and the .NET one. Ignoring some of the more advanced (I could add cumbersome and confusing) features of WXS, would you say that both type systems fit nicely with each other?

I find the question at the end of his post, which I highlighted, to be highly tautological. His question is basically, “If you ignore the parts where they don’t fit well together, do the CLR and XSD type systems fit well together?” Well, if you ignore the parts where they don’t, then the only answer is YES. In reality, many developers don’t have the freedom to ignore the parts of XSD they don’t want to support, especially when utilizing XML Web Services designed by others.

There are two primary ways one can utilize the XmlSerializer, which maps between XSD and CLR types:

  1. XML Serialization of Object State: In this case the developer is only interested in ensuring that the state of his classes can be converted to XML. This is a fairly simple problem because the expressiveness of the CLR is a subset of that of W3C XML Schema. Any object’s state could be mapped to an element of complex type containing a sequence or choice of nested elements, each of which is of either simple or complex type.

    Even then there are limitations in the XmlSerializer which make this cumbersome, such as the fact that it only serializes public fields and public read/write properties, so private state is left out. But that is just a design decision that can be revisited in future releases.

  2. Conversion of XML to Objects: This is the scenario where a developer converts an XML schema to CLR classes to make the XML easier to program against. This is particularly common in XML Web Services scenarios, which is what the XmlSerializer was originally designed for. In this scenario the conversion tool has to contend with the breadth of features in the XML Schema: Structures and XML Schema: Datatypes recommendations.

    There are enough discrepancies between the W3C XML Schema type system and that of the CLR to fill a Ph.D. thesis. I touched on some of these in my article XML Serialization in the .NET Framework, such as:

    Q: What aspects of W3C XML Schema are not supported by the XmlSerializer during conversion of schemas to classes?

    A: The XmlSerializer does not support the following:

    • Any of the simple type restriction facets besides enumeration.
    • Namespace based wildcards.
    • Identity constraints.
    • Substitution groups.
    • Blocked elements or types.

    After gaining more experience working with the XmlSerializer and talking to a number of customers, I wrote some more about the impedance mismatches in my article XML Schema Design Patterns: Is Complex Type Derivation Unnecessary?, specifically:

    For usage scenarios where a schema is used to create strongly typed XML, derivation by restriction is problematic. The ability to restrict optional elements and attributes does not exist in the relational model or in traditional concepts of type derivation from OOP languages. The example from the previous section where the email element is optional in the base type, but cannot appear in the derived type, is incompatible with the notion of derivation in an object oriented sense, while also being similarly hard to model using tables in a relational database.

    Similarly, changing the nillability of a type through derivation is not a capability that maps to relational or OOP models. On the other hand, the example that doesn’t use derivation by restriction can more straightforwardly be modeled as classes in an OOP language or as relational tables. This is important given that it reduces the impedance mismatch which occurs when attempting to map the contents of an XML document into a relational database or convert an XML document into an instance of an OOP class.
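
To make the email example from the quote concrete, here is a minimal sketch of what happens when one tries to mirror derivation by restriction with ordinary class inheritance. It is written in Python for brevity, but the same shape applies to a C# class. The names Customer, AnonymousCustomer and describe are hypothetical and are not taken from the article or from any tool's output.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Customer:
    """Base type: email is optional, as it would be with minOccurs="0"."""
    name: str
    email: Optional[str] = None


@dataclass
class AnonymousCustomer(Customer):
    """Hypothetical counterpart of a type derived by restriction that forbids email."""

    def __post_init__(self) -> None:
        # Inheritance cannot remove the inherited field, so the restriction
        # can only be approximated with a hand-written runtime check.
        if self.email is not None:
            raise ValueError("email is not allowed on AnonymousCustomer")


def describe(customer: Customer) -> str:
    # Code written against the base type may always look at email,
    # including on instances of the "restricted" type.
    return f"{customer.name} <{customer.email or 'no email'}>"


restricted = AnonymousCustomer(name="Ann")
print(describe(restricted))          # Ann <no email>
print(hasattr(restricted, "email"))  # True
```

The derived class still carries the email member. The best it can do is refuse a value at construction time, which is a runtime convention rather than something the type system expresses.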

I’m not the only one at Microsoft who’s written about this impedance mismatch or tried to solve it. Gavin Bierman, Wolfram Schulte and Erik Meijer devoted an entire section of their paper Programming with Circles, Triangles and Rectangles to this mismatch. Below are links to descriptions of a couple of the mismatches they found most interesting:

The mismatch between XML and object data-models
     Edge-labelled vs. Node-labelled
     Attributes versus elements
     Elements versus complex and simple types
     Multiple occurrences of the same child element
     Anonymous types
     Substitution groups vs derivation and the closed world assumption
     Namespaces, namespaces as values
     Occurrence constraints part of container instead of type
     Mixed content

There is a lot of discussion one could have about the impedance mismatch between the CLR type system and the XSD type system, but one thing you can’t say is that it doesn’t exist or that it can be ignored when building schema-centric applications.

In conclusion, if one is mapping objects to XML for the purpose of serializing their state, then there is a good match between the CLR and XSD type systems, since the XSD type system is more expressive than the CLR type system. On the other hand, if one is trying to go from XSD to the CLR type system, there are significant impedance mismatches. Some of these are limitations of the current tools (e.g. the XmlSerializer could code-gen range checks for derivation by restriction of simple types, or uniqueness tests for identity constraints), while others are fundamental differences between the XSD type system and object-oriented programming, such as the difference between derivation by restriction in XSD and type derivation in OOP languages.
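
As a rough illustration of the kind of range check a tool could generate for a restricted simple type, here is a minimal sketch. It assumes a schema type that restricts xs:int with minInclusive 0 and maxInclusive 100. The Percentage class and the try_set helper are hypothetical names, the sketch is in Python for brevity, and it does not reflect what the XmlSerializer actually emits.

```python
class Percentage:
    """Hypothetical class for an xs:int restricted to the range 0..100."""

    MIN_INCLUSIVE = 0
    MAX_INCLUSIVE = 100

    def __init__(self, value: int) -> None:
        self.value = value  # routed through the validating setter below

    @property
    def value(self) -> int:
        return self._value

    @value.setter
    def value(self, new_value: int) -> None:
        # The facets become a guard on every assignment.
        if not self.MIN_INCLUSIVE <= new_value <= self.MAX_INCLUSIVE:
            raise ValueError(
                f"{new_value} is outside [{self.MIN_INCLUSIVE}, {self.MAX_INCLUSIVE}]"
            )
        self._value = new_value


def try_set(target: Percentage, new_value: int) -> bool:
    """Return True if the facet check accepted the value, False otherwise."""
    try:
        target.value = new_value
        return True
    except ValueError:
        return False


score = Percentage(42)
print(try_set(score, 100))  # True
print(try_set(score, 101))  # False
print(score.value)          # 100
```

Facets like these fall into the "limitation of the tools" category, since they translate mechanically into setter guards. Forbidding an inherited member, by contrast, has no such direct translation.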