Prettification of XML Serialization within Web Services


This came up yesterday on an internal mailing list.


A colleague remarked that following best practices in modular schema design results in bloated serialized messages, particularly because of repeated namespace declarations (xmlns attributes). This issue shows up when using .NET Web Services, on either the client or the server, with either .NET 1.x or 2.0.


Was this person right? Yes and no. This bloat can happen, but in .NET it can be avoided pretty simply if you know which knob to twist. Before I show you the knob, I want to explain the problem a bit more.


The Schema and the Code


Consider the case where a common schema is used across the enterprise. As a simple example, an address schema:



<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           targetNamespace="urn:my-enterprise:Basics"
           elementFormDefault="qualified">

  <xs:complexType name="Address">
    <xs:sequence>
      <xs:element name="Town" type="xs:string" />
      <xs:element name="Street" type="xs:string" />
      <xs:element name="Number" type="xs:int" />
    </xs:sequence>
  </xs:complexType>

</xs:schema>


If you are following a schema-first design, then you generate your data types from this XSD. Using the Xsd.exe tool in .NET, you would get this sort of class declaration:



  [System.Xml.Serialization.XmlTypeAttribute(Namespace="urn:my-enterprise:Basics")]
  public class Address
  {
    public string Town;
    public string Street;
    public int Number;
  }


Each business unit in the enterprise defines services and messages in its own namespace, building on the general definitions in the canonical schema namespace. For example, an AccountHolder element:



<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           targetNamespace="urn:my-unit:AccountTypes"
           xmlns:s1="urn:my-enterprise:Basics"
           elementFormDefault="qualified">

  <!-- the location is a hint only -->
  <xs:import namespace="urn:my-enterprise:Basics"
             schemaLocation="Basics.xsd"/>

  <xs:element name="AccountHolder">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="Name" type="xs:string" />
        <xs:element name="DateOfBirth" type="xs:date" />
        <xs:element name="HomeAddress" type="s1:Address" />
      </xs:sequence>
    </xs:complexType>
  </xs:element>

</xs:schema>


If you generated code for this type, you’d get something like this:



  [System.Xml.Serialization.XmlTypeAttribute(Namespace="urn:my-unit:AccountTypes")]
  [System.Xml.Serialization.XmlRootAttribute(Namespace="urn:my-unit:AccountTypes", IsNullable=false)]
  public class AccountHolder
  {
    public string Name;

    [System.Xml.Serialization.XmlElementAttribute(DataType="date")]
    public System.DateTime DateOfBirth;

    public Address HomeAddress;
  }


We know about the impedance mismatch problem between Schema and Code, but in this example, the schema is pretty simple and so the problem does not rear its ugly head. We can use the generated classes to serialize and de-serialize instances to and from XML. Cool.
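
To make the round trip concrete, here is a minimal C# sketch that serializes an AccountHolder to a string and reads it back with XmlSerializer. It assumes field-based classes shaped like the Xsd.exe output above. The RoundTrip helper and its method names are hypothetical, added only for illustration.

```csharp
using System;
using System.IO;
using System.Xml.Serialization;

// Shapes assumed to match the Xsd.exe output shown above.
[XmlType(Namespace = "urn:my-enterprise:Basics")]
public class Address
{
    public string Town;
    public string Street;
    public int Number;
}

[XmlType(Namespace = "urn:my-unit:AccountTypes")]
[XmlRoot(Namespace = "urn:my-unit:AccountTypes", IsNullable = false)]
public class AccountHolder
{
    public string Name;

    [XmlElement(DataType = "date")]
    public DateTime DateOfBirth;

    public Address HomeAddress;
}

// Hypothetical helper, not part of the generated code.
public static class RoundTrip
{
    // Serialize to an XML string using the serializer's default settings.
    public static string ToXml(AccountHolder holder)
    {
        var serializer = new XmlSerializer(typeof(AccountHolder));
        using (var writer = new StringWriter())
        {
            serializer.Serialize(writer, holder);
            return writer.ToString();
        }
    }

    // Rebuild an AccountHolder from XML such as the output of ToXml.
    public static AccountHolder FromXml(string xml)
    {
        var serializer = new XmlSerializer(typeof(AccountHolder));
        using (var reader = new StringReader(xml))
        {
            return (AccountHolder)serializer.Deserialize(reader);
        }
    }
}
```

Calling RoundTrip.FromXml(RoundTrip.ToXml(holder)) should return an object with the same Name, DateOfBirth, and HomeAddress values. The DataType="date" mapping is what keeps DateOfBirth on the wire as a plain xs:date such as 2006-01-05.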

Prettification during Explicit XML Serialization


Let’s say we use the AccountHolder element in a scenario where we just want to serialize it into XML manually. The XML serialization capability built into .NET makes this easy. The output XML looks like this:



<AccountHolder xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns="urn:my-unit:AccountTypes">
  <Name>Smythe</Name>
  <DateOfBirth>2006-01-05</DateOfBirth>
  <HomeAddress>
    <Town xmlns="urn:my-enterprise:Basics">Willingham</Town>
    <Street xmlns="urn:my-enterprise:Basics">Ryland</Street>
    <Number xmlns="urn:my-enterprise:Basics">123</Number>
  </HomeAddress>
</AccountHolder>


The XML is well-formed and valid, and can be “de-serialized” by some other application, somewhere else, running on some other platform. Maybe a Java app using XmlBeans or JAXB, etc. The code fragment that would produce this XML is here:



  XmlSerializer s1 = new XmlSerializer(holder.GetType());
  s1.Serialize(System.Console.Out, holder);
  System.Console.WriteLine("\n");


But you’ll notice the output XML is not the prettiest or leanest it could be. THE DREADED BLOAT. You can see xmlns repeated throughout the serialized form. The AccountHolder element and its child elements (including HomeAddress) are in the XML namespace “urn:my-unit:AccountTypes”, but the child elements of HomeAddress are defined in the namespace “urn:my-enterprise:Basics”. The result is an explicit declaration of the XML namespace on each of those elements. You can also see declarations for the xsi and xsd prefixes, which are never used in the document.
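
The repetition is easy to measure. The sketch below assumes the same generated classes. It serializes an AccountHolder with the default settings and counts how many times xmlns="urn:my-enterprise:Basics" is declared. BloatCheck and CountOccurrences are hypothetical names used only here.

```csharp
using System;
using System.IO;
using System.Xml.Serialization;

[XmlType(Namespace = "urn:my-enterprise:Basics")]
public class Address
{
    public string Town;
    public string Street;
    public int Number;
}

[XmlType(Namespace = "urn:my-unit:AccountTypes")]
[XmlRoot(Namespace = "urn:my-unit:AccountTypes", IsNullable = false)]
public class AccountHolder
{
    public string Name;
    [XmlElement(DataType = "date")]
    public DateTime DateOfBirth;
    public Address HomeAddress;
}

// Hypothetical helper for inspecting the default output.
public static class BloatCheck
{
    // No XmlSerializerNamespaces is passed, so the serializer uses its defaults.
    public static string DefaultXml(AccountHolder holder)
    {
        var serializer = new XmlSerializer(typeof(AccountHolder));
        using (var writer = new StringWriter())
        {
            serializer.Serialize(writer, holder);
            return writer.ToString();
        }
    }

    // Count non-overlapping occurrences of pattern in text.
    public static int CountOccurrences(string text, string pattern)
    {
        int count = 0;
        int index = text.IndexOf(pattern, StringComparison.Ordinal);
        while (index >= 0)
        {
            count++;
            index = text.IndexOf(pattern, index + pattern.Length, StringComparison.Ordinal);
        }
        return count;
    }
}
```

With the sample data, expect one declaration for each child of HomeAddress, which is the same pattern as in the output above.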


If we are flaming XML aesthetes, we can use the .NET serialization classes to tweak the format and prettify it. In particular, if we pass an XmlSerializerNamespaces collection to the serializer, using the Serialize() overload that accepts one, we can optimize the XML to look like this:



<AccountHolder xmlns:b="urn:my-enterprise:Basics" xmlns="urn:my-unit:AccountTypes">
  <Name>Smythe</Name>
  <DateOfBirth>2006-01-05</DateOfBirth>
  <HomeAddress>
    <b:Town>Willingham</b:Town>
    <b:Street>Ryland</b:Street>
    <b:Number>123</b:Number>
  </HomeAddress>
</AccountHolder>


Equivalent as far as any namespace-aware consumer is concerned, but leaner (about 35% fewer bytes) and easier on the eyes. It can still be validated against the original XSD, and it can still be “de-serialized” by an app running on a non-.NET platform. It goes without saying that a .NET app could de-serialize it too. The code that produces this leaner XML is here:



  XmlSerializerNamespaces xmlns = new XmlSerializerNamespaces();
  xmlns.Add("b", "urn:my-enterprise:Basics");

  XmlSerializer s1 = new XmlSerializer(holder.GetType());
  s1.Serialize(System.Console.Out, holder, xmlns);
  System.Console.WriteLine("\n");
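
To check the size difference on your own types, here is a sketch that serializes the same object both ways and compares the results. It counts characters in a string rather than bytes on the wire, and both strings start with the same XML declaration, so the percentage it reports will not exactly match the figure above. The SizeComparison class is hypothetical. The Serialize overload that takes an XmlSerializerNamespaces is the one used above.

```csharp
using System;
using System.IO;
using System.Xml.Serialization;

[XmlType(Namespace = "urn:my-enterprise:Basics")]
public class Address
{
    public string Town;
    public string Street;
    public int Number;
}

[XmlType(Namespace = "urn:my-unit:AccountTypes")]
[XmlRoot(Namespace = "urn:my-unit:AccountTypes", IsNullable = false)]
public class AccountHolder
{
    public string Name;
    [XmlElement(DataType = "date")]
    public DateTime DateOfBirth;
    public Address HomeAddress;
}

// Hypothetical helper comparing default and prefixed output.
public static class SizeComparison
{
    static string Serialize(AccountHolder holder, XmlSerializerNamespaces ns)
    {
        var serializer = new XmlSerializer(typeof(AccountHolder));
        using (var writer = new StringWriter())
        {
            if (ns == null)
                serializer.Serialize(writer, holder);      // defaults: xsi/xsd plus repeated xmlns
            else
                serializer.Serialize(writer, holder, ns);  // caller-supplied prefixes only
            return writer.ToString();
        }
    }

    public static string Verbose(AccountHolder holder) => Serialize(holder, null);

    public static string Lean(AccountHolder holder)
    {
        var ns = new XmlSerializerNamespaces();
        ns.Add("b", "urn:my-enterprise:Basics");
        return Serialize(holder, ns);
    }

    // Share of characters saved by the lean form, as a percentage.
    public static double PercentSaved(string verbose, string lean) =>
        100.0 * (verbose.Length - lean.Length) / verbose.Length;
}
```

Because this only changes which prefixes are declared, the lean string should still de-serialize with a plain XmlSerializer for AccountHolder.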


Prettification within Web Services?


Now, let’s consider the scenario where we use these types within a Web Service. Maybe we are writing an ASP.NET (ASMX) service using those types, or maybe we’re writing a client that accesses an external service that uses them. We are doing WSDL First because we are smart and like interop, so it really does not matter what platform the external service runs on, or what language it is implemented in. It’s all XML.


In this case, the XML serialization is done implicitly by .NET. This is good, because it saves labor. This affects our prettification approach though, because we are no longer instantiating an XmlSerializer and explicitly calling a Serialize() method. Instead we are calling a proxy method. At the application layer, there is no XmlSerializer object exposed, and therefore the app cannot interact with it and specify which XmlSerializerNamespaces to pass in.


Which means we are back to the big, ugly XML instead of the lean, pretty XML. It looks like this:



<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <soap:Body>
    <getAccountHolderResponse xmlns="urn:my-unit:AccountsServiceMessages">
      <AccountHolder xmlns="urn:my-unit:AccountTypes">
        <Name>Smythe</Name>
        <DateOfBirth>2006-01-05</DateOfBirth>
        <HomeAddress>
          <Town xmlns="urn:my-enterprise:Basics">Willingham</Town>
          <Street xmlns="urn:my-enterprise:Basics">Ryland</Street>
          <Number xmlns="urn:my-enterprise:Basics">123</Number>
        </HomeAddress>
      </AccountHolder>
    </getAccountHolderResponse>
  </soap:Body>
</soap:Envelope>


(Don’t get me started on the cultural bias that says lean=pretty for people. This is a technical blog, so we’ll save that for a different venue.) By the way, this goes beyond mere aesthetics. Leanness and readability matter if your network is overburdened, or if you are logging, auditing, or tracing messages. Most real-world XML schemas have much more data, deeper nesting, and longer namespace names. All of which means you could be spending a large portion of each message, maybe 30% or more, on xmlns="…" declarations alone.


But wait! There’s a way. By modifying the type definitions in code, we can attach an XmlSerializerNamespaces to a given type, and the serializer will use it when it writes instances of that type. Simply modify the generated type so it looks like this:



  using System.Xml.Serialization;

  [System.Xml.Serialization.XmlTypeAttribute(Namespace="urn:my-enterprise:Basics")]
  public class Address
  {
    // The serializer emits these as xmlns:prefix declarations on the
    // element that carries an Address, not as child elements.
    [XmlNamespaceDeclarations]
    public XmlSerializerNamespaces namespaces;

    public Address()
    {
      namespaces = new XmlSerializerNamespaces();
      namespaces.Add("b", "urn:my-enterprise:Basics");
    }

    public string Town;
    public string Street;
    public int Number;
  }


Then, use the type as normal. The on-the-wire XML you will get will look like this:



<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <soap:Body>
    <getAccountHolderResponse xmlns="urn:my-unit:AccountsServiceMessages">
      <AccountHolder xmlns="urn:my-unit:AccountTypes">
        <Name>Smythe</Name>
        <DateOfBirth>2006-01-05</DateOfBirth>
        <HomeAddress xmlns:b="urn:my-enterprise:Basics">
          <b:Town>Willingham</b:Town>
          <b:Street>Ryland</b:Street>
          <b:Number>123</b:Number>
        </HomeAddress>
      </AccountHolder>
    </getAccountHolderResponse>
  </soap:Body>
</soap:Envelope>


That’s a 12% reduction in message size in this simple example, but the potential is much larger for more complex messages with more nesting of types.
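
ASMX serializes messages with XmlSerializer under the covers, so the [XmlNamespaceDeclarations] member also takes effect with a serializer you create yourself. That makes a plain XmlSerializer a quick way to check a modified type before wiring it into a service. Here is a sketch, assuming the modified Address above. DeclarationCheck is a hypothetical helper.

```csharp
using System;
using System.IO;
using System.Xml.Serialization;

[XmlType(Namespace = "urn:my-enterprise:Basics")]
public class Address
{
    // Emitted as xmlns:b on whichever element holds this Address.
    [XmlNamespaceDeclarations]
    public XmlSerializerNamespaces namespaces;

    public Address()
    {
        namespaces = new XmlSerializerNamespaces();
        namespaces.Add("b", "urn:my-enterprise:Basics");
    }

    public string Town;
    public string Street;
    public int Number;
}

[XmlType(Namespace = "urn:my-unit:AccountTypes")]
[XmlRoot(Namespace = "urn:my-unit:AccountTypes", IsNullable = false)]
public class AccountHolder
{
    public string Name;
    [XmlElement(DataType = "date")]
    public DateTime DateOfBirth;
    public Address HomeAddress;
}

// Hypothetical helper: no XmlSerializerNamespaces argument is passed,
// so the "b" prefix can only come from the type itself.
public static class DeclarationCheck
{
    public static string ToXml(AccountHolder holder)
    {
        var serializer = new XmlSerializer(typeof(AccountHolder));
        using (var writer = new StringWriter())
        {
            serializer.Serialize(writer, holder);
            return writer.ToString();
        }
    }
}
```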


That’s all for now. Keep it lean and mean!


-Dino


[Update: As several commenters have pointed out, you can use the partial classes introduced in .NET 2.0 to avoid modifying generated source code. A very good point; see this post for details.]
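
Here is a sketch of the partial-class approach from the update, assuming the generated classes are declared partial. The first part stands in for generated code. The second part is hand-written and would live in its own file, so regenerating leaves it alone. Unlike the constructor shown earlier, it uses a field initializer, so the hand-written part cannot collide with a parameterless constructor the generator might emit. CreateDeclarations and PartialCheck are hypothetical names.

```csharp
using System;
using System.IO;
using System.Xml.Serialization;

// ---- Part 1: stands in for generated code (safe to regenerate). ----
[XmlType(Namespace = "urn:my-enterprise:Basics")]
public partial class Address
{
    public string Town;
    public string Street;
    public int Number;
}

[XmlType(Namespace = "urn:my-unit:AccountTypes")]
[XmlRoot(Namespace = "urn:my-unit:AccountTypes", IsNullable = false)]
public partial class AccountHolder
{
    public string Name;
    [XmlElement(DataType = "date")]
    public DateTime DateOfBirth;
    public Address HomeAddress;
}

// ---- Part 2: hand-written, kept in a separate file. ----
public partial class Address
{
    // A field initializer runs in every constructor, so this part
    // declares no constructor that could clash with generated code.
    [XmlNamespaceDeclarations]
    public XmlSerializerNamespaces namespaces = CreateDeclarations();

    private static XmlSerializerNamespaces CreateDeclarations()
    {
        var ns = new XmlSerializerNamespaces();
        ns.Add("b", "urn:my-enterprise:Basics");
        return ns;
    }
}

// Hypothetical helper for checking the output.
public static class PartialCheck
{
    public static string ToXml(AccountHolder holder)
    {
        var serializer = new XmlSerializer(typeof(AccountHolder));
        using (var writer = new StringWriter())
        {
            serializer.Serialize(writer, holder);
            return writer.ToString();
        }
    }
}
```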

Comments (10)

  1. Not sure whether you are using WSCF in your projects, but do you think this would be a nice addition to the tool?

  2. cheeso says:

    Ahh, Christian, of course, WSCF would be a nice addition.

    http://www.thinktecture.com/Resources/Software/WSContractFirst/default.html

    Not everyone has seen the contract-first light! My post was prompted by a question on an internal list, so I thought I’d share more widely.

  3. Nikolai says:

    I notice you say "Simply modify the generated type so it looks like this:"

    I am working on a project where we are working from XSDs (some industry standards, some custom XSDs) -> ObjXSDGen -> Serialized classes used for web service parameters -> WSDL

    The generated classes are so large that manual modification is not really a good option, especially when XSDs are modified and classes regenerated.

    Is there any way this behaviour can be forced automatically by ObjXSDGen? Is there going to be an update to this tool?

    Are there alternative tools to ObjXSDGen?

    Is this approach fundamentally flawed?

    On another note, I am running into issues at the moment where a Java team interoperating with my .NET XSD-based web services is sending requests that are valid against the original XSD and WSDL, but the serializer is throwing all sorts of errors due to how they have constructed their XML. We are working through these together and making slight modifications to support the serializer, but it is frustrating for the other team I am working with.

  4. cheeso says:

    Nikolai, yes, have you seen the post on .NET 2.0?  

    http://blogs.msdn.com/dotnetinterop/archive/2006/01/06/510032.aspx

    You can use partial classes in .NET 2.0 to introduce this behavior.  This should be pretty easy to script in an automated build environment.  It needs to be done once for each class.  If you re-gen the classes, you need to add the "partial" qualifier to the generated code.

    I am interested in hearing more details about the XML interop problems.  …can you provide any?

    -Dino

  5. Nikolai says:

    I had seen your post on partial classes, unfortunately most of the implementation has been complete on the 1.1 framework and a conversion at this stage is not possible. But I will look at this for future implementations.

    Our XML interop problems were simply to do with XML construction. I have been using Aaron Skonnard and Dan Sullivan’s XML Schema Validation article (http://msdn.microsoft.com/msdnmag/issues/03/07/xmlschemavalidation/default.aspx) to ensure that incoming requests are valid against our schemas, but the way the serialized classes expect the namespaces to be constructed failed on the messages from our integration partners.

    As an example from one of the schemas we work from an XSD element such as:

    <xsd:element name="Demographics" type="eductf:DemographicsType"/>

    Would have a serialized class element generated as:

    [XmlType(TypeName="DemographicsType",Namespace=Declarations.SchemaVersion),Serializable] [EditorBrowsable(EditorBrowsableState.Advanced)]

    public class DemographicsType

    The corresponding serialized element from a .NET object would have element qualified namespaces like this:

    <Demographics xmlns="http://www.minedu.govt.nz/xmlschema/eductf/1.0">

    But the incoming message from the Java systems would be constructed like this with namespace declared on parent elements:

    <eductf:Demographics xsi:type="eductf:DemographicsType">

    This XML validated with no errors against the schema (although warnings are ignored), but we would receive deserialization errors like:

    The specified type was not recognized: name='DemographicsType', namespace='http://www.minedu.govt.nz/xmlschema/eductf/1.0', at <Demographics xmlns='http://www.minedu.govt.nz/xmlschema/eductf/1.0'>.

      at Microsoft.Xml.Serialization.GeneratedAssembly.XmlSerializationReader1.Read42_Demographics(Boolean isNullable, Boolean checkType)

    When the XML was modified to <eductf:Demographics> to remove the XSI type the serializer worked correctly with no issues.

  6. cheeso says:

    Nikolai, on xsdobjectgen, a "2.0" release is planned, but no date yet. It will support .NET 2.0.  I don’t have a great deal of info on the features being added.

  7. cheeso says:

    Based on your description, there’s a simple solution: generate the XML without the xsi:type attribute. You don’t need it.

    BUT, what you report sounds wrong to me.  In my tests on .NET 1.1, deserializing an xml stream that contains xsi:type attributes doesn’t cause the de-serializer to choke.

    But I haven’t tested it extensively.  Can I see your xml and xsd?  Or better, a simple test case.  (dinoch / microsoft / com)

  8. cheeso says:

    Nikolai, I got an advisory that said the U of Auckland blocked an incoming email because it contained an .exe. Maybe it was from you? In any case, I did not receive it. You can try sending a zip, a uuencoded file, a gzip, or something similar.

  9. Sam S says:

    Thank you for posting the information about the XmlNamespaceDeclarations attribute.  Your example was quite useful.  A slight change is that code generation now declares the classes as partial classes, therefore you don’t have to modify the generated code.  You just have to define a 2nd partial class and implement the constructor:

       [XmlRoot(Namespace = "http://mynamespace.com")]
       public partial class MyWebServiceClass
       {
           [XmlNamespaceDeclarations]
           public XmlSerializerNamespaces namespaces;

           public MyWebServiceClass()
           {
             namespaces = new XmlSerializerNamespaces();
             namespaces.Add("prefix", "http://mynamespace.com");
           }
       }

       }