Optimizing Away Repeat XML Namespace Declarations with DataContractSerializer

For performance reasons, DataContractSerializer can’t always figure out ahead of time which namespaces a serialized instance will use. As a result, the same XML namespace can end up being declared over and over again when it only needs to be declared once. This is particularly painful because XML namespaces tend to be very long: in the worst cases, namespace declarations can make up the majority of the serialized instance and significantly hurt your performance. Here’s a way to make sure that doesn’t happen to you, although it does take a little bit of tinkering.

 

Suppose you have the following DataContracts:

 

[DataContract(Namespace = "https://www.some-reaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaally-long-namespace.com/")]
public class A
{
    [DataMember]
    public B B;

    public A()
    {
        B = new B();
    }
}

[DataContract(Namespace = "https://www.some-other-reaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaally-long-namespace.com/")]
public class B
{
    [DataMember]
    public string s = "foo";
}

 

If you try serializing an array of five A’s like this:

 

var o = new A[5];
for (int i = 0; i < 5; i++)
{
    o[i] = new A();
}

var ser = new DataContractSerializer(o.GetType());
var writer = new XmlTextWriter(Console.Out) { Formatting = Formatting.Indented };
ser.WriteObject(writer, o);

 

you’ll get the following XML:

 

<ArrayOfProgram.A xmlns:i="http://www.w3.org/2001/XMLSchema-instance" xmlns="https://www.some-reaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaally-long-namespace.com/">
  <Program.A>
    <B xmlns:d3p1="https://www.some-other-reaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaally-long-namespace.com/">
      <d3p1:s>foo</d3p1:s>
    </B>
  </Program.A>
  <Program.A>
    <B xmlns:d3p1="https://www.some-other-reaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaally-long-namespace.com/">
      <d3p1:s>foo</d3p1:s>
    </B>
  </Program.A>
  <Program.A>
    <B xmlns:d3p1="https://www.some-other-reaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaally-long-namespace.com/">
      <d3p1:s>foo</d3p1:s>
    </B>
  </Program.A>
  <Program.A>
    <B xmlns:d3p1="https://www.some-other-reaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaally-long-namespace.com/">
      <d3p1:s>foo</d3p1:s>
    </B>
  </Program.A>
  <Program.A>
    <B xmlns:d3p1="https://www.some-other-reaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaally-long-namespace.com/">
      <d3p1:s>foo</d3p1:s>
    </B>
  </Program.A>
</ArrayOfProgram.A>

Notice how the namespace for B (the one bound to the prefix d3p1) is declared five times, creating a lot of bloat when it could be declared just once at the top level. To fix this, you can use the following code:

 

ser.WriteStartObject(writer, o);
writer.WriteAttributeString("xmlns", "p", null, "https://www.some-other-reaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaally-long-namespace.com/");
ser.WriteObjectContent(writer, o);
ser.WriteEndObject(writer);

instead of

ser.WriteObject(writer, o);

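We first write the start of the object, then declare the long namespace with the prefix “p” on the root element, write the object’s content, and finally write the end of the object. When the serializer reaches each B, it finds the namespace already in scope and reuses the existing prefix instead of declaring it again. The result is much more compact XML on the wire that’s equivalent to the XML we generated earlier: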

 

<ArrayOfProgram.A xmlns:p="https://www.some-other-reaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaally-long-namespace.com/" xmlns:i="http://www.w3.org/2001/XMLSchema-instance" xmlns="https://www.some-reaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaally-long-namespace.com/">
  <Program.A>
    <B>
      <p:s>foo</p:s>
    </B>
  </Program.A>
  <Program.A>
    <B>
      <p:s>foo</p:s>
    </B>
  </Program.A>
  <Program.A>
    <B>
      <p:s>foo</p:s>
    </B>
  </Program.A>
  <Program.A>
    <B>
      <p:s>foo</p:s>
    </B>
  </Program.A>
  <Program.A>
    <B>
      <p:s>foo</p:s>
    </B>
  </Program.A>
</ArrayOfProgram.A>
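
One way to convince yourself that the compact form really is equivalent is to read it back. The following is a minimal sketch rather than part of the original sample: the CompactXml helper, its method names, and the Ns constants are made up for illustration, while A and B mirror the contracts above. It writes the array with the extra top-level declaration, deserializes the result, and compares its size with the default output.

```csharp
using System;
using System.IO;
using System.Runtime.Serialization;
using System.Xml;

// Hypothetical holder for the two namespaces used by the contracts.
public static class Ns
{
    public const string ForA = "https://www.some-reaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaally-long-namespace.com/";
    public const string ForB = "https://www.some-other-reaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaally-long-namespace.com/";
}

[DataContract(Namespace = Ns.ForA)]
public class A
{
    [DataMember]
    public B B;

    public A()
    {
        B = new B();
    }
}

[DataContract(Namespace = Ns.ForB)]
public class B
{
    [DataMember]
    public string s = "foo";
}

// Hypothetical helper, not a framework type.
public static class CompactXml
{
    // Plain WriteObject: the serializer picks prefixes as it goes.
    public static string WriteDefault(object graph)
    {
        var ser = new DataContractSerializer(graph.GetType());
        var text = new StringWriter();
        using (var writer = new XmlTextWriter(text))
        {
            ser.WriteObject(writer, graph);
        }
        return text.ToString();
    }

    // Same graph, but with one namespace declared up front on the root element.
    public static string WriteWithPrefix(object graph, string prefix, string ns)
    {
        var ser = new DataContractSerializer(graph.GetType());
        var text = new StringWriter();
        using (var writer = new XmlTextWriter(text))
        {
            ser.WriteStartObject(writer, graph);
            writer.WriteAttributeString("xmlns", prefix, null, ns);
            ser.WriteObjectContent(writer, graph);
            ser.WriteEndObject(writer);
        }
        return text.ToString();
    }

    // Reads the XML back with an ordinary DataContractSerializer.
    public static T Read<T>(string xml)
    {
        var ser = new DataContractSerializer(typeof(T));
        using (var reader = XmlReader.Create(new StringReader(xml)))
        {
            return (T)ser.ReadObject(reader);
        }
    }
}

public static class Demo
{
    public static void Main()
    {
        var o = new A[5];
        for (int i = 0; i < o.Length; i++)
        {
            o[i] = new A();
        }

        string verbose = CompactXml.WriteDefault(o);
        string compact = CompactXml.WriteWithPrefix(o, "p", Ns.ForB);
        A[] readBack = CompactXml.Read<A[]>(compact);

        Console.WriteLine("default: {0} chars, compact: {1} chars", verbose.Length, compact.Length);
        Console.WriteLine("items read back: {0}, first value: {1}", readBack.Length, readBack[0].B.s);
    }
}
```

Because A and B are top-level classes in this sketch rather than nested inside Program, the element names come out as ArrayOfA and A instead of ArrayOfProgram.A and Program.A. The reader needs no special handling, since prefix names carry no meaning of their own.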

With this change, all of the repeated namespace declarations are gone, replaced by a single declaration on the root element. It’s also possible to integrate this kind of serialization into WCF. Create a serializer that derives from XmlObjectSerializer and delegates every method to a DataContractSerializer, except that it also declares the additional namespaces on the root element. Then create a behavior that derives from DataContractSerializerOperationBehavior, override CreateSerializer to return your new serializer, and plug the behavior in.
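
The WCF integration might look roughly like the sketch below. It is an outline under assumptions, not a tested drop-in: the type names PrefixDeclaringSerializer, PrefixDeclaringBehavior, and the ApplyTo helper are invented for illustration, it declares only a single prefix, and swapping the behavior on every operation of a contract is just one way to plug it in.

```csharp
using System;
using System.Collections.Generic;
using System.Runtime.Serialization;
using System.ServiceModel.Description;
using System.Xml;

// Hypothetical wrapper: delegates to an inner serializer and adds one
// namespace declaration right after the root start element is written.
public sealed class PrefixDeclaringSerializer : XmlObjectSerializer
{
    private readonly XmlObjectSerializer inner;
    private readonly string declaredPrefix;
    private readonly string declaredNamespace;

    public PrefixDeclaringSerializer(XmlObjectSerializer inner, string prefix, string ns)
    {
        this.inner = inner;
        this.declaredPrefix = prefix;
        this.declaredNamespace = ns;
    }

    public override void WriteStartObject(XmlDictionaryWriter writer, object graph)
    {
        inner.WriteStartObject(writer, graph);
        writer.WriteAttributeString("xmlns", declaredPrefix, null, declaredNamespace);
    }

    public override void WriteObjectContent(XmlDictionaryWriter writer, object graph)
    {
        inner.WriteObjectContent(writer, graph);
    }

    public override void WriteEndObject(XmlDictionaryWriter writer)
    {
        inner.WriteEndObject(writer);
    }

    public override bool IsStartObject(XmlDictionaryReader reader)
    {
        return inner.IsStartObject(reader);
    }

    public override object ReadObject(XmlDictionaryReader reader, bool verifyObjectName)
    {
        return inner.ReadObject(reader, verifyObjectName);
    }
}

// Hypothetical behavior: wraps whatever serializer the base class creates.
public sealed class PrefixDeclaringBehavior : DataContractSerializerOperationBehavior
{
    private readonly string declaredPrefix;
    private readonly string declaredNamespace;

    public PrefixDeclaringBehavior(OperationDescription operation, string prefix, string ns)
        : base(operation)
    {
        this.declaredPrefix = prefix;
        this.declaredNamespace = ns;
    }

    public override XmlObjectSerializer CreateSerializer(
        Type type, string name, string ns, IList<Type> knownTypes)
    {
        return new PrefixDeclaringSerializer(
            base.CreateSerializer(type, name, ns, knownTypes), declaredPrefix, declaredNamespace);
    }

    public override XmlObjectSerializer CreateSerializer(
        Type type, XmlDictionaryString name, XmlDictionaryString ns, IList<Type> knownTypes)
    {
        return new PrefixDeclaringSerializer(
            base.CreateSerializer(type, name, ns, knownTypes), declaredPrefix, declaredNamespace);
    }

    // One possible way to plug the behavior in: replace the stock
    // behavior on every operation of a contract.
    public static void ApplyTo(ContractDescription contract, string prefix, string ns)
    {
        foreach (OperationDescription operation in contract.Operations)
        {
            operation.Behaviors.Remove<DataContractSerializerOperationBehavior>();
            operation.Behaviors.Add(new PrefixDeclaringBehavior(operation, prefix, ns));
        }
    }
}
```

You would call something like PrefixDeclaringBehavior.ApplyTo(endpoint.Contract, "p", "...") before opening the host or channel factory. Only the write path changes; reading is passed straight through to the inner serializer.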

 
