When deserialize(serialize(x)) != x

So here’s another random little piece of information that might help someone else out (or me 6 months from now)… As I’ve been rambling about, lately I’ve been playing quite a bit with web services and serialization, especially DataContract serialization, and in one of my recent projects I was really fighting with what appeared to be a very simple unit test. The test was designed to verify that a particular structure (guess what structure… a graph of entities) would serialize properly across a web service. As a simple first approximation, I thought I’d write a little routine that would take an object, serialize it with the DataContract serializer, and then immediately deserialize it so that I could compare the result to the original. This is quicker and easier than building out the full web service, so it seemed like a good idea. In fact, the idea got even better when I found some code lying around that someone else had written for just this purpose. The code looked something like this:

public static string Serialize<T>(T o)
{
    return Serialize<T>(o, new Type[0]);
}

public static string Serialize<T>(T o, IEnumerable<Type> knownTypes)
{
    DataContractSerializer dcs = new DataContractSerializer(typeof(T), knownTypes);
    StringBuilder sb = new StringBuilder();
    XmlWriter writer = XmlWriter.Create(sb);
    dcs.WriteObject(writer, o);
    writer.Close();
    string xml = sb.ToString();
    return xml;
}

public static T Deserialize<T>(string xml)
{
    return Deserialize<T>(xml, new Type[0]);
}

public static T SerializeAndDeserialize<T>(T o)
{
    return Deserialize<T>(Serialize<T>(o));
}

Beautiful… Now all I have to do is call the generic method SerializeAndDeserialize and I will get back a new object (or ideally a graph of objects) which should be an accurate copy. At least as accurate as the serialization will be, right? WRONG.

Everything was working fine until I happened to run this little baby on some data that had strings with embedded carriage returns (that’s \r for us C# dudes; in my case the data was RTF). Suddenly my comparisons were failing, and it took me quite some time to realize that it had nothing to do with the serializer or the code I was testing. The problem was in the helper code above. Apparently, if you serialize using an XmlWriter over a StringBuilder and then read the XML back, these carriage returns are lost. If you actually build a web service with WCF, though, everything goes through fine. ARRRGGGG.
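For anyone who wants to see the effect in isolation, here is a minimal sketch of a reproduction. It rests on an assumption the post itself does not confirm: that the culprit is the default NewLineHandling.Replace setting of XmlWriter, which writes line breaks as plain characters that a conforming XML reader then normalizes to \n. The class name CarriageReturnRepro and the RoundTrip helper are made up for illustration, and the second call shows NewLineHandling.Entitize as one possible alternative setting rather than the fix used here.

```csharp
using System;
using System.IO;
using System.Runtime.Serialization;
using System.Text;
using System.Xml;

public static class CarriageReturnRepro
{
    // Hypothetical helper: round-trips a string through DataContractSerializer,
    // writing with an XmlWriter over a StringBuilder configured by 'settings'.
    public static string RoundTrip(string value, XmlWriterSettings settings)
    {
        var dcs = new DataContractSerializer(typeof(string));
        var sb = new StringBuilder();

        using (XmlWriter writer = XmlWriter.Create(sb, settings))
        {
            dcs.WriteObject(writer, value);
        }

        // Read the XML text back, the way a string-based Deserialize would.
        using (XmlReader reader = XmlReader.Create(new StringReader(sb.ToString())))
        {
            return (string)dcs.ReadObject(reader);
        }
    }

    public static void Main()
    {
        string original = "line one\r\nline two";

        // Default settings (assumed to be the problem case).
        string withDefaults = RoundTrip(original, new XmlWriterSettings());

        // Entitize asks the writer to emit \r as a character entity instead.
        string entitized = RoundTrip(
            original,
            new XmlWriterSettings { NewLineHandling = NewLineHandling.Entitize });

        Console.WriteLine("defaults preserved \\r: " + (withDefaults == original));
        Console.WriteLine("entitize preserved \\r: " + (entitized == original));
    }
}
```

If the assumption holds, the first comparison should fail and the second should succeed, which would also explain why the string-based helper disagrees with what WCF sends over the wire.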

The fix? Use a memory stream instead of XmlWriter/StringBuilder. So the updated code looks like this:

public static Stream Serialize<T>(T o, IEnumerable<Type> knownTypes)
{
    DataContractSerializer dcs = new DataContractSerializer(typeof(T), knownTypes);
    MemoryStream stream = new MemoryStream();
    dcs.WriteObject(stream, o);
    // Rewind so that the caller reads from the start of the serialized data.
    stream.Position = 0;
    return stream;
}

public static T Deserialize<T>(Stream stream)
{
    return Deserialize<T>(stream, new Type[0]);
}

The other methods look the same, apart from passing a Stream around instead of a string. Wouldn’t you know it, the new and improved Serialize method is even shorter and simpler than the old one.
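Putting the pieces together, a self-contained version of the stream-based helper might look like the sketch below. The Deserialize overload that takes known types is not shown in this post, so its body here is a guess at the obvious implementation, and the class name RoundTripHelper is invented for the example. The stream is rewound before it is handed to the reader, since a MemoryStream is positioned at its end after writing.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Runtime.Serialization;

// Hypothetical container class; the original helper's class name is not given.
public static class RoundTripHelper
{
    public static Stream Serialize<T>(T o, IEnumerable<Type> knownTypes)
    {
        var dcs = new DataContractSerializer(typeof(T), knownTypes);
        var stream = new MemoryStream();
        dcs.WriteObject(stream, o);
        stream.Position = 0; // rewind so the data can be read back from the start
        return stream;
    }

    // Assumed implementation: this overload is referenced but not shown in the post.
    public static T Deserialize<T>(Stream stream, IEnumerable<Type> knownTypes)
    {
        var dcs = new DataContractSerializer(typeof(T), knownTypes);
        return (T)dcs.ReadObject(stream);
    }

    public static T Deserialize<T>(Stream stream)
    {
        return Deserialize<T>(stream, new Type[0]);
    }

    public static T SerializeAndDeserialize<T>(T o)
    {
        // Dispose the intermediate stream once the copy has been read.
        using (Stream stream = Serialize(o, new Type[0]))
        {
            return Deserialize<T>(stream);
        }
    }
}
```

A unit test can then call RoundTripHelper.SerializeAndDeserialize("first\r\nsecond") and compare the result with the input; with the stream-based path the carriage return should survive the trip.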

Here’s hoping the next guy finds this post instead of banging their head against the wall for 3 or 4 hours like me. ;-)

- Danny