This post is part of a series about WCF extensibility points. For a list of all previous posts and planned future ones, go to the index page.
The next few posts will talk about the extensibility points in the WCF serialization process, and some scenarios where they’d be useful. But first I decided to give a quick intro on the subject (with many links to other posts which have been around for a while), and on the XML infrastructure in WCF, so I can refer back to it in the next posts.
Serialization is the process by which a CLR object is converted into a format which can be sent over the wire (or persisted to disk, or written to the console, etc.), and vice-versa. In WCF, the serialization format is always XML. More specifically, the WCF serializers convert between an object in memory and an XML Infoset. The Infoset is an abstract representation of an XML document, an intermediate step between the object itself and its wire representation. The way to convert between the XML Infoset and the actual bytes (something concrete) is to use an XmlWriter (Infoset to bytes) or an XmlReader (bytes to Infoset). Those are abstract classes which allow for different XML formats, which is what we have in WCF.
XML Readers / Writers
Out of the box, WCF has four concrete implementations of the XmlWriter / XmlReader classes, each one with a specific XML format. They’re actually derived from XmlDictionaryWriter / XmlDictionaryReader, which are themselves derived from XmlWriter / XmlReader and add some functionality. All of the readers / writers are implemented as internal classes, which are created through static factory methods on public classes:
- Text: created by XmlDictionaryWriter.CreateTextWriter / XmlDictionaryReader.CreateTextReader, this is the format most people think of as XML: lots of angle brackets surrounding elements. These reader / writer classes perform better than the “normal” XML readers / writers created by the XmlWriter.Create and XmlReader.Create methods, but at the expense of not supporting the full XML dialect (for example, processing instructions, entity references and DTDs are disallowed).
- Binary: created by XmlDictionaryWriter.CreateBinaryWriter / XmlDictionaryReader.CreateBinaryReader, this format is a compact, binary version of XML. It’s defined in the Windows Communication Protocol specification [MC-NBFX]: .NET Binary Format: XML Data Structure, and it achieves its compactness through several optimizations: end elements which don’t repeat the element name, a better representation for numbers (in binary format), no need to expand binary data (since not all characters are allowed in “text” XML, binary data is usually base64-encoded, which increases its size by about 33%) and dictionaries which replace some well-known strings. Nicholas Allen has a good introductory series on the binary encoder (part 1, part 2, part 3, part 4, part 5, part 6, part 7), and I’ve posted about the binary dictionaries in the past as well.
- Mtom: created by XmlDictionaryWriter.CreateMtomWriter / XmlDictionaryReader.CreateMtomReader, this format is a middle ground between the interoperability of the Text format and the compactness of the Binary format. Since MTOM is a W3C Recommendation, stacks other than WCF (including the ones from IBM and the Java-based ones) implement this format. It basically takes binary data and separates it from the “main” XML by using a MIME envelope, with one part for the “main” Infoset and “attachment” parts which are referenced from the main part.
- Json (new in 3.5): created by JsonReaderWriterFactory.CreateJsonWriter / JsonReaderWriterFactory.CreateJsonReader. Yes, even when WCF is serializing an object to JSON, it still serializes it first to an XML Infoset, then it uses the JSON writer to write it out as something which bears no resemblance to XML. The mapping between JSON and XML is one of the controversial decisions made in WCF (why does everything need to be XML?) which made sense at the time (XML was supposed to cure all evils), but has caused some problems (like the one I mentioned on the scenario for the post on Message Inspectors). Since JSON isn’t XML, the JSON writer can only be used to write a certain kind of Infoset, which is the one produced by the DataContractJsonSerializer (and not the one produced by the original DataContractSerializer).
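To make the "same Infoset, different bytes" idea concrete, here is a minimal sketch which writes one small Infoset with both the text and the binary writers and compares the sizes. The element names (root, data), the class name and the 300-byte payload are made up for illustration; the exact byte counts will depend on the content, so none are claimed here.

```csharp
using System;
using System.IO;
using System.Linq;
using System.Text;
using System.Xml;

public static class WriterFormatsDemo
{
    // Writes one small Infoset (hypothetical element names) to whatever writer it is given.
    public static void WriteSample(XmlDictionaryWriter writer, byte[] payload)
    {
        writer.WriteStartElement("root");
        writer.WriteStartElement("data");
        writer.WriteBase64(payload, 0, payload.Length);
        writer.WriteEndElement();
        writer.WriteEndElement();
    }

    public static byte[] WriteAsText(byte[] payload)
    {
        var stream = new MemoryStream();
        using (XmlDictionaryWriter writer = XmlDictionaryWriter.CreateTextWriter(stream, Encoding.UTF8))
        {
            WriteSample(writer, payload);
        }
        // MemoryStream.ToArray still works after the writer has closed the stream.
        return stream.ToArray();
    }

    public static byte[] WriteAsBinary(byte[] payload)
    {
        var stream = new MemoryStream();
        using (XmlDictionaryWriter writer = XmlDictionaryWriter.CreateBinaryWriter(stream))
        {
            WriteSample(writer, payload);
        }
        return stream.ToArray();
    }

    // Reads the payload back; the calling code is the same for both formats.
    public static byte[] ReadPayload(XmlDictionaryReader reader)
    {
        reader.ReadStartElement("root");
        return reader.ReadElementContentAsBase64();
    }

    public static void Main()
    {
        byte[] payload = new byte[300];
        new Random(1).NextBytes(payload);

        byte[] text = WriteAsText(payload);
        byte[] binary = WriteAsBinary(payload);
        Console.WriteLine("Text: {0} bytes, binary: {1} bytes", text.Length, binary.Length);

        byte[] fromText = ReadPayload(
            XmlDictionaryReader.CreateTextReader(text, XmlDictionaryReaderQuotas.Max));
        byte[] fromBinary = ReadPayload(
            XmlDictionaryReader.CreateBinaryReader(binary, XmlDictionaryReaderQuotas.Max));
        Console.WriteLine("Same payload from both: {0}",
            fromText.SequenceEqual(payload) && fromBinary.SequenceEqual(payload));
    }
}
```

The code which produces and consumes the Infoset never changes; only the factory method used to create the writer or reader does.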
Notice that you don’t have to use those writers / readers for the serialization. You can also use the ones created by the base classes (XmlWriter.Create and XmlReader.Create, for example, if you want to change some settings such as indenting) or even create your own implementation, like the scenario of this post.
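As a sketch of that last point, the snippet below hands a DataContractSerializer a writer created by XmlWriter.Create with indenting turned on. The Point type and the class and method names are hypothetical, chosen only for this illustration.

```csharp
using System;
using System.Runtime.Serialization;
using System.Text;
using System.Xml;

// Hypothetical type used only for this illustration.
[DataContract(Name = "Point", Namespace = "")]
public class Point
{
    [DataMember(Order = 1)] public int X { get; set; }
    [DataMember(Order = 2)] public int Y { get; set; }
}

public static class IndentedWriterDemo
{
    public static string SerializeIndented(object graph)
    {
        // Settings such as indenting are only available on the "normal" XML writers.
        var settings = new XmlWriterSettings { Indent = true, OmitXmlDeclaration = true };
        var output = new StringBuilder();
        using (XmlWriter writer = XmlWriter.Create(output, settings))
        {
            // The serializer only needs an XmlWriter; it doesn't care which one.
            new DataContractSerializer(graph.GetType()).WriteObject(writer, graph);
        }
        return output.ToString();
    }

    public static void Main()
    {
        Console.WriteLine(SerializeIndented(new Point { X = 1, Y = 2 }));
    }
}
```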
Now for the serialization itself. WCF serializers can be used stand-alone, outside of any service code (for example, if an application wants to persist some state locally, it can use a serializer for that). The example below shows an object being saved to a file.
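A minimal sketch of such a stand-alone use could look like the following. The AppState type, its members and the file name are made up for illustration; only the DataContractSerializer calls come from the framework.

```csharp
using System;
using System.IO;
using System.Runtime.Serialization;

// Hypothetical application state, for illustration only.
[DataContract]
public class AppState
{
    [DataMember] public string LastUser { get; set; }
    [DataMember] public int LaunchCount { get; set; }
}

public static class StandaloneSerializationDemo
{
    public static void Save(AppState state, string path)
    {
        var serializer = new DataContractSerializer(typeof(AppState));
        using (FileStream stream = File.Create(path))
        {
            // No service, binding or channel involved: just an object and a stream.
            serializer.WriteObject(stream, state);
        }
    }

    public static AppState Load(string path)
    {
        var serializer = new DataContractSerializer(typeof(AppState));
        using (FileStream stream = File.OpenRead(path))
        {
            return (AppState)serializer.ReadObject(stream);
        }
    }

    public static void Main()
    {
        string path = Path.Combine(Path.GetTempPath(), "appstate.xml");
        Save(new AppState { LastUser = "john", LaunchCount = 3 }, path);

        AppState restored = Load(path);
        Console.WriteLine("{0} / {1}", restored.LastUser, restored.LaunchCount);
    }
}
```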
In WCF you usually have a choice between two serializers. The DataContractSerializer (DCS) used in the example above is the default serializer for WCF. DCS doesn’t support all of XML (it can’t produce or consume attributes, out-of-order elements and some other constructs), but it’s quite fast because it has a simple mapping between standard programming constructs (records, arrays) and XML (sequences). MSDN articles explain thoroughly how the DataContractSerializer works, and which types are supported by it (Sowmy’s post on the WCF serialization programming model is a good reference as well).
The other main serializer in WCF is the XmlSerializer, which actually predates WCF, having existed since the first version of the .NET Framework. It is the general-purpose XML serializer (which is why it was made the default serializer in the WCF Web API project on CodePlex), but it is usually slower than the DCS. The XmlSerializer supports all the serialization attributes in the System.Xml.Serialization namespace, which gives you great control over the format of the XML which is produced (or consumed). The MSDN article on the XmlSerializer in WCF has more information about this serializer.
Other serializers include the NetDataContractSerializer. The other serializers need some additional information in polymorphic scenarios; the NetDataContractSerializer doesn’t, because it includes .NET Framework type information when serializing the objects (but then its usage is limited to scenarios where the assembly in which the type is defined is shared between the two communicating parties). And finally, the DataContractJsonSerializer already mentioned above is also a built-in serializer in WCF (since .NET Framework 3.5), which can convert between objects and XML Infosets which comply with the rules outlined in the mapping between JSON and XML.
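To illustrate the kind of control those attributes give, here is a small sketch with a hypothetical Order type which maps one member to an XML attribute and flattens an array into repeated elements, two shapes the DCS can’t produce. The type, member and class names are assumptions made for this example.

```csharp
using System;
using System.IO;
using System.Xml.Serialization;

// Hypothetical type, for illustration only.
[XmlRoot("order")]
public class Order
{
    // Written as an attribute on the root element.
    [XmlAttribute("id")] public int Id { get; set; }

    // Written as repeated <item> elements, without a wrapping array element.
    [XmlElement("item")] public string[] Items { get; set; }
}

public static class XmlSerializerDemo
{
    public static string Serialize(Order order)
    {
        var serializer = new XmlSerializer(typeof(Order));
        using (var writer = new StringWriter())
        {
            serializer.Serialize(writer, order);
            return writer.ToString();
        }
    }

    public static Order Deserialize(string xml)
    {
        var serializer = new XmlSerializer(typeof(Order));
        using (var reader = new StringReader(xml))
        {
            return (Order)serializer.Deserialize(reader);
        }
    }

    public static void Main()
    {
        string xml = Serialize(new Order { Id = 7, Items = new[] { "pen", "ink" } });
        Console.WriteLine(xml);

        Order copy = Deserialize(xml);
        Console.WriteLine("{0}: {1} items", copy.Id, copy.Items.Length);
    }
}
```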
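The JSON-to-XML mapping is easier to see than to describe. The sketch below serializes a hypothetical Person type with the DataContractJsonSerializer, then feeds the resulting JSON to the JSON reader and dumps the Infoset which the reader exposes. The Person type and the class and method names are made up for this example.

```csharp
using System;
using System.IO;
using System.Runtime.Serialization;
using System.Runtime.Serialization.Json;
using System.Text;
using System.Xml;
using System.Xml.Linq;

// Hypothetical type, for illustration only.
[DataContract]
public class Person
{
    [DataMember(Order = 1)] public string Name { get; set; }
    [DataMember(Order = 2)] public int Age { get; set; }
}

public static class JsonMappingDemo
{
    public static string ToJson(Person person)
    {
        var serializer = new DataContractJsonSerializer(typeof(Person));
        using (var stream = new MemoryStream())
        {
            serializer.WriteObject(stream, person);
            return Encoding.UTF8.GetString(stream.ToArray());
        }
    }

    // Shows the XML Infoset which the JSON reader exposes for a JSON document.
    public static string JsonToInfosetXml(string json)
    {
        byte[] bytes = Encoding.UTF8.GetBytes(json);
        using (XmlDictionaryReader reader =
            JsonReaderWriterFactory.CreateJsonReader(bytes, XmlDictionaryReaderQuotas.Max))
        {
            return XElement.Load(reader).ToString(SaveOptions.DisableFormatting);
        }
    }

    public static void Main()
    {
        string json = ToJson(new Person { Name = "Ann", Age = 30 });
        Console.WriteLine(json);
        Console.WriteLine(JsonToInfosetXml(json));
    }
}
```

In the second line of output, each JSON value shows up as an element carrying a type attribute (object, string, number and so on), which is how the Infoset keeps the information the JSON writer needs to produce JSON again.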
Declarative vs. Imperative serialization
Most of the types which are handled by the serializers define how they are serialized by using attributes. By decorating a type with [DataContract] (and its members with [DataMember]) we’re telling the DataContractSerializer what needs to be serialized there. By marking a type with [Serializable] we’re saying that, well, it can be serialized according to the rules for that attribute (all fields – not properties – of the type are serialized). By adding [XmlType] / [XmlElement] / [XmlAttribute], etc. to a type, we’re instructing the XmlSerializer how to proceed with it. Even by not adding anything we’re “telling” the serializers to use the POCO (plain-old CLR object) rules. The serializers can then reflect on the type and find out the schema expected of a serialized instance of that type. This is important because when those types are used in a service, the schema is published along with the service metadata, and tools such as svcutil.exe or the Add Service Reference wizard in Visual Studio can create a proxy with types which can be correctly serialized and deserialized to communicate with the service.
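As a sketch of the difference between two of those declarative models, the snippet below serializes one opt-in [DataContract] type and one [Serializable] type with the DataContractSerializer. Both types, their members and the helper class are hypothetical.

```csharp
using System;
using System.IO;
using System.Runtime.Serialization;
using System.Text;

// Opt-in: only members marked with [DataMember] are serialized.
[DataContract(Namespace = "")]
public class ContractPerson
{
    [DataMember] public string Name { get; set; }
    public string Password { get; set; } // no [DataMember], so it is skipped
}

// Opt-out: every field (even private ones) is serialized unless marked [NonSerialized].
[Serializable]
public class FieldPerson
{
    private string name;
    [NonSerialized] private string password;

    public FieldPerson(string name, string password)
    {
        this.name = name;
        this.password = password;
    }

    public string Name { get { return name; } }
    public bool HasPassword { get { return password != null; } }
}

public static class DeclarativeDemo
{
    public static string ToXml(object graph)
    {
        var serializer = new DataContractSerializer(graph.GetType());
        using (var stream = new MemoryStream())
        {
            serializer.WriteObject(stream, graph);
            return Encoding.UTF8.GetString(stream.ToArray());
        }
    }

    public static void Main()
    {
        Console.WriteLine(ToXml(new ContractPerson { Name = "Ann", Password = "secret" }));
        Console.WriteLine(ToXml(new FieldPerson("Bob", "secret")));
    }
}
```

In the first case the element is named after the property (Name); in the second it is named after the field (name), since properties play no role for [Serializable] types.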
Other types, however, use imperative serialization, which means that they’re responsible for writing and reading the object themselves. There are two kinds of imperative serialization in WCF: types which implement the IXmlSerializable interface and types which implement the ISerializable interface. By my definition of “extensibility point”, they can be considered as such, since we can define classes which implement one of those interfaces, and when they’re being serialized (or deserialized), our code is executed (and we can control exactly what is serialized / deserialized). But their MSDN pages are fairly decent, so they don’t need a post of their own.
One problem with imperative serialization is that the serializers have no way of knowing the schema of the type (since we can write anything we want, even change it depending on the object, the time of the day or the weather forecast), so they really can’t infer the schema of those types. For IXmlSerializable types this can be solved in WCF by adding an [XmlSchemaProvider] attribute to the type; the attribute names a static method in which we can provide the schema for the type. ISerializable types don’t have that, so be cautious if you intend to use them in a service which is consumed by a client not provided by you (that is, one which doesn’t have the implementation of that ISerializable type, which knows how to recreate itself).
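For a feel of what “our code is executed” means, here is a minimal ISerializable sketch. The Money type, the single "value" entry and its space-separated format are invented for this example; the type decides on its own what goes into the SerializationInfo and how to rebuild itself from it.

```csharp
using System;
using System.Globalization;
using System.IO;
using System.Runtime.Serialization;
using System.Text;

// Hypothetical type, for illustration only.
[Serializable]
public class Money : ISerializable
{
    public decimal Amount { get; private set; }
    public string Currency { get; private set; }

    public Money(decimal amount, string currency)
    {
        Amount = amount;
        Currency = currency;
    }

    // Called by the serializer when the object is being deserialized.
    protected Money(SerializationInfo info, StreamingContext context)
    {
        string[] parts = info.GetString("value").Split(' ');
        Amount = decimal.Parse(parts[0], CultureInfo.InvariantCulture);
        Currency = parts[1];
    }

    // Called by the serializer when the object is being serialized.
    public void GetObjectData(SerializationInfo info, StreamingContext context)
    {
        // One combined entry instead of one entry per property: our choice.
        info.AddValue("value", Amount.ToString(CultureInfo.InvariantCulture) + " " + Currency);
    }
}

public static class ImperativeDemo
{
    public static string ToXml(Money money)
    {
        var serializer = new DataContractSerializer(typeof(Money));
        using (var stream = new MemoryStream())
        {
            serializer.WriteObject(stream, money);
            return Encoding.UTF8.GetString(stream.ToArray());
        }
    }

    public static Money FromXml(string xml)
    {
        var serializer = new DataContractSerializer(typeof(Money));
        using (var stream = new MemoryStream(Encoding.UTF8.GetBytes(xml)))
        {
            return (Money)serializer.ReadObject(stream);
        }
    }

    public static void Main()
    {
        string xml = ToXml(new Money(12.5m, "USD"));
        Console.WriteLine(xml);

        Money copy = FromXml(xml);
        Console.WriteLine("{0} {1}", copy.Amount, copy.Currency);
    }
}
```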
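The schema provider approach can be sketched as follows. The Temperature type, its namespace URI (http://example.org/temperature), the celsius attribute and the ProvideSchema method name are all assumptions made for this example; what the pattern requires is a static method which takes an XmlSchemaSet, adds the schema to it and returns the qualified name of the type.

```csharp
using System;
using System.IO;
using System.Runtime.Serialization;
using System.Text;
using System.Xml;
using System.Xml.Schema;
using System.Xml.Serialization;

// Hypothetical type, for illustration only.
[XmlSchemaProvider("ProvideSchema")]
public class Temperature : IXmlSerializable
{
    private const string Ns = "http://example.org/temperature"; // made-up namespace

    public double Celsius { get; set; }

    // Not used when a schema provider method is present.
    XmlSchema IXmlSerializable.GetSchema() { return null; }

    public void WriteXml(XmlWriter writer)
    {
        // The serializer has already written the start of the wrapper element.
        writer.WriteAttributeString("celsius", XmlConvert.ToString(Celsius));
    }

    public void ReadXml(XmlReader reader)
    {
        // The reader is positioned on the wrapper element, which we must consume.
        Celsius = XmlConvert.ToDouble(reader.GetAttribute("celsius"));
        reader.Skip();
    }

    // Describes what WriteXml produces, so that metadata can be published.
    public static XmlQualifiedName ProvideSchema(XmlSchemaSet schemas)
    {
        var type = new XmlSchemaComplexType { Name = "Temperature" };
        type.Attributes.Add(new XmlSchemaAttribute
        {
            Name = "celsius",
            SchemaTypeName = new XmlQualifiedName("double", XmlSchema.Namespace),
            Use = XmlSchemaUse.Required
        });

        var schema = new XmlSchema { TargetNamespace = Ns };
        schema.Items.Add(type);
        schemas.Add(schema);
        return new XmlQualifiedName("Temperature", Ns);
    }
}

public static class SchemaProviderDemo
{
    public static string ToXml(Temperature value)
    {
        var serializer = new DataContractSerializer(typeof(Temperature));
        using (var stream = new MemoryStream())
        {
            serializer.WriteObject(stream, value);
            return Encoding.UTF8.GetString(stream.ToArray());
        }
    }

    public static Temperature FromXml(string xml)
    {
        var serializer = new DataContractSerializer(typeof(Temperature));
        using (var stream = new MemoryStream(Encoding.UTF8.GetBytes(xml)))
        {
            return (Temperature)serializer.ReadObject(stream);
        }
    }

    public static void Main()
    {
        string xml = ToXml(new Temperature { Celsius = 21.5 });
        Console.WriteLine(xml);
        Console.WriteLine(FromXml(xml).Celsius);
    }
}
```

Nothing checks that the schema matches what WriteXml actually emits, so keeping the two in sync is up to the author of the type.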
Each serializer has its own places where code can be executed. The next posts will cover them, starting with the serialization callbacks.