WCF Extensibility – Serialization Callbacks

This post is part of a series about WCF extensibility points. For a list of all previous posts and planned future ones, go to the index page.

Continuing on the topic of serialization, the serialization callbacks are methods which, when tagged appropriately, are called by the serializer during the serialization or deserialization of an object of the type where the methods are defined. Those callbacks predate WCF, having been part of the .NET Framework since version 2.0, where they support the same scenario in other serialization stacks, such as the BinaryFormatter and the SoapFormatter. Interestingly, the XmlSerializer, the first general-purpose serializer from the framework, doesn’t implement them, possibly because it already gives a lot of control to the user with types such as those which implement the IXmlSerializable interface. The idea behind those callbacks is that they’re invoked at specific points during the serialization / deserialization of an object (right before and right after each operation) to allow some user initialization / finalization code to take place.

Below is some code which shows the “canonical” example of those callbacks. For types which are to be serialized, you can decorate an instance method with any of the serialization callback attributes (OnDeserializedAttribute, OnDeserializingAttribute, OnSerializedAttribute, OnSerializingAttribute), and the serializer will call it at the corresponding point. The method must not return a value (“void” in C#, Sub in VB), and it must have a single parameter of type StreamingContext. When an object of the type MyObject is about to be serialized, its OnSerializing method is called (by the way, the method names do not have to match the attribute names, it’s just a convention that I like to use). After it’s been written out, its OnSerialized method is called. When an instance of that type is about to be deserialized from a stream (or a reader), its OnDeserializing method is called, and finally, when all its members from the stream have been deserialized, OnDeserialized is called.

    using System;
    using System.IO;
    using System.Runtime.Serialization;
    using System.Text;

    [DataContract]
    public class MyObject
    {
        [DataMember]
        public string Member { get; set; }

        public override string ToString()
        {
            return string.Format("MyObject[Member={0}]", this.Member);
        }

        [OnSerializing]
        void OnSerializing(StreamingContext ctx)
        {
            Console.WriteLine("This is called before serialization");
        }

        [OnSerialized]
        void OnSerialized(StreamingContext ctx)
        {
            Console.WriteLine("This is called after serialization");
        }

        [OnDeserializing]
        void OnDeserializing(StreamingContext ctx)
        {
            Console.WriteLine("This is called before deserialization");
        }

        [OnDeserialized]
        void OnDeserialized(StreamingContext ctx)
        {
            Console.WriteLine("This is called after deserialization");
        }
    }

    class SerializationPost
    {
        public static void Test()
        {
            MemoryStream ms = new MemoryStream();
            DataContractSerializer dcs = new DataContractSerializer(typeof(MyObject));
            dcs.WriteObject(ms, new MyObject { Member = "hello" });
            Console.WriteLine("Serialization done: {0}", Encoding.UTF8.GetString(ms.ToArray()));
            ms.Position = 0;
            object o = dcs.ReadObject(ms);
            Console.WriteLine("Deserialization done: {0}", o);
            ms.Position = 0;
        }
    }

One small aside about the last requirement (single parameter of type StreamingContext): this is more of a legacy compatibility issue, since that’s the signature required for those callbacks for use with the BinaryFormatter and SoapFormatter. This parameter is never used by the main WCF serializers. In the DataContractSerializer and DataContractJsonSerializer, the parameter is always an object with the Context property set to null (Nothing in VB) and the State property set to StreamingContextStates.All. For the NetDataContractSerializer, you can specify a StreamingContext while creating the serializer, and it will be passed to the callbacks when they’re called. But really, I’ve never seen it being used in the context of WCF.
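The difference between the serializers can be observed with a small sketch like the one below. The ContextAwareObject type and the "my context" string are hypothetical names chosen for illustration, and the sketch assumes the full .NET Framework, where NetDataContractSerializer is available.

```csharp
using System;
using System.IO;
using System.Runtime.Serialization;

[DataContract]
public class ContextAwareObject
{
    [DataMember]
    public string Member { get; set; }

    [OnSerializing]
    void OnSerializing(StreamingContext ctx)
    {
        // ctx.Context carries whatever object (if any) was handed to the serializer
        Console.WriteLine("State={0}, Context={1}", ctx.State, ctx.Context ?? "(null)");
    }
}

class StreamingContextDemo
{
    static void Main()
    {
        var instance = new ContextAwareObject { Member = "hello" };

        // DataContractSerializer: there is no way to supply a context object
        new DataContractSerializer(typeof(ContextAwareObject))
            .WriteObject(new MemoryStream(), instance);

        // NetDataContractSerializer: the StreamingContext given to the
        // constructor is the one handed to the callbacks
        var context = new StreamingContext(StreamingContextStates.Persistence, "my context");
        new NetDataContractSerializer(context)
            .WriteObject(new MemoryStream(), instance);
    }
}
```

The callback is the same in both runs; only the value it receives in ctx should differ.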

Ok, this is interesting: we can know when an object of the type is being serialized / deserialized. But why do we really need those callbacks? Why do we care when the objects are being serialized or deserialized? Shouldn’t we simply decorate the types with the appropriate attributes to take full advantage of the declarative serialization and not have to do anything else? In most cases, this is true. The callbacks aren’t needed most of the time, but there are some scenarios in which they come in handy. Here are some of those scenarios, along with how the serialization callbacks can be used to solve the issues.

Object Initialization

This is a common issue from the forums: the type has some initialization (field initializer or constructor), but when the type is deserialized, those initializers are not applied. When an instance of the type below, for example, is read, the Name field is properly initialized from the data from the serialized stream, but the fields ID and Friends are set to their default values (0 and null / Nothing, respectively).

    [DataContract]
    public class Person
    {
        // This is serialized
        [DataMember]
        public string Name;

        // Those are not serialized
        public int ID = 123;
        public List<string> Friends = new List<string>();
    }

The issue is that when the serializer is creating an object during deserialization, the type constructor is not invoked (in most cases, I’ll go into the exceptions later). Instead, the serializers use the FormatterServices.GetUninitializedObject method to create an empty instance of the object, with all of its members set to the default value for their type. This is really counter-intuitive for most people, since the constructor is the first “line-of-defense” for the type to maintain internal consistency (i.e., the constructor will often have some validation, null / range checks, etc., to prevent its internal state from becoming invalid). So why do the serializers skip that?

Well, the WCF serializers could have called the constructor. That’s exactly what the XmlSerializer does: it only works with classes which contain a public, parameter-less constructor. Other serializers which predated WCF (the [Binary/Soap]Formatter classes) didn’t. They created an uninitialized object and set its fields from the deserialization stream. The advantage of the latter model is that it supports the serialization of most types, even those which don’t have a parameter-less constructor, which is why I think that model was chosen.
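To see what skipping the constructor means in practice, here is a minimal sketch that calls FormatterServices.GetUninitializedObject directly. The Widget type is a hypothetical stand-in used only for this illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Runtime.Serialization;

public class Widget
{
    public int ID = 123;
    public List<string> Tags = new List<string>();

    public Widget()
    {
        Console.WriteLine("Constructor called");
    }
}

class UninitializedDemo
{
    static void Main()
    {
        // Normal construction: field initializers and the constructor both run
        Widget viaNew = new Widget();
        Console.WriteLine("new: ID={0}, Tags is null? {1}", viaNew.ID, viaNew.Tags == null);

        // Same call the serializers rely on: no constructor, no field initializers
        Widget raw = (Widget)FormatterServices.GetUninitializedObject(typeof(Widget));
        Console.WriteLine("uninitialized: ID={0}, Tags is null? {1}", raw.ID, raw.Tags == null);
    }
}
```

The first instance has ID 123 and a non-null list; the second one has ID 0 and a null list, and "Constructor called" is printed only once.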

A common follow-up question: but why not call any constructor in the type, even if it meant passing some parameters to the constructor? That opens a new set of problems. How can we guarantee that the parameters are correct (that they won’t cause the constructor to throw an exception)? What if there are multiple constructors, which one should be chosen? Any heuristic chosen by the WCF team would be wrong in some scenarios, so this option doesn’t work. Yet another follow-up question: if the type has a default constructor, why doesn’t WCF invoke it? The answer for this one is consistency: once you go down the constructor-less route, you should go all in, or not go at all, so I think this decision makes sense.

I hope I have convinced you why the serializers skip the constructor (or at least given you a good reason why they do). But the problem still exists: what do we do if we want to initialize the object? That’s the main purpose of the OnDeserializingAttribute. Right after the uninitialized object is created, the serializer will call any method in the type which is decorated with this attribute. There we can initialize any members which are needed, before the members from the stream are read into the object. The type Person can be rewritten to be deserialization-aware as shown below.

    [DataContract]
    public class Person
    {
        // This is serialized
        [DataMember]
        public string Name;

        // Those are not serialized
        public int ID;
        public List<string> Friends;

        public Person()
        {
            this.Initialize();
        }

        [OnDeserializing]
        public void OnDeserializing(StreamingContext context)
        {
            this.Initialize();
        }

        private void Initialize()
        {
            this.ID = 123;
            this.Friends = new List<string>();
        }
    }

So whenever you decorate a type with a serialization attribute (DataContractAttribute, SerializableAttribute), you need to think about whether the initialization will need to be handled differently for deserialization scenarios. The types which are not decorated with anything (a.k.a. Plain Old CLR Object, or POCO, types) behave differently. Those are types whose creator didn’t think about serialization explicitly by decorating them appropriately. In this case, the parameter-less constructor is always invoked when an instance is being deserialized (if the type doesn’t have one, it isn’t serializable at all). Other kinds of objects also have their constructors called: types which implement the IXmlSerializable interface (they need to have a parameter-less constructor), and types which implement the ISerializable interface (during deserialization, the constructor which takes a SerializationInfo and a StreamingContext parameter is called).
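The contrast between POCO and [DataContract] types can be checked with a sketch like the following. PocoPoint, ContractPoint and the RoundTrip helper are hypothetical names for this illustration, and the sketch assumes a framework version in which the DataContractSerializer supports POCO types.

```csharp
using System;
using System.IO;
using System.Runtime.Serialization;

// No serialization attributes: a POCO type
public class PocoPoint
{
    public int X { get; set; }
    public int Y { get; set; }

    public PocoPoint()
    {
        Console.WriteLine("PocoPoint constructor called");
    }
}

// Explicitly opted in to serialization
[DataContract]
public class ContractPoint
{
    [DataMember] public int X { get; set; }
    [DataMember] public int Y { get; set; }

    public ContractPoint()
    {
        Console.WriteLine("ContractPoint constructor called");
    }
}

class ConstructorDemo
{
    // Serializes the value to memory and reads it back with the same serializer
    static object RoundTrip(object value)
    {
        var dcs = new DataContractSerializer(value.GetType());
        var ms = new MemoryStream();
        dcs.WriteObject(ms, value);
        ms.Position = 0;
        return dcs.ReadObject(ms);
    }

    static void Main()
    {
        Console.WriteLine("-- POCO --");
        RoundTrip(new PocoPoint { X = 1, Y = 2 });

        Console.WriteLine("-- DataContract --");
        RoundTrip(new ContractPoint { X = 1, Y = 2 });
    }
}
```

Following the rules described above, the POCO constructor message should appear twice (once for the explicit new, once for deserialization), while the [DataContract] one should appear only for the explicit new.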

Object state validation

Besides initialization, another function of the constructor is to guarantee that the parameters passed to it are valid, and to throw an exception if that’s not the case. In the example below, all three parameters are validated, and we also have a cross-parameter validation between two of them. The problem with this approach is that the constructor is not invoked during deserialization, so that logic needs to be moved somewhere else.

    [DataContract]
    public class Person
    {
        [DataMember]
        public string Name { get; private set; }
        [DataMember]
        public int Age { get; private set; }
        [DataMember]
        public double Salary { get; private set; }

        public Person(string name, int age, double salary)
        {
            if (name == null) throw new ArgumentNullException("name");
            if (age < 0) throw new ArgumentOutOfRangeException("age");
            if (salary < 0) throw new ArgumentOutOfRangeException("salary");

            if (age < 12 && salary > 0)
            {
                throw new InvalidOperationException("No child labor allowed");
            }

            this.Name = name;
            this.Age = age;
            this.Salary = salary;
        }
    }

One possible solution for this problem is to move the validation to the properties themselves (which are called by the serializer to set the members), and this would solve the validation problem for the individual members. That works for types decorated with [DataContract], which serializes any members decorated with [DataMember], but not for types decorated with [Serializable], which only serializes the fields of the type (not properties). Normally we’d simply say to use [DataContract], but it’s possible that some legacy system still uses that type, so it still needs the SerializableAttribute.

    private string name;

    [DataMember]
    public string Name
    {
        get
        {
            return this.name;
        }
        private set
        {
            if (value == null) throw new ArgumentNullException("value");
            this.name = value;
        }
    }

And besides the issue of dealing with [Serializable] types, we still have the problem of how to validate rules which span multiple members of the object (the “child labor” rule), or how to run some custom code after all the members of the object have been deserialized. That’s where OnDeserializedAttribute comes in. When all the members of the object have been deserialized, the method decorated with this attribute will be invoked. There we can centralize the validation for deserialization scenarios.

    [OnDeserialized]
    public void OnDeserialized(StreamingContext context)
    {
        if (this.Name == null) throw new ArgumentNullException("name");
        if (this.Age < 0) throw new ArgumentOutOfRangeException("age");
        if (this.Salary < 0) throw new ArgumentOutOfRangeException("salary");

        if (this.Age < 12 && this.Salary > 0)
        {
            throw new InvalidOperationException("No child labor allowed");
        }
    }
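One way to exercise this kind of validation is sketched below. To produce an invalid payload without hand-writing XML, the sketch uses a second, hypothetical type (UnvalidatedPerson) which shares the same contract name and namespace but performs no checks. Both type names and the http://example.org/demo namespace are assumptions made for this illustration only.

```csharp
using System;
using System.IO;
using System.Runtime.Serialization;

[DataContract(Name = "Person", Namespace = "http://example.org/demo")]
public class ValidatedPerson
{
    [DataMember] public string Name { get; private set; }
    [DataMember] public int Age { get; private set; }
    [DataMember] public double Salary { get; private set; }

    [OnDeserialized]
    void OnDeserialized(StreamingContext context)
    {
        // Cross-member rule, checked once every member has been read
        if (this.Age < 12 && this.Salary > 0)
        {
            throw new InvalidOperationException("No child labor allowed");
        }
    }
}

// Hypothetical "sender" type: same wire contract, no validation
[DataContract(Name = "Person", Namespace = "http://example.org/demo")]
public class UnvalidatedPerson
{
    [DataMember] public string Name { get; set; }
    [DataMember] public int Age { get; set; }
    [DataMember] public double Salary { get; set; }
}

class ValidationDemo
{
    static void Main()
    {
        var ms = new MemoryStream();
        new DataContractSerializer(typeof(UnvalidatedPerson))
            .WriteObject(ms, new UnvalidatedPerson { Name = "Kid", Age = 8, Salary = 100 });
        ms.Position = 0;

        try
        {
            new DataContractSerializer(typeof(ValidatedPerson)).ReadObject(ms);
            Console.WriteLine("Deserialized without errors");
        }
        catch (Exception e)
        {
            // The exact exception type that surfaces is not assumed here
            Console.WriteLine("Rejected: {0}: {1}", e.GetType().Name, e.Message);
        }
    }
}
```

The catch block is deliberately broad, since whether the exception from the callback surfaces as-is or wrapped by the serializer is not something this sketch relies on.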

Fine-grained control of serialization format for primitives

The “primitive” types supported by the WCF serializers have a pre-defined format in which they’re serialized, and unlike complex types, this format cannot be overridden using surrogates (more on surrogates in the next post). DateTime objects, for example, are always serialized in a fixed format (e.g. 2011-09-05T10:38:39.5636107-07:00, the time I’m writing this paragraph). If you want to change it to a different format (e.g., date only, time only, etc.), you can’t really use the DateTime type.

One possible solution is to use a property to do the conversion, and store the field to be used in the data contract as a string, as in the example below.

    [DataContract]
    public class Person
    {
        [DataMember(Name = "Birthday")]
        private string birthday;

        public DateTime Birthday
        {
            get
            {
                return DateTime.ParseExact(this.birthday, "yyyy-MM-dd", CultureInfo.InvariantCulture);
            }
            set
            {
                this.birthday = value.ToString("yyyy-MM-dd", CultureInfo.InvariantCulture);
            }
        }
    }

But in this case, every time the property is accessed, a conversion is done, so this solution isn’t the best from a performance perspective. One alternative is to only do the conversion during serialization episodes. Prior to the serialization of the type (OnSerializing), the code would set the serializable member, and after the deserialization of the members is complete (OnDeserialized), we store the value from the stream into the public property.

    [DataContract]
    public class Person
    {
        [DataMember(Name = "Birthday")]
        private string birthdayForSerialization;

        public DateTime Birthday { get; set; }

        [OnSerializing]
        void OnSerializing(StreamingContext context)
        {
            this.birthdayForSerialization = this.Birthday.ToString("yyyy-MM-dd",
                CultureInfo.InvariantCulture);
        }

        [OnDeserializing]
        void OnDeserializing(StreamingContext context)
        {
            this.birthdayForSerialization = "1900-01-01";
        }

        [OnDeserialized]
        void OnDeserialized(StreamingContext context)
        {
            this.Birthday = DateTime.ParseExact(this.birthdayForSerialization, "yyyy-MM-dd",
                CultureInfo.InvariantCulture);
        }
    }

Notice that the example above also includes the OnDeserializing callback in the class. If the stream didn’t contain the “Birthday” member, the birthdayForSerialization field would retain its default value (null), and DateTime.ParseExact would throw an ArgumentNullException when OnDeserialized tried to convert it to the DateTime member. By populating the field prior to the deserialization, we ensure that we have something which can be converted.
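The missing-member case can be reproduced with a sketch along these lines. To create a payload without the “Birthday” element, it serializes a second, hypothetical type (PersonWithoutBirthday) which shares the contract name and namespace. The type names and the http://example.org/demo namespace are assumptions for this illustration.

```csharp
using System;
using System.Globalization;
using System.IO;
using System.Runtime.Serialization;

[DataContract(Name = "Person", Namespace = "http://example.org/demo")]
public class PersonWithBirthday
{
    [DataMember(Name = "Birthday")]
    private string birthdayForSerialization;

    public DateTime Birthday { get; set; }

    [OnSerializing]
    void OnSerializing(StreamingContext context)
    {
        this.birthdayForSerialization = this.Birthday.ToString("yyyy-MM-dd",
            CultureInfo.InvariantCulture);
    }

    [OnDeserializing]
    void OnDeserializing(StreamingContext context)
    {
        // Fallback used when the payload carries no Birthday element
        this.birthdayForSerialization = "1900-01-01";
    }

    [OnDeserialized]
    void OnDeserialized(StreamingContext context)
    {
        this.Birthday = DateTime.ParseExact(this.birthdayForSerialization, "yyyy-MM-dd",
            CultureInfo.InvariantCulture);
    }
}

// Hypothetical older version of the contract, with no Birthday member
[DataContract(Name = "Person", Namespace = "http://example.org/demo")]
public class PersonWithoutBirthday
{
}

class MissingMemberDemo
{
    static void Main()
    {
        var ms = new MemoryStream();
        new DataContractSerializer(typeof(PersonWithoutBirthday))
            .WriteObject(ms, new PersonWithoutBirthday());
        ms.Position = 0;

        var person = (PersonWithBirthday)new DataContractSerializer(typeof(PersonWithBirthday))
            .ReadObject(ms);
        Console.WriteLine("Birthday: {0}",
            person.Birthday.ToString("yyyy-MM-dd", CultureInfo.InvariantCulture));
    }
}
```

With the OnDeserializing callback in place, the fallback date 1900-01-01 should be what ends up in Birthday; removing that callback should make the read fail instead.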

Other scenarios

There are many other scenarios for which the callbacks can be useful, such as the ones listed below.

  • Object locking: the serialization of an object happens in a single thread, but if other threads are running and can modify the object, leaving it in an invalid state, we may want to protect it from modifications while it’s being serialized. In the OnSerializing callback we’d acquire a mutex (which is also used in the other methods / properties which modify the object), and it would only be released in the OnSerialized callback.
  • Backup: the “archive flag”, which was common in the DOS world, would be set in a file whenever the file was changed (so it needed to be backed up), and reset whenever a backup was done for that file (or vice-versa, I can never remember it exactly). We can have something similar in serialization: if a property in the object is changed, the “archive flag” is set, and when the object is persisted (via serialization), an OnSerialized callback would reset that flag.
  • Timestamp: to differentiate between two serialized versions of the same object, an OnSerializing callback can be used to save the current time in a serializable member, and this would be saved along with the object.

There are many others; I’ll leave it to your imagination (or any real problems) to come up with those.
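As a rough sketch of the backup and timestamp ideas from the list above, the hypothetical Document type below keeps an “archive flag” (IsDirty) and a save time (SavedAtUtc). All names are made up for this illustration.

```csharp
using System;
using System.IO;
using System.Runtime.Serialization;

[DataContract]
public class Document
{
    private string text;

    [DataMember]
    public string Text
    {
        get { return this.text; }
        set { this.text = value; this.IsDirty = true; } // any change sets the flag
    }

    // Timestamp: filled in right before each serialization
    [DataMember]
    public DateTime SavedAtUtc { get; private set; }

    // The "archive flag": not part of the contract
    public bool IsDirty { get; private set; }

    [OnSerializing]
    void OnSerializing(StreamingContext context)
    {
        this.SavedAtUtc = DateTime.UtcNow;
    }

    [OnSerialized]
    void OnSerialized(StreamingContext context)
    {
        this.IsDirty = false; // the object has just been persisted
    }

    [OnDeserialized]
    void OnDeserialized(StreamingContext context)
    {
        // The Text setter runs during deserialization, so clear the flag afterwards
        this.IsDirty = false;
    }
}

class ArchiveFlagDemo
{
    static void Main()
    {
        var doc = new Document { Text = "draft" };
        Console.WriteLine("Dirty before save: {0}", doc.IsDirty);

        new DataContractSerializer(typeof(Document)).WriteObject(new MemoryStream(), doc);
        Console.WriteLine("Dirty after save: {0}, saved at {1:o}", doc.IsDirty, doc.SavedAtUtc);
    }
}
```

The flag should read True before the save and False after it. The extra OnDeserialized callback is a design choice of this sketch, so that a freshly loaded object doesn’t look modified.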

Final thoughts about serialization callbacks

A few things about serialization callbacks which I think are important to mention:

  • Collection types: the serialization callbacks do not work on collection types. This is a common request which may be implemented in a future release of the .NET Framework, but as of 4.0, those types do not honor the serialization callbacks.
  • Inheritance: if both the base and the derived types have a method decorated with the same serialization callback attribute, first the one from the base type is called, then the one from the derived type is called. This is the same for both serialization and deserialization callbacks.
  • XmlSerializer: the XML Serializer does not honor the serialization callbacks. The fact that the default constructor is called in this serializer solves many of the issues which are handled by the callbacks.
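The inheritance rule from the list above can be observed with a small sketch such as this one. BaseType and DerivedType are hypothetical names for the illustration.

```csharp
using System;
using System.IO;
using System.Runtime.Serialization;

[DataContract]
public class BaseType
{
    [DataMember] public int BaseValue { get; set; }

    [OnSerializing]
    void BaseOnSerializing(StreamingContext ctx)
    {
        Console.WriteLine("Base: OnSerializing");
    }

    [OnDeserialized]
    void BaseOnDeserialized(StreamingContext ctx)
    {
        Console.WriteLine("Base: OnDeserialized");
    }
}

[DataContract]
public class DerivedType : BaseType
{
    [DataMember] public int DerivedValue { get; set; }

    [OnSerializing]
    void DerivedOnSerializing(StreamingContext ctx)
    {
        Console.WriteLine("Derived: OnSerializing");
    }

    [OnDeserialized]
    void DerivedOnDeserialized(StreamingContext ctx)
    {
        Console.WriteLine("Derived: OnDeserialized");
    }
}

class InheritanceDemo
{
    static void Main()
    {
        var dcs = new DataContractSerializer(typeof(DerivedType));
        var ms = new MemoryStream();

        // Each type contributes its own callback for the same event
        dcs.WriteObject(ms, new DerivedType { BaseValue = 1, DerivedValue = 2 });
        ms.Position = 0;
        dcs.ReadObject(ms);
    }
}
```

According to the rule described above, the “Base” line should be printed before the “Derived” line for each of the two events.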

Coming up

Continuing with the serialization extensibility points, I’ll cover the surrogate support in the WCF serializers.
