Understanding Known Types

Known types are probably the aspect of WCF serialization that developers struggle with most. In fact, many developers don't even understand why DataContractSerializer needs known types. They get so used to the principles of object-oriented design that they forget that not everything that can be expressed in an object-oriented design can be serialized and deserialized correctly.

A Necessary Evil

If you've read my last blog post, you'll hopefully understand the distinction between shared type and shared contract serializers. The former write full type names and full assembly names on the wire, which couples them very tightly to the .NET platform. The latter, on the other hand, don't deal with CLR types, but with contract names and descriptions instead. Because of this, features that form the backbone of object-oriented languages (especially inheritance) don't translate well on the wire.

Here's an example that should make the problem more concrete. Suppose you define the following perfectly legal C# type:

[DataContract]
public class ObjectContainer
{
    public ObjectContainer(object o)
    {
        this.o = o;
    }

    [DataMember]
    public object o;
}

Now suppose we try serializing this out with DataContractSerializer:

XmlObjectSerializer serializer = new DataContractSerializer(typeof(ObjectContainer));

serializer.WriteObject(new XmlTextWriter(Console.Out) { Formatting = Formatting.Indented }, new ObjectContainer(new object()));

We get:

<ObjectContainer xmlns:i="http://www.w3.org/2001/XMLSchema-instance" xmlns="http://schemas.datacontract.org/2004/07/ConsoleApplication10">
  <o />
</ObjectContainer>

 

Notice that we can serialize and deserialize this message fine. This is because the contract that was serialized is the same as the contract that was deserialized. The deserializer expected an object and the serializer sent an object.

 

Now consider what happens when the contained object is a string rather than a plain object. You might expect to see this XML:

 

<ObjectContainer xmlns:i="http://www.w3.org/2001/XMLSchema-instance" xmlns="http://schemas.datacontract.org/2004/07/ConsoleApplication10">
  <o>MyString</o>
</ObjectContainer>

 

But instead, serializing with the following code:

XmlObjectSerializer serializer = new DataContractSerializer(typeof(ObjectContainer));
serializer.WriteObject(new XmlTextWriter(Console.Out) { Formatting = Formatting.Indented }, new ObjectContainer("MyString"));

produces this:

<ObjectContainer xmlns:i="http://www.w3.org/2001/XMLSchema-instance" xmlns="http://schemas.datacontract.org/2004/07/ConsoleApplication10">
  <o xmlns:d2p1="http://www.w3.org/2001/XMLSchema" i:type="d2p1:string">MyString</o>
</ObjectContainer>

 

What is going on? Essentially, the serializer realized that there was a mismatch between the contract the deserializer expects and the instance of ObjectContainer it was serializing, so it decided to add some information about the data contract name and namespace of string. This way, when this instance of ObjectContainer is being deserialized, the deserializer can look at the xsi:type information and realize that it needs to deserialize MyString as a string instead of as an object.
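A quick round trip makes the effect of the type hint visible. The following is only a minimal sketch: the RoundTripDemo class and its RoundTrip helper are names invented for this illustration, and ObjectContainer is repeated so the snippet stands on its own.

```csharp
using System;
using System.IO;
using System.Runtime.Serialization;

[DataContract]
public class ObjectContainer
{
    public ObjectContainer(object o) { this.o = o; }

    [DataMember]
    public object o;
}

// Hypothetical helper class, not part of the original post.
public static class RoundTripDemo
{
    // Writes the container to an in-memory stream, then reads it back.
    public static ObjectContainer RoundTrip(ObjectContainer input)
    {
        var serializer = new DataContractSerializer(typeof(ObjectContainer));
        using (var stream = new MemoryStream())
        {
            serializer.WriteObject(stream, input);
            stream.Position = 0;
            return (ObjectContainer)serializer.ReadObject(stream);
        }
    }

    public static void Main()
    {
        ObjectContainer copy = RoundTrip(new ObjectContainer("MyString"));

        // The i:type hint on the wire lets the deserializer hand back a
        // string rather than a bare object.
        Console.WriteLine(copy.o.GetType());
    }
}
```

Note that no known types were declared anywhere; the type hint alone was enough here.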

 

Again though, if you try serializing and deserializing an ObjectContainer containing a string, it just works. This is because string is a primitive as far as DataContractSerializer is concerned, and all primitives are automatically known types, so the serializer takes care of the known type logic without the user having to do anything at all. However, if you were to try:

 

XmlObjectSerializer serializer = new DataContractSerializer(typeof(ObjectContainer));

serializer.WriteObject(new XmlTextWriter(Console.Out) { Formatting = Formatting.Indented }, new ObjectContainer(new MyType()));

you would get the following exception:

Unhandled Exception: System.Runtime.Serialization.SerializationException: Type 'ConsoleApplication10.MyType' with data contract name 'MyType:http://schemas.datacontract.org/2004/07/ConsoleApplication10' is not expected. Add any types not known statically to the list of known types - for example, by using the KnownTypeAttribute attribute or by adding them to the list of known types passed to DataContractSerializer.
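In code, the failure surfaces as a SerializationException thrown from WriteObject. Here is a minimal sketch of catching it. It assumes MyType is an ordinary, empty data contract class (the post does not show its definition), and TrySerialize is a hypothetical helper introduced only for this example.

```csharp
using System;
using System.IO;
using System.Runtime.Serialization;

// Assumed definition; the original post does not show MyType.
[DataContract]
public class MyType { }

[DataContract]
public class ObjectContainer
{
    public ObjectContainer(object o) { this.o = o; }

    [DataMember]
    public object o;
}

public static class UnknownTypeDemo
{
    // Returns false and reports the message when the payload's type
    // is not known to the serializer.
    public static bool TrySerialize(object payload, out string error)
    {
        var serializer = new DataContractSerializer(typeof(ObjectContainer));
        try
        {
            using (var stream = new MemoryStream())
            {
                serializer.WriteObject(stream, new ObjectContainer(payload));
            }
            error = null;
            return true;
        }
        catch (SerializationException ex)
        {
            error = ex.Message;
            return false;
        }
    }

    public static void Main()
    {
        string error;
        if (!TrySerialize(new MyType(), out error))
        {
            Console.WriteLine("Serialization failed: " + error);
        }
    }
}
```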

The serializer needs to know about all the types that ObjectContainer can contain so that it can write out the correct xsi:type information on the wire. By doing so, it allows the deserializer to know what type that element should be deserialized into. If that information weren’t there, the deserializer would have no idea how to deserialize the XML. For example, how would it be able to deserialize the following XML:

 

<ObjectContainer xmlns:i="http://www.w3.org/2001/XMLSchema-instance" xmlns="http://schemas.datacontract.org/2004/07/ConsoleApplication10">
  <o>1986</o>
</ObjectContainer>

 

It would be impossible to know whether 1986 was a number or simply the string “1986”, since o can be any object.

 

How to Identify and Fix Known Type Issues

So when should you be on the lookout for known type issues in your serialization code or WCF service? Quite simply, in any case where you are sending an instance of a type on the wire that doesn’t exactly match the type that is expected. This includes, but isn’t limited to, any time you’re using inheritance.

 

Fixing known type issues is usually quite simple. There are many ways of specifying known types. They can be specified in the constructor of the serializer:

 

XmlObjectSerializer serializer = new DataContractSerializer(typeof(ObjectContainer), new Type[] { typeof(MyType) });
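As a rough sketch of the constructor approach in use, the following round-trips a container holding a MyType. The MyType definition, its Name member, and the KnownTypesDemo helper are all assumptions made for illustration.

```csharp
using System;
using System.IO;
using System.Runtime.Serialization;

// Assumed shape of MyType; the Name member is hypothetical.
[DataContract]
public class MyType
{
    [DataMember]
    public string Name;
}

[DataContract]
public class ObjectContainer
{
    public ObjectContainer(object o) { this.o = o; }

    [DataMember]
    public object o;
}

public static class KnownTypesDemo
{
    public static ObjectContainer RoundTrip(ObjectContainer input)
    {
        // The second argument makes MyType known for both writing and reading.
        var serializer = new DataContractSerializer(
            typeof(ObjectContainer), new Type[] { typeof(MyType) });

        using (var stream = new MemoryStream())
        {
            serializer.WriteObject(stream, input);
            stream.Position = 0;
            return (ObjectContainer)serializer.ReadObject(stream);
        }
    }

    public static void Main()
    {
        var copy = RoundTrip(new ObjectContainer(new MyType { Name = "example" }));
        Console.WriteLine(((MyType)copy.o).Name);
    }
}
```

The same serializer instance is used on both sides here. In a real exchange, the reading side would need the same known types list.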

They can be specified on the type where the known type issue happens. When applied in this way, the KnownType attribute applies to the entire subtree of that type, that is, to the type itself and to everything serialized beneath it:

 

[DataContract]
[KnownType(typeof(MyType))]
public class ObjectContainer
{
    public ObjectContainer(object o)
    {
        this.o = o;
    }

    [DataMember]
    public object o;
}

They can be specified on the type with the KnownType attribute specifying a method, which is required if you need generic known types that depend on the generic parameter:

 

[DataContract]
[KnownType("GetKnownTypes")]
public class ObjectContainer<T>
{
    public ObjectContainer(object o)
    {
        this.o = o;
    }

    [DataMember]
    public object o;

    static Type[] GetKnownTypes()
    {
        return new Type[] { typeof(MyType<T>) };
    }
}
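To see the method-based form at work, here is a small sketch that round-trips an ObjectContainer<int> holding a MyType<int>. The MyType<T> definition with its Value member and the GenericKnownTypesDemo helper are assumptions for this example; the container is repeated so the snippet is self-contained.

```csharp
using System;
using System.IO;
using System.Runtime.Serialization;

// Assumed shape of MyType<T>; the Value member is hypothetical.
[DataContract]
public class MyType<T>
{
    [DataMember]
    public T Value;
}

[DataContract]
[KnownType("GetKnownTypes")]
public class ObjectContainer<T>
{
    public ObjectContainer(object o) { this.o = o; }

    [DataMember]
    public object o;

    // Resolved per closed generic type, so ObjectContainer<int>
    // reports MyType<int> as its known type.
    static Type[] GetKnownTypes()
    {
        return new Type[] { typeof(MyType<T>) };
    }
}

public static class GenericKnownTypesDemo
{
    public static ObjectContainer<T> RoundTrip<T>(ObjectContainer<T> input)
    {
        // No known types are passed here; they come from the attribute.
        var serializer = new DataContractSerializer(typeof(ObjectContainer<T>));
        using (var stream = new MemoryStream())
        {
            serializer.WriteObject(stream, input);
            stream.Position = 0;
            return (ObjectContainer<T>)serializer.ReadObject(stream);
        }
    }

    public static void Main()
    {
        var copy = RoundTrip(new ObjectContainer<int>(new MyType<int> { Value = 42 }));
        Console.WriteLine(((MyType<int>)copy.o).Value);
    }
}
```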

Known types can also be applied in configuration files, or on WCF Operations and Services:

[ServiceContract]
[ServiceKnownType(typeof(MyType))]
public interface IMyService
{
    [OperationContract]
    ObjectContainer GetObjectContainer();
}
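The attribute can also be scoped to a single operation rather than to the whole service. The following sketch assumes simple definitions for MyType and ObjectContainer, and StoreObjectContainer is a hypothetical second operation added only to show the difference in scope.

```csharp
using System.Runtime.Serialization;
using System.ServiceModel;

// Assumed definition; the original post does not show MyType.
[DataContract]
public class MyType { }

[DataContract]
public class ObjectContainer
{
    [DataMember]
    public object o;
}

[ServiceContract]
public interface IMyService
{
    // MyType is a known type for this operation only.
    [OperationContract]
    [ServiceKnownType(typeof(MyType))]
    ObjectContainer GetObjectContainer();

    // Hypothetical operation: it carries no ServiceKnownType attribute,
    // so MyType is not declared as a known type for it.
    [OperationContract]
    void StoreObjectContainer(ObjectContainer container);
}
```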

The Downsides of Using Known Types

 

After all this talk about known types, you may be surprised to learn that you should try to avoid using known types. Why? There are two reasons for this:

 

1. Using known types is usually an indication of bad SOA design. Think about your service and the data you’re sending back and forth. The data should be fully encapsulated in a message that always takes on the same form. If you find that you need to have an operation that should be able to take multiple types as one of its parameters, then maybe you should consider splitting it into multiple operations. Or maybe you should consider defining a data transfer object that contains exactly the information you need to send over the wire. Your interoperability may also suffer if you decide to stick to known types because the other end (which may not use DataContractSerializer) would have to be able to understand xsi:type information.

 

2. Using known types decreases performance, and it does so on both ends of the wire. On serialization, the serializer needs to reflect over the type, find the known type it represents, and write out the name and namespace of the known type's data contract. Then, on deserialization, the deserializer needs to read that information, find the type whose data contract matches the name on the wire, and then deserialize the instance as the correct type. All of this work has to be done for every instance that is serialized, so it will significantly affect the throughput of your serialization.
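To make the first point more concrete, here is one way the "split it into multiple operations" advice might look. Every name in this sketch (Invoice, CreditNote, IBillingServiceLoose, IBillingService) is hypothetical and serves only to contrast the two shapes.

```csharp
using System.Runtime.Serialization;
using System.ServiceModel;

[DataContract]
public class Invoice
{
    [DataMember]
    public decimal Amount;
}

[DataContract]
public class CreditNote
{
    [DataMember]
    public decimal Amount;
}

// Before: a single catch-all operation. Because the parameter is declared
// as object, every concrete type must be declared as a known type.
[ServiceContract]
public interface IBillingServiceLoose
{
    [OperationContract]
    [ServiceKnownType(typeof(Invoice))]
    [ServiceKnownType(typeof(CreditNote))]
    void Submit(object document);
}

// After: one operation per concrete contract. The declared parameter type
// always matches what is sent, so no known types are required.
[ServiceContract]
public interface IBillingService
{
    [OperationContract]
    void SubmitInvoice(Invoice invoice);

    [OperationContract]
    void SubmitCreditNote(CreditNote creditNote);
}
```

The second interface is longer, but each message now has exactly one form, which is what a client on another platform can most easily consume.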

Conclusion

 

Known types are one of the harder parts of using DataContractSerializer because serialization with data contracts is more limited than object-oriented design. In return, though, serializing using shared contracts allows for interoperability (as well as other benefits detailed in the last post). If possible, you should always try to avoid using known types. But if it seems like you need to use them, hopefully this post will have given you the tools to be less confused the next time you see a known types exception.