Question: Deep serialization of an object graph--how deep should it go?

11/20/2007

So, I've been thinking lately about serializing/remoting object graphs. The Entity Framework currently serializes an entire object graph when binary serialization is used, but only serializes one entity at a time in XML/DataContract scenarios. I'm working on a sample designed to show how graphs can be serialized, and we're looking into ways to make this even simpler/more automatic. In the process, though, an issue has come up that has me concerned: What if the graph in memory can vary in size? Might you want to serialize only a subset of the graph, and if so, how should that subset be specified? Would it always be the same, or might you want to specify different subsets for different operations?

To give some context, let's take an example: Assume we have a model with Customers, Orders, OrderLines and Products. Now let's assume that the reason we're serializing things is that we have a web method which returns a customer. With the EF today and a method that just directly returns a customer, you would only get the customer, but let's assume for the moment that you could easily indicate that a whole graph should be returned rather than just a single entity. If that were the case, then there are a few possibilities:

1) The entire graph connected to the customer is returned every time. If you are building a stateless web service, then you would likely construct a new ObjectContext instance each time the method is called, retrieve from the DB just those entities you want to return, and then return them. In this scenario, returning the entire graph every time works just fine because the entire graph contains exactly what you want to return.

What if the context is maintained across multiple operations, though? Then everything would still be fine as long as other data retrieved into that context is disjoint from the graph containing the customer you want to return. You could, for instance, retrieve customer1, all of that customer's orders and all of those orders' orderlines, as well as customer2 and all of their orders and orderlines. Returning customer1 would be unaffected by the fact that customer2 had been loaded into the context. The moment you retrieve into that context a product which has been ordered by both customer1 and customer2, though, those two subgraphs become part of a larger graph, and returning either customer would actually cause the full graph, including both customers and all their orders, to be returned.
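To make the shared-product problem concrete, here is a minimal sketch in Python rather than in terms of the actual EF API. The `Entity` class and the `relate` and `reachable` helpers are hypothetical names invented for illustration; the only assumption is that "serialize the whole graph" means "serialize everything reachable from the root by following relationships in either direction."

```python
from dataclasses import dataclass, field


@dataclass(eq=False)  # eq=False keeps identity-based hashing, so entities can go in a set
class Entity:
    name: str
    links: list = field(default_factory=list)  # related entities, in both directions


def relate(a, b):
    """Connect two entities with a bidirectional relationship."""
    a.links.append(b)
    b.links.append(a)


def reachable(root):
    """Names of every entity a whole-graph serializer would visit starting at root."""
    seen, stack = set(), [root]
    while stack:
        entity = stack.pop()
        if entity in seen:
            continue
        seen.add(entity)
        stack.extend(entity.links)
    return {e.name for e in seen}


# Two customers loaded into the same (long-lived) context, with disjoint data.
customer1, order1, line1 = Entity("customer1"), Entity("order1"), Entity("line1")
customer2, order2, line2 = Entity("customer2"), Entity("order2"), Entity("line2")
relate(customer1, order1)
relate(order1, line1)
relate(customer2, order2)
relate(order2, line2)

before = reachable(customer1)  # only customer1's own subgraph

# Now load a product that appears on an order line of both customers.
product = Entity("widget")
relate(line1, product)
relate(line2, product)

after = reachable(customer1)  # now includes customer2 and everything attached to it
```

Before the product is loaded, `before` holds just customer1, order1 and line1. Afterwards, `after` holds all seven entities, even though nothing about customer1 itself changed.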

2) One way to address the potential issue with option 1 would be to remove certain navigation properties or annotate some of them to indicate that serialization should stop at that point. So, in the specific example above, the relationship from product back to the orderlines containing that product could be marked so that serialization wouldn't travel over it. This would allow a customer graph including products to be returned without that ever leading to the graphs for multiple customers being returned all at once.

The problem with this approach, though, is that you might want some web methods to serialize different subgraphs than others. What if, alongside the method that returns a customer, you wanted to add a different method which returns a product and all of the orders that contain an instance of that product? In that case, the annotation indicating that products should not serialize the order lines that reference them (necessary to make the customer-returning method work correctly) would prevent the product-returning method from working as intended.
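Here is a rough sketch of option 2 and its limitation, again in Python and not tied to any real EF feature. The `DO_NOT_FOLLOW` set stands in for a model-level annotation, and `Entity`, `relate`, `collect` and the property names are all made up for the example.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)
class Entity:
    kind: str
    name: str
    nav: dict = field(default_factory=dict)  # navigation property name -> related entities


def relate(a, a_prop, b, b_prop):
    """Wire up both ends of a relationship under named navigation properties."""
    a.nav.setdefault(a_prop, []).append(b)
    b.nav.setdefault(b_prop, []).append(a)


# Hypothetical model-wide annotation: never serialize across Product.order_lines.
DO_NOT_FOLLOW = {("Product", "order_lines")}


def collect(root):
    """Names of the entities serialized from root, honoring the annotation."""
    seen, stack = {}, [root]
    while stack:
        entity = stack.pop()
        if id(entity) in seen:
            continue
        seen[id(entity)] = entity
        for prop, targets in entity.nav.items():
            if (entity.kind, prop) in DO_NOT_FOLLOW:
                continue  # serialization stops at this navigation property
            stack.extend(targets)
    return {e.name for e in seen.values()}


product = Entity("Product", "widget")
customers = []
for n in ("1", "2"):
    customer = Entity("Customer", "customer" + n)
    order = Entity("Order", "order" + n)
    line = Entity("OrderLine", "line" + n)
    relate(customer, "orders", order, "customer")
    relate(order, "lines", line, "order")
    relate(line, "product", product, "order_lines")
    customers.append(customer)

customer_graph = collect(customers[0])  # customer1, order1, line1, widget
product_graph = collect(product)        # just widget: the annotation blocks the way back
```

The customer-returning call now gets exactly the subgraph it wants, but because the annotation lives on the model rather than on the method, the product-returning call can never reach the order lines or orders.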

3) So, another approach altogether would be to have some mechanism to indicate on a method-by-method basis what subgraph to return. Naturally, this kind of mechanism provides the most flexibility, but it's also the most complicated to build and to explain, and it generates other questions like: Is it OK to always serialize all members of a collection as long as that collection is included, or are there scenarios where you would want to perform a filtered serialization where only part of a collection is serialized even though the whole thing is present in memory?
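One possible shape for option 3, sketched in Python under the assumption that each method lists the navigation paths it wants followed. The dotted-path convention, `collect`, and the two method names are hypothetical and only meant to show the idea, not a proposed design. Note that this sketch always takes every member of a collection on an included path, so it does not answer the filtering question.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)
class Entity:
    name: str
    nav: dict = field(default_factory=dict)  # navigation property name -> related entities


def relate(a, a_prop, b, b_prop):
    """Wire up both ends of a relationship under named navigation properties."""
    a.nav.setdefault(a_prop, []).append(b)
    b.nav.setdefault(b_prop, []).append(a)


def collect(root, paths):
    """Names of the entities reached from root along the given dotted paths only."""
    result = {root.name}
    for path in paths:
        frontier = [root]
        for prop in path.split("."):
            frontier = [t for e in frontier for t in e.nav.get(prop, [])]
            result.update(t.name for t in frontier)
    return result


# Each web method declares its own subgraph (hypothetical method names).
def get_customer(customer):
    return collect(customer, ["orders.lines.product"])


def get_product_with_orders(product):
    return collect(product, ["order_lines.order"])


product = Entity("widget")
customers = []
for n in ("1", "2"):
    customer, order, line = Entity("customer" + n), Entity("order" + n), Entity("line" + n)
    relate(customer, "orders", order, "customer")
    relate(order, "lines", line, "order")
    relate(line, "product", product, "order_lines")
    customers.append(customer)

customer_graph = get_customer(customers[0])
product_graph = get_product_with_orders(product)
```

With the same data in memory, `get_customer` yields customer1, order1, line1 and widget, while `get_product_with_orders` yields widget plus both order lines and both orders. Neither method interferes with the other, at the cost of every method having to spell out its paths.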

So, what do you think? I could really use the feedback. Binary serialization already uses option #1 above, and part of me thinks that option #1 may well be good enough for almost all scenarios. There's no doubt that it would be a LOT simpler to build and to explain, but if there are important, common scenarios where it isn't good enough, then maybe we need to take on #3.

Thanks,
Danny
