Persisting complex objects…



Q: I am on a large C# project with a Fortune 500 company, and one of our first design challenges is how to persist large, complex objects. We have thousands of users, so bandwidth is an issue. When a user requests this object to edit, they may only edit one field within one of the many aggregate objects. Do you know of any efficient design patterns for storing only what changed, instead of resending/restoring the entire object?


****


This is an interesting question. I’ll give you my opinion, and then readers can chime in if they have any other suggestions (you should probably give their responses more credence than mine…)


I like using serialization when I need to send live objects over a wire, or when the objects are tiny, or when I don’t care about performance. But I’m an old database guy (in both senses of the word “old”), and this is the sort of situation that is tailor-made for using a database, with a separate column for each field. That gives you the ability to update a single value quickly and easily. To get a minimal update, you will need some sort of change tracking…


You can get that through the .NET DataSet class. I’m not a big fan of the DataSet, as I like to write code that’s much closer to the metal, but it is convenient to use and it has built-in change tracking. Another option is to write it yourself for each object – setting a property also sets a modified bit, and when you update, you pull all those bits together into a SQL query. It’s tedious code to write, but not very tough. You could also do something where each object has a reference to a “Modifications” class. When a property is updated, you tell the class that the column updated is “Name” and its new value is “Fred”; it stores all the update information away, and when you go to commit the update, it strings it together into proper SQL. That’s probably better than the bit-per-property approach.
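To make the “Modifications” idea concrete, here is a minimal sketch of what such a class might look like. All the names (`Modifications`, `Customer`, `Record`, `ToUpdateSql`) are hypothetical, and a real version would hand parameter values to an ADO.NET command rather than just building the SQL string:

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical sketch of the "Modifications" approach described above;
// class, method, and column names are illustrative only.
public class Modifications
{
    private Dictionary<string, object> changes = new Dictionary<string, object>();

    public bool IsDirty { get { return changes.Count > 0; } }

    // Record the new value for a column; later writes overwrite earlier ones.
    public void Record(string column, object value)
    {
        changes[column] = value;
    }

    // Build a parameterized UPDATE covering only the changed columns.
    public string ToUpdateSql(string table, string keyColumn)
    {
        StringBuilder sb = new StringBuilder("UPDATE " + table + " SET ");
        bool first = true;
        foreach (string column in changes.Keys)
        {
            if (!first) sb.Append(", ");
            sb.Append(column + " = @" + column);
            first = false;
        }
        sb.Append(" WHERE " + keyColumn + " = @" + keyColumn);
        return sb.ToString();
    }
}

// Each object holds a reference to its tracker, and every setter reports in.
public class Customer
{
    private Modifications mods = new Modifications();
    private string name;

    public Modifications Mods { get { return mods; } }

    public string Name
    {
        get { return name; }
        set { name = value; mods.Record("Name", value); }
    }
}
```

So after `customer.Name = "Fred"`, `customer.Mods.ToUpdateSql("Customers", "Id")` produces an UPDATE that touches only the Name column, which is exactly the minimal-update behavior the question asks for.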


Those are my quick thoughts. What do others think?


 

Comments (15)

  1. Nils says:

    nhibernate – enough said.

    I am not a fan of the DataSet either – its performance on large sets of data is terrible, and it has a rather obtuse interface. NHibernate allows for all the persistence goodness to a database with the ease of use of object lists.

  2. amd says:

    large scale? then go stateless.

  3. Ron says:

    Have them edit only one item at a time 😉

  4. Sean Chase says:

    This is the Holy Grail question for most business apps IMO. There’s not a single answer. I personally side with Eric when I have the time. Otherwise, this is why we have products such as CodeSmith, LLBLGen Pro, NHibernate, etc. If I don’t have time to write all of the "tedious" code that’s "close to the metal", then I personally use LLBLGen Pro, CodeSmith templates, or typed-DataSets. There’s a trade-off in every case. All things being equal, my preference is to use the strategy that Eric suggested.

  5. You need an O/R Mapper. That said, this is how I do it…

    My solution is a very simple tracker class (on the client) that uses reflection, i.e. MyTracker.TrackProperty(propertyInfo), or MyTracker.TrackObject(myObject). The tracker is basically a collection of PropertyInfo objects and a collection of corresponding initial values. It can iterate through the PropertyInfo objects and compare current values with initial values to determine what fields are dirty.

    I only use this to determine if there have been any changes, but potentially it could be used to identify and serialize the minimal set of changes across the network.
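    A rough sketch of the snapshot-and-compare tracker described above might look like this. The `TrackObject` and `GetDirtyProperties` names are my guesses at the commenter’s API, and `Person` is just a stand-in class for the demo:

```csharp
using System;
using System.Collections.Generic;
using System.Reflection;

// Hypothetical sketch of the reflection-based tracker described above.
public class ChangeTracker
{
    private object target;
    private List<PropertyInfo> properties = new List<PropertyInfo>();
    private List<object> initialValues = new List<object>();

    // Snapshot the current value of every readable public property.
    public void TrackObject(object obj)
    {
        target = obj;
        foreach (PropertyInfo p in obj.GetType().GetProperties())
        {
            if (p.CanRead)
            {
                properties.Add(p);
                initialValues.Add(p.GetValue(obj, null));
            }
        }
    }

    // Compare current values against the snapshot to find dirty fields.
    public List<string> GetDirtyProperties()
    {
        List<string> dirty = new List<string>();
        for (int i = 0; i < properties.Count; i++)
        {
            object current = properties[i].GetValue(target, null);
            if (!object.Equals(current, initialValues[i]))
                dirty.Add(properties[i].Name);
        }
        return dirty;
    }
}

// Stand-in class for the demo; only properties (not fields) are tracked.
public class Person
{
    private string name;
    public string Name { get { return name; } set { name = value; } }
}
```

    The appeal of this approach is that it needs no per-object code: the objects stay plain, and the tracker discovers the properties via reflection. The trade-off is that comparing values on demand costs more than flipping a bit in a setter.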

  6. Ben says:

    If you check out Rocky Lhotka’s CSLA framework, it supports what you need. The architecture allows for a remoting scenario while hiding the remoting details from the implementor. It also does the modification bit that Eric mentioned, so it only sends items that have been changed.

    http://www.lhotka.net/ArticleIndex.aspx?area=CSLA%20.NET

  7. An O/R mapper would be the standard choice in an enterprise-class problem like the one you describe. I know Hibernate, so I’d recommend NHibernate. Tools like that need some initial investment, but it’s worth it after a certain point.

    The way you worded the question gave me a kooky notion though: the SOAP serializer and a GNU ‘diff’ between the original and the modified version. I only suggest ‘diff’ because the algorithm is easily obtainable. Offhand, it seems more than a little weird, and you’d spend more CPU and memory on both sides of the wire, but it *does* save bandwidth. :)

  8. Chaz Haws says:

    The advice presented looks good. Here’s one additional thought I had:

    If the database tables are well-factored and narrow and the tables have closer to 10 columns than 300, then sending entire modified rows would still be relatively efficient. Possibly just as space-efficient as anything that had to list explicit column information. I don’t know if it’s achievable in this instance, I’m just throwing out a thought. Cuz I’m not fond of the Dataset either. :)

  9. sanatgersappa says:

    Try DB4O – http://www.db4o.com. You can store complex objects with ease. And it is blazing fast!

  10. You can also try Sooda (http://www.sooda.org/introduction.html). It is a bit simpler than NHibernate but does a pretty good job of incremental persistence. It also features a DLINQ-like strongly typed query language implemented in .NET 1.1 using C# operator overloading.

    All updates are tracked at the field level and you even get application-level triggers.

    Using Sooda you can also easily serialize/deserialize your entire transaction (or rather a changeset) to a simple XML document which can be easily moved across layers and among machines.

  11. Eric Gunnerson was asked a question about object persistence…

    http://blog.onthematter.com/archive/2005/11/16/20.aspx