Why not UpdateObject()?

A question came up in the ADO.NET Technology Preview forum recently that made me think it might help folks if I filled in some background on how part of the Entity Framework’s Object Services layer works. The question goes something like this:

I can call AddObject on my context, and I can call DeleteObject, but why isn’t there an UpdateObject or an AttachObject?

Or, put another way:

What’s the most efficient way to send an update to the database if I have an object instance representing the new state of the entity, but that object came from somewhere else; that is, it was retrieved from a context other than the one I'm going to use for the update?

The important thing to realize is that, besides the new values of your entity, the update process needs three key bits of information to do its job:

1) The state of each entity. This is modeled on the same set of states that the DataSet uses for each row: an entity can be in the Added, Unchanged, Modified, Deleted or Detached state.

2) The set of modified properties. Naturally this only applies to entities in the Modified state (well, OK, added entities have all of their properties implicitly “modified”). This list of properties is used to compose the update statement sent to the server, so only the values of properties that have actually been modified are sent.

3) The original values of the concurrency tokens. To handle optimistic concurrency, the update process checks the original values of properties that have been marked as concurrency tokens; if those properties have changed on the server, the update fails and a concurrency exception is thrown. (You can, of course, catch the exception and “refresh” the original values if you decide to overwrite the server data.)
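To make points 2 and 3 concrete, here is a minimal sketch of how the set of modified properties and the concurrency tokens could shape the UPDATE statement for an entity in the Modified state. UpdateSketch, BuildUpdate, the Rooms table and the Version token column are hypothetical; the SQL the Entity Framework actually generates may well look different.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper, NOT an Entity Framework API: shows how the modified
// property list and the concurrency tokens shape an UPDATE statement.
public static class UpdateSketch
{
    public static string BuildUpdate(
        string table,
        string keyColumn,
        IEnumerable<string> modifiedProperties,
        IEnumerable<string> concurrencyTokens)
    {
        List<string> modified = modifiedProperties.ToList();
        if (modified.Count == 0)
            throw new InvalidOperationException("Nothing is modified, so there is nothing to send.");

        // SET clause: only the properties that were actually modified.
        string set = string.Join(", ", modified.Select(p => $"{p} = @{p}"));

        // WHERE clause: the key plus the ORIGINAL value of each concurrency token.
        // If someone else changed a token, no row matches and zero rows are affected.
        var where = new List<string> { $"{keyColumn} = @{keyColumn}" };
        where.AddRange(concurrencyTokens.Select(t => $"{t} = @original_{t}"));

        return $"UPDATE {table} SET {set} WHERE {string.Join(" AND ", where)}";
    }
}
```

For example, `BuildUpdate("Rooms", "Id", new[] { "Name", "Description" }, new[] { "Version" })` returns `UPDATE Rooms SET Name = @Name, Description = @Description WHERE Id = @Id AND Version = @original_Version`. If that statement affects zero rows, a token changed on the server, which is the condition an optimistic-concurrency layer would report as a concurrency exception.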

In many cases, the Entity Framework can track this information automatically. It does so by maintaining an ObjectCache associated with the ObjectContext. If you add a brand-new object to the cache, an EntityKey is computed automatically, the state is set to Added, and all properties are added to the modified list. Because this is a new object, there are no original values to track. If you query an object from the database, it is automatically added to the cache, its state is set to Unchanged, and event handlers are hooked up to track changes and maintain all of the above information.
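A simplified model of what the cache keeps per entity may help here. This is a sketch under stated assumptions: TrackedEntry, ForNewObject, ForQueriedObject and OnPropertyChanged are hypothetical names, not the ObjectCache API, and OnPropertyChanged stands in for the change-tracking event handlers described above.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical model of one cache entry; NOT the real ObjectCache/CacheEntry API.
public enum EntityState { Added, Unchanged, Modified, Deleted, Detached }

public sealed class TrackedEntry
{
    public EntityState State { get; private set; }
    public HashSet<string> ModifiedProperties { get; } = new HashSet<string>();
    public Dictionary<string, object> OriginalValues { get; } = new Dictionary<string, object>();
    public Dictionary<string, object> CurrentValues { get; }

    private TrackedEntry(Dictionary<string, object> values, EntityState state)
    {
        CurrentValues = new Dictionary<string, object>(values);
        State = state;
    }

    // Brand-new object: Added, every property implicitly modified, no original values.
    public static TrackedEntry ForNewObject(Dictionary<string, object> values)
    {
        var entry = new TrackedEntry(values, EntityState.Added);
        entry.ModifiedProperties.UnionWith(values.Keys);
        return entry;
    }

    // Object returned by a query: Unchanged, with a snapshot of its original values.
    public static TrackedEntry ForQueriedObject(Dictionary<string, object> values)
    {
        var entry = new TrackedEntry(values, EntityState.Unchanged);
        foreach (KeyValuePair<string, object> kv in values)
            entry.OriginalValues[kv.Key] = kv.Value;
        return entry;
    }

    // Stands in for the event handlers: records the new value and, for an entity
    // that came from the database, marks the property (and the entity) as modified.
    public void OnPropertyChanged(string name, object newValue)
    {
        CurrentValues[name] = newValue;
        if (State == EntityState.Unchanged || State == EntityState.Modified)
        {
            ModifiedProperties.Add(name);
            State = EntityState.Modified;
        }
    }
}
```

An object that arrives from outside the context has none of this bookkeeping, which is exactly why the example that follows has to rebuild it by hand.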

If, however, you receive an entity instance that corresponds to an object already in the database through some mechanism other than a query on the context you will use to update it (maybe because it was passed into a web service, for instance), then we have a problem: this instance contains only the current values, without any of the three bits of information above. If you want to update the database using this object, you need to somehow “attach” it and then explicitly tell the cache each bit of information. Here’s an example:

    public static void UpdateRoom(Room room)
    {
        DPMudDB db = new DPMudDB();

        // step 1: Add the object to the cache.
        db.AddObject(room);

        // step 2: Get the cache entry and AcceptChanges so
        // that it's no longer in the Added state.
        CacheEntry entry = db.Cache.GetCacheEntry(room.Key);
        entry.AcceptChanges();

        // step 3: Tell the cache which properties are modified.
        // NOTE: This also sets the overall state to Modified.
        entry.SetModified("Name");
        entry.SetModified("Description");

        // step 4: Set the original values of the concurrency
        // tokens. In this case we "refresh" from the
        // server so that the current values will overwrite
        // whatever is on the server.
        db.Refresh(RefreshMode.ClientWins, room);

        // step 5: Do the SaveChanges.
        db.SaveChanges();
    }

The idea behind this example is that the method takes in an object instance, creates a new context, updates the database and then releases the context. Maybe it’s a mid-tier web service method, and the object has been edited in a rich-client form or something.

The list of modified properties in step 3 could either be information that was tracked outside of the object, or it could simply be a list of all the properties, if you aren't concerned about the overhead of writing properties that weren't actually modified back to the database with the values they already have.
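A minimal sketch of the two options for step 3, assuming a hypothetical Room class with Id, Name, Description and a Version concurrency token (the real DPMud class isn't shown in this post). ModifiedList.Choose is a hypothetical helper; each name it returns would be passed to entry.SetModified just as the example above does for "Name" and "Description".

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Reflection;

// Hypothetical stand-in for DPMud's Room entity.
public class Room
{
    public int Id { get; set; }                 // key
    public string Name { get; set; } = "";
    public string Description { get; set; } = "";
    public long Version { get; set; }           // hypothetical concurrency token
}

public static class ModifiedList
{
    // Picks the property names to mark as modified: the list the client tracked,
    // if it sent one; otherwise every public writable property except the
    // excluded ones (for example the key and the concurrency token).
    public static List<string> Choose<T>(IEnumerable<string> trackedByClient, params string[] excluded)
    {
        if (trackedByClient != null)
            return trackedByClient.ToList();

        return typeof(T)
            .GetProperties(BindingFlags.Public | BindingFlags.Instance)
            .Where(p => p.CanWrite && !excluded.Contains(p.Name))
            .Select(p => p.Name)
            .ToList();
    }
}
```

`Type.GetProperties` does not guarantee any particular order, so nothing here depends on it. The trade-off is the one described above: the tracked list sends less data, while the all-properties fallback needs no cooperation from the client.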

Setting the original values of the concurrency tokens in step 4 could be handled in a number of ways. Besides the approach used above (forcing the changes through by refreshing the original values from the database), another approach is to do nothing. In that case, the cache assumes that the original values are the same as the current values, so if nothing has changed in the database since the object was originally queried, everything will work fine. Unfortunately, as of the August CTP there's no way to explicitly set the original values (say, because you have transported them out of band from the entity), but that's something we're considering for potential inclusion in a future release.
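The difference between the two available strategies shows up when someone else has written the row in the meantime. The in-memory simulation below is a sketch under stated assumptions: ServerRow, TryUpdate, OriginalValueStrategy and ConcurrencySketch are hypothetical, a false return stands in for the concurrency exception a real update would throw, and explicitly supplied original values are not modeled because the August CTP doesn't support them.

```csharp
using System;

// Hypothetical in-memory stand-in for a server row with a concurrency token.
public sealed class ServerRow
{
    public string Name = "";
    public long Version;   // hypothetical concurrency token, bumped on every write

    // Mimics "UPDATE ... SET Name = @Name WHERE Version = @original_Version"
    // and returns the number of rows affected.
    public int TryUpdate(string newName, long originalVersion)
    {
        if (Version != originalVersion) return 0;   // token changed on the server
        Name = newName;
        Version++;
        return 1;
    }
}

public enum OriginalValueStrategy { DoNothing, RefreshClientWins }

public static class ConcurrencySketch
{
    public static bool Save(ServerRow server, string clientName, long clientVersion,
                            OriginalValueStrategy strategy)
    {
        long original = strategy == OriginalValueStrategy.RefreshClientWins
            ? server.Version   // refresh: the server's current token becomes the original
            : clientVersion;   // do nothing: the original is assumed equal to the current value
        return server.TryUpdate(clientName, original) == 1;
    }
}
```

If the client read Version 4 and another user has since moved the row to Version 5, DoNothing fails and leaves the server data alone, while RefreshClientWins succeeds and overwrites it. That is the "force changes" behavior step 4 opts into.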

One interesting related question we are debating internally is the granularity at which users would most often want to transport entities along with this additional information. You could imagine, for instance, serializing a single cache entry in order to transport an entity along with its state, set of modified properties and original values. At the other extreme, you might serialize an entire object cache, which would bring along an arbitrary set of entities and could be an efficient way to transport multiple entities at once (including the relationships between them). Somewhere in the middle is the ability to specify a graph of related entities and bring them (and their tracking information) along, but naturally a number of interesting issues would have to be worked out to enable that scenario (like how to specify which part of the graph to serialize).
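As an illustration of the single-cache-entry option, here is a hypothetical envelope carrying a Room's current values together with its state, modified property names and original concurrency token, round-tripped through System.Text.Json (a present-day .NET serializer, used only for convenience). Room, RoomEntryEnvelope, EnvelopeSketch and the Version property are assumptions, not Entity Framework types.

```csharp
using System;
using System.Collections.Generic;
using System.Text.Json;

// Hypothetical entity.
public class Room
{
    public int Id { get; set; }
    public string Name { get; set; } = "";
    public string Description { get; set; } = "";
    public long Version { get; set; }   // hypothetical concurrency token
}

// One cache entry's worth of information: current values plus the three
// pieces of tracking information the update process needs.
public class RoomEntryEnvelope
{
    public Room Current { get; set; } = new Room();
    public string State { get; set; } = "Unchanged";
    public List<string> ModifiedProperties { get; set; } = new List<string>();
    public long OriginalVersion { get; set; }
}

public static class EnvelopeSketch
{
    public static string Pack(RoomEntryEnvelope envelope) => JsonSerializer.Serialize(envelope);

    public static RoomEntryEnvelope Unpack(string json) =>
        JsonSerializer.Deserialize<RoomEntryEnvelope>(json)
        ?? throw new InvalidOperationException("Empty envelope.");
}
```

The receiving side would then replay the envelope into its own context much as steps 1 through 4 do by hand. Serializing a whole cache or a graph would mean a collection of such entries plus some way to describe the relationships between them.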

Well… That’s probably more than enough for a first post. Hopefully it will be helpful to some folks. If I haven’t been clear somewhere, or this sparks other questions, don’t hesitate to ask.

- Danny

P.S. The sample code above is based on a pet project of mine called DPMud, where a few friends and I have been working on an old-fashioned multi-user text adventure built with a rich client that uses the Entity Framework. This provides a fairly straightforward schema that folks can relate to (rooms have a many-to-many relationship with other rooms through an intermediate exit entity, and there are actors, items, etc.) but which isn’t the all-too-tired customers, orders, line-items example. Since I’ve spent so much time with this schema over the last several months, you’ll probably see it in a number of blog entries.