Rolling your own SQL Update on-top of the Entity Framework - Part 1

One of the current limitations of the Entity Framework, is that in order to modify an entity you have to first bring it into memory. Now in most scenarios this is just fine. There are however some bulk update cases where performance suffers somewhat.

For example imagine if you need to move all unshipped orders on a particular ship, to another ship, because of repairs or something. Using the entity framework your only option is to issue a query that pulls back all the matching orders and update them all in memory before finally calling SaveChanges() once done.

Now that could be a whole lot of Orders, which seems somewhat wasteful considering you could do this in T-SQL with one very simple update statement, thereby bypassing all the query, hydration and cross process shipping costs.

But wait...

Now I said "your only option", what I should have said is "your only option out of the box". If you remember I hinted in my last post that there was a work around, using ToTraceString() and Extension Methods. So without further ado, lets drill into how you might do that.

The idea is you simply express a query as per normal. And then use that query as the where part of an update statement. I.e. something like this:

var query = from o in ctx.Orders
        where o.ShipName == "Titanic" && o.ShippedDate == null
        select o;

int movedShip = query.Update(o => new Order(){ShipName = "Queen Mary"});

Which means we are looking for unshipped orders scheduled to be shipped on the Titanic, and then we are updating them all so that they will be shipped on the Queen Mary instead.

Now in order to make this more general we need to create a generic extension method something like this:

public static int Update<T>(
this IQueryable<T> queryable, Expression<Func<T,T>> updator) where T: EntityObject, new()
{

}

There are a lot of smarts in this declaration:

  1. There is a constraint on T. It has to be an EntityObject, i.e. it can't be a projection or anonymous type.
  2. Notice that the second parameter is Expression<Func<T,T>> which means an expression that captures a function that takes a EntityObject of type T and returns an EntityObject of the same type, and because it is an expression, we can crack open its intent and turn it into something we can execute in the database.
  3. We return int which will allow us to return the number of rows / entities effected.

The constraint on T is there because it ensures we only allow updates to whole entities. We don't want to allow attempts to update projections that might span joins etc, and from a more practical stand point it means that the standard entity framework query, accessed via ToTraceString() , is going to be returning all the columns in an entity, which means we can modify it to include just the keys and use the query in a T-SQL update of this form:

UPDATE [Entity] SET ... WHERE key IN (SELECT Key...)

Here the inner Query is based on a modified version of the query provided by ToTraceString() .

Now (2) is a little bit trickier to understand. The idea is that we will return a new Entity with the values we want to update set in the constructor, the reason why we also need to have the original is so we can capture attempts to use the original value in the new value assignment.

For example if was simply an Expression<Func<T>> we would have no way of expressing something like this:

UPDATE [Entity] SET VALUE = VALUE - DISCOUNT WHERE ....

Whereas by using Expression<Func<T,T>> we have 'a marker' for the original values so we can express that like this:

query.Update(entity => new Entity(){Value = entity.Value - Discount});

Whew...

I think that is enough for one blog post, and more importantly I have a meeting to go to...

So I'll continue this in part 2, where I will flesh out the implementation details of our Update(..) method...