Query me this...

Good intentions aside, it has been more than a week since my last blog entry.  Urrggg.  Fortunately, another post in the ADO.Net Tech Preview Forum has motivated me to get writing again.

The question was essentially: If I create a new entity and add it to my context, why isn’t it returned when I execute a query whose filter criteria should cover it?  The answer, as another forum participant pointed out, is that it won't be returned unless you first call SaveChanges().  It is a good question, though, because it might be natural to expect that queries would return unsaved objects as well as saved ones, especially when you encounter a second behavior of the Entity Framework’s object services layer which is not immediately intuitive: identity resolution.

So, let’s take a look at these in a little more detail.  First, some sample code to make this a bit more concrete:

    // create a new item and save it to the database
    Item item1 = new Item();
    item1.Name = "shield";
    item1.MonetaryValue = 5;
    db.AddObject(item1);
    db.SaveChanges();

    // dump the list of items
    foreach (Item item in db.Items.Where("it.MonetaryValue < 10"))
    {
        Console.WriteLine("{0} : {1}", item.Name, item.MonetaryValue);
    }

    // create another new item, add it to the context but don't save
    Item item2 = new Item();
    item2.Name = "sword";
    item2.MonetaryValue = 5;
    db.AddObject(item2);

    // change the first item (it's better than we thought)
    item1.Name = "big metal shield";
    item1.MonetaryValue = 15;

    // dump the list again
    foreach (Item item in db.Items.Where("it.MonetaryValue < 10"))
    {
        Console.WriteLine("{0} : {1}", item.Name, item.MonetaryValue);
    }

The first item list output will be just what you expect (assuming the Items entity set was empty before this code ran): a single line, “shield : 5”.  The second time around, though, dumping the list of items will produce something that you might find a bit surprising.  The result will again be a single line, but this time it reads “big metal shield : 15”.

Huh?  The query was the same, but we have added a second item to the context and we have modified the first item.  First of all, the second item, which has a monetary value of 5 and therefore should meet the criteria, was not returned.  Secondly, the first item was returned, but with new values which, by the way, no longer meet the criteria.  You would think that either the query would return the same thing both times, because no save has been performed, or that the second time the query would return values which match the not-yet-saved state of the context (i.e., one line, “sword : 5”).

What we have encountered in this example is the combination of two interesting behaviors of the Entity Framework: 

1. All query evaluation is performed by the target database.  Always delegating query evaluation to the target database means that new entities which have not yet been saved will never be returned by queries.  So that’s why the second item list still only has one item.

2. Query<T> performs identity resolution before returning object instances.  Query<T> does much of its work by delegating, first to the map provider to actually execute the query and then to the ObjectMaterializer to produce the object instances.  This two-step process is interesting because the first step is performed without any reference to the context, and the second step is performed without any reference to the original query.

The map provider works with the underlying store provider to produce a query against the target DB, and the target DB is solely responsible for determining which logical entities make up the result set.  The map provider returns those entities in the form of a DataReader.  The materializer, in turn, takes that DataReader and works with the context to find and return already existing object instances wherever possible, and to create new object instances (and cache a reference to them) when necessary.  By default, if the context already has an instance of a particular entity (as determined by comparing EntityKeys), that instance is returned by the materializer without making any changes to it.  This means that the object returned, as in this example, may not even meet the original query criteria, because it may have been updated in the context while those updates have not yet been persisted to the target DB.
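To make the two steps easier to see, here is a minimal toy model of them in plain C#.  It is only a sketch: ToyContext, its dictionaries, and the integer Id standing in for an EntityKey are all hypothetical names invented for illustration, and none of this is how the Entity Framework is actually implemented.  The filter runs only against the "stored" rows, and the materializing step hands back whatever instance the context already tracks for each key.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Toy model only: every type here is hypothetical, NOT the Entity Framework API.
class Item
{
    public int Id;               // stands in for the EntityKey (assumption)
    public string Name;
    public int MonetaryValue;
}

class ToyContext
{
    // Rows as the "database" sees them; only Save() changes these.
    private readonly Dictionary<int, Item> store = new Dictionary<int, Item>();
    // Instances the context already tracks, keyed by identity.
    private readonly Dictionary<int, Item> tracked = new Dictionary<int, Item>();
    // New instances that have been added but not yet saved.
    private readonly List<Item> added = new List<Item>();
    private int nextId = 1;

    public void Add(Item item) { added.Add(item); }

    public void Save()
    {
        foreach (Item item in added)
        {
            item.Id = nextId++;
            tracked[item.Id] = item;
        }
        added.Clear();
        // Copy the current values of every tracked instance into the store.
        foreach (Item item in tracked.Values)
        {
            store[item.Id] = new Item { Id = item.Id, Name = item.Name, MonetaryValue = item.MonetaryValue };
        }
    }

    public List<Item> Query(Func<Item, bool> filter)
    {
        // Step 1: the filter sees stored rows only, never the context.
        List<Item> rows = store.Values.Where(filter).OrderBy(r => r.Id).ToList();

        // Step 2: identity resolution, with no knowledge of the filter.
        var result = new List<Item>();
        foreach (Item row in rows)
        {
            Item instance;
            if (!tracked.TryGetValue(row.Id, out instance))
            {
                instance = new Item { Id = row.Id, Name = row.Name, MonetaryValue = row.MonetaryValue };
                tracked[row.Id] = instance;
            }
            result.Add(instance);   // an existing instance is returned unchanged
        }
        return result;
    }
}

static class ToyDemo
{
    public static List<string> Run()
    {
        var db = new ToyContext();
        var item1 = new Item { Name = "shield", MonetaryValue = 5 };
        db.Add(item1);
        db.Save();

        var item2 = new Item { Name = "sword", MonetaryValue = 5 };
        db.Add(item2);                      // added, never saved
        item1.Name = "big metal shield";    // changed locally, never saved
        item1.MonetaryValue = 15;

        return db.Query(i => i.MonetaryValue < 10)
                 .Select(i => i.Name + " : " + i.MonetaryValue)
                 .ToList();
    }

    static void Main()
    {
        foreach (string line in Run()) Console.WriteLine(line);
    }
}
```

Running it prints the single line “big metal shield : 15”, the same surprise as the sample above: the stored row still says 5, so it passes the filter, but the instance handed back is the locally modified one.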

While these behaviors may not seem intuitive at first, after long debate we have come to the conclusion that they are the best compromise between a variety of competing requirements.  We do realize, though, that there are some cases where you may need somewhat different behavior, at least with regard to identity resolution.  These scenarios are accommodated through Query<T>’s MergeOption property.  A full discussion of MergeOption will have to wait for another blog entry, but for now suffice it to say that the default behavior described above (called AppendOnly) can be overridden, either to turn identity resolution off entirely or to update the local entities that are found according to one of a few different schemes (usually used for conflict resolution).

- Danny

Side note: Point #1 above is not strictly true.  There is one case where the Entity Framework does something like a query on the local context rather than delegating to the target DB: GetObjectByKey first checks for an object with the specified key in the context and only executes a query if the object isn’t found.
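The "check the context first, query only on a miss" pattern can be sketched with another toy model.  Again, this is an illustration under stated assumptions rather than the real implementation: KeyLookupSketch, GetByKey, and the StoreQueries counter are hypothetical names, and a dictionary stands in for the target DB.

```csharp
using System;
using System.Collections.Generic;

// Toy model only: hypothetical names, NOT the Entity Framework API.
class KeyLookupSketch
{
    private readonly Dictionary<int, string> store = new Dictionary<int, string>();
    private readonly Dictionary<int, string> tracked = new Dictionary<int, string>();

    // Counts how often the lookup had to go to the "database".
    public int StoreQueries;

    public KeyLookupSketch()
    {
        store[1] = "shield";
    }

    public string GetByKey(int key)
    {
        string value;
        // Look in the context first.
        if (tracked.TryGetValue(key, out value)) return value;

        // Not found locally, so fall back to a query against the store.
        StoreQueries++;
        if (!store.TryGetValue(key, out value)) return null;
        tracked[key] = value;
        return value;
    }

    static void Main()
    {
        var db = new KeyLookupSketch();
        db.GetByKey(1);   // miss in the context, one store query
        db.GetByKey(1);   // hit in the context, no store query
        Console.WriteLine(db.StoreQueries);
    }
}
```

The second lookup is answered from the context, so the counter stays at 1.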

Side note #2: There’s no current plan for local evaluation of queries other than LINQ’s ability to perform query-like operations over in-memory structures.  Even that doesn’t really apply to the context since the context doesn’t directly expose its contents.
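For what it's worth, here is roughly what that LINQ alternative looks like when the application keeps its own list of the objects it cares about.  This is a sketch under assumptions: the Item class and the CheapItemNames helper are hypothetical, and the list is maintained by the application itself since the context doesn't hand out its contents.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for an entity class.
class Item
{
    public string Name;
    public int MonetaryValue;
}

static class LocalQuerySketch
{
    // Plain LINQ to Objects: the filter runs in memory, not in the database.
    public static List<string> CheapItemNames(IEnumerable<Item> items)
    {
        return items.Where(i => i.MonetaryValue < 10)
                    .Select(i => i.Name)
                    .ToList();
    }

    static void Main()
    {
        // The application holds these references itself (assumption).
        var mine = new List<Item>
        {
            new Item { Name = "big metal shield", MonetaryValue = 15 },
            new Item { Name = "sword", MonetaryValue = 5 }
        };

        foreach (string name in CheapItemNames(mine)) Console.WriteLine(name);
    }
}
```

Because the filter sees the in-memory values, this prints “sword”, which is the answer you would get if queries were evaluated against the unsaved state.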