Query me this…


Good intentions aside, it has been more than a week since my last blog entry.  Urrggg.  Fortunately, another post in the ADO.Net Tech Preview Forum has motivated me to get writing again.


The question was essentially: If I create a new entity and add it to my context, why isn’t it returned when I execute a query whose filter criteria should cover it?  As another forum participant pointed out, the answer is that it won’t be returned until you call SaveChanges().  It is a good question, though, because it might be natural to expect queries to return unsaved objects as well as saved ones, especially once you encounter a second behavior of the Entity Framework’s object services layer that is not immediately intuitive: identity resolution.


So, let’s take a look at these in a little more detail.  First, some sample code to make this a bit more concrete:


    // db is the object context (created elsewhere) exposing an Items query

    // create a new item and save it to the database
    Item item1 = new Item();
    item1.Name = "shield";
    item1.MonetaryValue = 5;
    db.AddObject(item1);
    db.SaveChanges();

    // dump the list of items
    foreach (Item item in db.Items.Where("it.MonetaryValue < 10"))
    {
        Console.WriteLine("{0} : {1}", item.Name, item.MonetaryValue);
    }

    // create another new item, add it to the context but don't save
    Item item2 = new Item();
    item2.Name = "sword";
    item2.MonetaryValue = 5;
    db.AddObject(item2);

    // change the first item (it's better than we thought)
    item1.Name = "big metal shield";
    item1.MonetaryValue = 15;

    // dump the list again
    foreach (Item item in db.Items.Where("it.MonetaryValue < 10"))
    {
        Console.WriteLine("{0} : {1}", item.Name, item.MonetaryValue);
    }


The first dump of the item list will be just what you expect (assuming the Items entity set was empty before this code runs): a single line, “shield : 5”.  The second dump, though, produces something you might find a bit surprising: again a single line, but this time “big metal shield : 15”.


Huh?  The query was the same, but we added a second item to the context and modified the first item.  First, the second item, which has a monetary value of 5 and therefore meets the criteria, was not returned.  Second, the first item was returned, but with new values which, by the way, no longer meet the criteria.  You would expect either that the query returns the same thing both times because no save has been performed, or that the second query returns values matching the not-yet-saved state of the context (i.e., one line, “sword : 5”).


What we have encountered in this example is the combination of two interesting behaviors of the Entity Framework: 


1. All query evaluation is performed by the target database.  Because query evaluation is always delegated to the target database, new entities that have not yet been saved are never returned by queries.  That’s why the second item list still has only one item.


2. Query<T> performs identity resolution before returning object instances.  Query<T> does much of its work by delegating first to the map provider, which actually executes the query, and then to the ObjectMaterializer, which produces the object instances.  This two-step process is interesting because the first step is performed without any reference to the context, and the second step is performed without any reference to the original query.

The map provider works with the underlying store provider to produce a query against the target DB, and the DB alone determines which logical entities make up the result set.  It returns those entities in the form of a DataReader.  The materializer, in turn, takes that DataReader and works with the context to return existing object instances wherever possible and to create new object instances (caching a reference to each) when necessary.  By default, if the context already has an instance of a particular entity (as determined by comparing EntityKeys), the materializer returns that instance without making any changes to it.  This means that, as in this example, the returned object may not even meet the original query criteria: it may have been updated in the context without those updates having been persisted to the target DB yet.
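To make the two-step pipeline concrete, here is a small self-contained C# simulation of it.  It is a sketch under stated assumptions, not the Entity Framework API: ToyContext, its Query method, and the integer Key standing in for an EntityKey are all hypothetical.  The “database” alone filters rows, so unsaved objects never appear, and the “materializer” then returns any instance the context already tracks, unchanged, which is the AppendOnly behavior described above.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for the post's Item entity; Key plays the role of the EntityKey.
public class Item
{
    public int Key;
    public string Name;
    public int MonetaryValue;
}

// A toy context, NOT the Entity Framework API: it only mimics the two-step query pipeline.
public class ToyContext
{
    // What the "database" holds; only SaveChanges writes here.
    readonly Dictionary<int, (string Name, int MonetaryValue)> store =
        new Dictionary<int, (string Name, int MonetaryValue)>();
    // Identity map: at most one tracked instance per key.
    readonly Dictionary<int, Item> tracked = new Dictionary<int, Item>();
    // Objects added to the context but not yet saved.
    readonly List<Item> added = new List<Item>();
    int nextKey = 1;

    public void AddObject(Item item) => added.Add(item);

    public void SaveChanges()
    {
        foreach (var item in added)
        {
            item.Key = nextKey++;
            tracked[item.Key] = item;
        }
        added.Clear();
        // Persist the current values of every tracked object.
        foreach (var item in tracked.Values)
            store[item.Key] = (item.Name, item.MonetaryValue);
    }

    public List<Item> Query(Func<int, bool> monetaryValueFilter)
    {
        // Step 1: the "database" alone decides which rows match; unsaved objects are invisible.
        var rows = store.Where(row => monetaryValueFilter(row.Value.MonetaryValue)).ToList();

        // Step 2: the "materializer" resolves each row against the identity map (AppendOnly).
        var results = new List<Item>();
        foreach (var row in rows)
        {
            if (tracked.TryGetValue(row.Key, out var existing))
            {
                results.Add(existing);   // existing instance returned as-is, local edits included
            }
            else
            {
                var created = new Item { Key = row.Key, Name = row.Value.Name, MonetaryValue = row.Value.MonetaryValue };
                tracked[row.Key] = created;
                results.Add(created);
            }
        }
        return results;
    }
}

public static class Program
{
    static void Dump(ToyContext db)
    {
        foreach (var item in db.Query(v => v < 10))
            Console.WriteLine("{0} : {1}", item.Name, item.MonetaryValue);
    }

    public static void Main()
    {
        var db = new ToyContext();
        var item1 = new Item { Name = "shield", MonetaryValue = 5 };
        db.AddObject(item1);
        db.SaveChanges();
        Dump(db);

        var item2 = new Item { Name = "sword", MonetaryValue = 5 };
        db.AddObject(item2);              // added, but never saved
        item1.Name = "big metal shield";  // modified only in memory
        item1.MonetaryValue = 15;
        Dump(db);
    }
}
```

Running Main replays the post’s example and prints “shield : 5” followed by “big metal shield : 15”: the unsaved sword never reaches the simulated store, while the shield’s stored row (value 5) still matches and then resolves to the tracked instance carrying the local value 15.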


While these behaviors may not seem intuitive at first, after long debate we concluded that they are the best compromise among a variety of competing requirements.  We do realize, though, that some cases call for different behavior, at least with regard to identity resolution.  These scenarios are accommodated through Query<T>’s MergeOption property.  A full discussion of MergeOptions will have to wait for another blog entry, but for now suffice it to say that the default behavior described above (called AppendOnly) can be overridden either to turn off identity resolution or to update the local entities it finds according to one of a few different schemes (usually used for conflict resolution).
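The effect of overriding that default can be sketched the same way.  The MergeMode enum and the Resolve helper below are hypothetical; their value names mirror MergeOption values found in released versions of the Entity Framework, but the resolution logic is a simplified assumption that covers only the alternatives mentioned here: no identity resolution, and updating the local entity from the database.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical enum for illustration; not the Entity Framework's own MergeOption type.
public enum MergeMode { AppendOnly, OverwriteChanges, NoTracking }

// Hypothetical stand-in for an entity; Key plays the role of the EntityKey.
public class Item
{
    public int Key;
    public string Name;
    public int MonetaryValue;
}

public static class Materializer
{
    // Turns one row returned by the database into an object instance,
    // consulting (or bypassing) the identity map according to the mode.
    public static Item Resolve(Dictionary<int, Item> tracked, int key, string name, int value, MergeMode mode)
    {
        if (mode == MergeMode.NoTracking)
        {
            // No identity resolution: always a fresh, untracked instance.
            return new Item { Key = key, Name = name, MonetaryValue = value };
        }

        if (tracked.TryGetValue(key, out var existing))
        {
            if (mode == MergeMode.OverwriteChanges)
            {
                // Same instance, but local edits are replaced by the database values.
                existing.Name = name;
                existing.MonetaryValue = value;
            }
            // AppendOnly: hand back the existing instance untouched.
            return existing;
        }

        // Not tracked yet: create and track a new instance.
        var created = new Item { Key = key, Name = name, MonetaryValue = value };
        tracked[key] = created;
        return created;
    }

    public static void Main()
    {
        foreach (MergeMode mode in Enum.GetValues(typeof(MergeMode)))
        {
            // The context already tracks key 1 with unsaved local edits.
            var local = new Item { Key = 1, Name = "big metal shield", MonetaryValue = 15 };
            var tracked = new Dictionary<int, Item> { [1] = local };

            // The database still holds the saved values for key 1.
            var result = Resolve(tracked, 1, "shield", 5, mode);
            Console.WriteLine("{0}: {1} : {2} (same instance: {3})",
                mode, result.Name, result.MonetaryValue, ReferenceEquals(result, local));
        }
    }
}
```

For a tracked shield holding unsaved values (“big metal shield”, 15) and a database row still holding (“shield”, 5), AppendOnly returns the tracked instance untouched, OverwriteChanges returns the same instance with the database values copied over the local edits, and NoTracking returns a separate instance built from the row.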


– Danny


Side note: Point #1 above is not strictly true.  There is one case where the Entity Framework does something like a query on the local context rather than delegating to the target DB: GetObjectByKey first checks for an object with the specified key in the context and only executes a query if the object isn’t found.
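The local check described in this side note amounts to a cache-first lookup.  The following C# sketch is hypothetical (ToyContext, GetByKey, and the queryDatabase delegate are illustrative names, not GetObjectByKey’s real implementation): it returns an already-tracked instance without touching the database and issues a query only on a miss.  Throwing KeyNotFoundException when no row exists is also an assumption of the sketch.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for an entity; Key plays the role of the EntityKey.
public class Item
{
    public int Key;
    public string Name;
}

// Toy context (not the EF API) showing "look in the context first, query only on a miss".
public class ToyContext
{
    readonly Dictionary<int, Item> tracked = new Dictionary<int, Item>();
    readonly Func<int, Item> queryDatabase;   // stands in for a real store query; returns null if no row
    public int DatabaseQueries { get; private set; }

    public ToyContext(Func<int, Item> queryDatabase) => this.queryDatabase = queryDatabase;

    public Item GetByKey(int key)
    {
        // Local check: an already-tracked instance is returned without any query.
        if (tracked.TryGetValue(key, out var cached))
            return cached;

        // Miss: delegate to the database, then track the result for next time.
        DatabaseQueries++;
        var loaded = queryDatabase(key);
        if (loaded == null)
            throw new KeyNotFoundException($"No object with key {key}.");
        tracked[key] = loaded;
        return loaded;
    }
}

public static class Program
{
    public static void Main()
    {
        var rows = new Dictionary<int, string> { [1] = "shield" };
        var db = new ToyContext(key => rows.TryGetValue(key, out var name) ? new Item { Key = key, Name = name } : null);

        var first = db.GetByKey(1);    // not tracked yet: queries the database
        var second = db.GetByKey(1);   // tracked: served from the context
        Console.WriteLine("same instance: {0}, queries: {1}", ReferenceEquals(first, second), db.DatabaseQueries);
    }
}
```

Main looks up key 1 twice; only the first lookup reaches the simulated database, so it reports the same instance and a single query.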


Side note #2: There’s no current plan for local evaluation of queries other than LINQ’s ability to perform query-like operations over in-memory structures.  Even that doesn’t really apply to the context since the context doesn’t directly expose its contents.

Comments (5)

  1. jkowalski says:

    Wouldn’t it be possible to save dirty objects to the database automatically before running the query?

    I’m the author of Sooda, a simple O/R mapper for .NET (http://www.sooda.org), and it supports a concept of "precommit" in this case.

    When executing a query, Sooda knows the tables/classes that the query accesses/depends on (by analyzing the WHERE clause) and issues a pre-commit of all unsaved objects of these classes. This way newly inserted objects will be properly returned by the upcoming SELECT statement. Final commit operation changes from INSERT to UPDATE on precommitted objects. Each Sooda transaction (which is equivalent to a context) is associated with an open database transaction, so partially written transactions can be cancelled if needed.

    This approach has some drawbacks and complexities (like how to deal with partially constructed objects which cannot be committed because of NULLs), but I believe it is a step towards making queries really easy to use and predictable.

    Would it be possible to have this implemented in EDM?

    BTW, I’ll be joining Microsoft in October to work for the ADO.NET vNext Team as an SDET. Now that you’ve published the CTP, I can’t wait to meet you all and play with the code, as we’ll obviously be working on a breakthrough technology.

  2. MatHobbs says:

    Also, Hibernate has a FlushMode (see http://www.hibernate.org/hib_docs/v3/api/org/hibernate/FlushMode.html), which provides some control over the ‘precommit’/flush process when querying.

    Cheers,

    -Matthew Hobbs

  3. dsimmons@microsoft.com says:

    We don’t really view the context as being the same as a transaction.  For that we have System.Transactions, which is a good strategy for queries that need to include not-yet-committed changes.  In fact, further discussion on the original thread that motivated this blog entry led to the following comment, which I posted back on the forum (copied here because I don’t think I can link directly to a comment in a forum thread, only to the entire thread):

    An unrelated discussion we were having today made me think of this thread, so I wanted to do a quick follow-up…

    There is one way to query uncommitted changes that we didn’t talk about: using System.Transactions.  The support in this CTP is not yet everything we intend it to be before we release, but you can get it to work in some controlled scenarios today.  For example (copied from an email Pablo posted internally today, so hopefully he won’t mind my plagiarism):

    // requires: using System.Transactions;
    using (TransactionScope tx = new TransactionScope()) {
      using (ObjectContext ctx = new ObjectContext(…)) {
        var q = query…
        // …make some changes on the objects returned by “q”
        // …save changes without committing
        ctx.SaveChanges();
        // …now do more queries; they’ll include the changes you flushed
        // …now commit for real, or skip Complete() (which rolls back on dispose)
        // if you don’t want the changes to be permanent
        tx.Complete();
      }
    }

    Please tread lightly in this area with this CTP, though.  There are definitely some pitfalls, but this should give you some other ideas to think about this topic in general.

    – Danny

  4. Ralph's Blog says:

    After an entity is created and added to the context, it cannot be queried until it is saved to the database.