Transparent Caching Support in the Entity Framework


The Entity Framework’s provider model makes it possible for it to work over different databases.


The idea is that someone, either the database vendor or a third party, can write a provider that allows the Entity Framework to work with that database.


The Entity Framework asks the provider to convert queries and update operations from a canonical, database-agnostic form into native commands, for example T-SQL for SQL Server.


This provider model, however, has an interesting side effect: it makes it possible to write wrapping providers, that is, providers that wrap another provider and layer in additional services.


Examples of possible wrapping providers include logging providers, auditing providers, security providers, optimizing providers and caching providers. Caching providers are the subject of the following one-pager put together by Jarek.
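
To make the wrapping idea concrete, here is a minimal sketch of the pattern. It assumes a deliberately simplified provider shape: IStoreProvider, FakeDatabaseProvider and LoggingProvider are hypothetical names invented for illustration, not part of the Entity Framework provider API, which is considerably richer.

```csharp
using System.Collections.Generic;

// Hypothetical, simplified stand-in for a provider: it turns a canonical
// command into a result. This is NOT the real EF provider interface.
public interface IStoreProvider
{
    string Execute(string canonicalCommand);
}

// Hypothetical stand-in for a "real" provider that talks to a database.
public class FakeDatabaseProvider : IStoreProvider
{
    public string Execute(string canonicalCommand)
    {
        return "result of " + canonicalCommand;
    }
}

// A wrapping provider: it exposes the same interface, delegates to the
// wrapped provider, and layers an extra service (here, logging) on top.
public class LoggingProvider : IStoreProvider
{
    private readonly IStoreProvider _inner;
    public readonly List<string> Log = new List<string>();

    public LoggingProvider(IStoreProvider inner)
    {
        _inner = inner;
    }

    public string Execute(string canonicalCommand)
    {
        Log.Add(canonicalCommand);               // the additional service
        return _inner.Execute(canonicalCommand); // the wrapped provider does the work
    }
}
```

Because the wrapper and the wrapped provider share one interface, the caller does not need to know which one it is talking to, and wrappers can in principle be stacked.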


—-


Introduction


Business applications use various kinds of Reference Data that does not change at all during the lifetime of an application or changes very infrequently. Examples may include: countries, cities, regions, product categories, etc.


Applications that present data (such as ASP.NET applications) tend to run the same or similar queries for reference data very often, resulting in a significant database load. Developers typically use the ASP.NET cache and/or custom caching techniques to minimize the number of queries, but implementing caching manually adds complexity and maintenance cost to an existing solution.


Entity Framework can be extended to handle data caching in a transparent way, so that any application using it can take advantage of caching with little or no modification. In Entity Framework V1 it is possible to implement transparent caching using a custom provider, as demonstrated in the EFCachingProvider sample (TBD). We are considering adding caching as a first-class concept in Entity Framework V2 so that it will no longer be necessary to use a wrapper provider approach.


Requirements


We are designing a query caching layer for Entity Framework that:


·         Will be transparent (existing code will automatically take advantage of caching without modification other than defining caching policy).


·         Will cache query results, not entity objects, so arbitrary ESQL queries can also benefit from it.


·         Will be optimized for read-only or mostly-read-only data. Caching of frequently changing data will also be supported, but we are not optimizing for that scenario.


·         Will handle cache eviction automatically on updates.


·         Will be extensible (it should be easy to use with the ASP.NET cache or 3rd party caching solutions, including local and distributed caches).


Implementation


To implement query caching we need to be able to intercept query and update execution.


All queries in Entity Framework (regardless of their origin: Entity SQL queries, ObjectQuery<T>, LINQ queries or internal queries generated by the object layer) are processed in the Query Pipeline, which at some point passes a Canonical Query Tree (CQT) to the provider to get the result set of a query. We will cache query results in such a way that when the same query is used over and over again (as determined by the CQT and parameter values), the results will be assembled from the cache instead of the database.


Updates are also centralized in Entity Framework (Update Pipeline) and handled in a similar way. At some point update commands (Update CQTs) are sent to the provider. We can add cache maintenance routines at this point that ensure that proper cache invalidation happens each time an update occurs.


Cache entries and dependencies


Query results stored in the cache will be represented by opaque data structures, which are immutable and serializable, so that they can be easily passed over the wire. An example of such a structure may be:


[Serializable]
public class DbQueryResults
{
    public List<object[]> Rows = new List<object[]>();
}

When caching query results, care must be taken to make sure that returned data is not stale. To be able to detect that, we associate a list of dependencies with each query result. Whenever any of the dependencies changes, the query results should be evicted from the cache. In the proposed approach, dependencies will simply be the store-level entity set names (tables or views) that are used in the query.


For example:


·         SELECT c.Id FROM Customers AS c depends on the Customers entity set.


·         SELECT c.Id, c.Orders FROM Customers AS c depends on the Customers and Orders entity sets.


·         1+2 does not depend on any entity sets.
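
As a rough illustration of how such a dependency list could be derived, the sketch below walks a query tree and collects the entity sets it scans. QueryNode and DependencySketch are hypothetical, heavily simplified stand-ins invented for this example; a real CQT has many more node types, and this is not how the Entity Framework necessarily does it.

```csharp
using System.Collections.Generic;

// Hypothetical, heavily simplified stand-in for a query tree node.
public class QueryNode
{
    // Non-null only for nodes that scan a store-level entity set (table or view).
    public string EntitySetName;
    public List<QueryNode> Children = new List<QueryNode>();
}

public static class DependencySketch
{
    // Returns the distinct entity set names referenced anywhere in the tree.
    public static HashSet<string> GetDependentEntitySets(QueryNode root)
    {
        var sets = new HashSet<string>();
        Collect(root, sets);
        return sets;
    }

    private static void Collect(QueryNode node, HashSet<string> sets)
    {
        if (node.EntitySetName != null)
        {
            sets.Add(node.EntitySetName);
        }
        foreach (QueryNode child in node.Children)
        {
            Collect(child, sets);
        }
    }
}
```

A tree with no scan nodes, such as one for the constant expression 1+2, yields an empty set, which matches the last example above.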


When adding items to the cache, we will pass a list of dependent entity sets to the cache provider. After EF makes changes to the database, it will notify the cache about the list of entity sets that have changed. All query results relying on any of those entity sets have to be removed from the cache. Dependency names will be represented as strings, and collections of dependent entity sets will be IEnumerable<string>.


In the first implementation we will likely use the query text or some derivative of it (such as a cryptographic hash) as the cache key, but cache implementations should not rely upon cache keys being meaningful.
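
One way such a key could be derived is sketched below, assuming the key is a SHA-256 hash over the query text and the parameter values. CacheKeySketch and ComputeCacheKey are hypothetical names, and the choice of SHA-256, Base64 and the text layout are assumptions for illustration only. The sketch also assumes that parameters arrive in a stable order and that their ToString() output identifies the value.

```csharp
using System;
using System.Collections.Generic;
using System.Security.Cryptography;
using System.Text;

public static class CacheKeySketch
{
    // Hypothetical helper: builds an opaque key from query text plus parameter values.
    public static string ComputeCacheKey(
        string queryText,
        IEnumerable<KeyValuePair<string, object>> parameters)
    {
        var sb = new StringBuilder(queryText);
        foreach (KeyValuePair<string, object> p in parameters)
        {
            // One "name=value" line per parameter, so that the same query
            // with different parameter values produces a different key.
            sb.Append('\n').Append(p.Key).Append('=').Append(p.Value ?? "<null>");
        }

        using (SHA256 sha = SHA256.Create())
        {
            byte[] hash = sha.ComputeHash(Encoding.UTF8.GetBytes(sb.ToString()));
            return Convert.ToBase64String(hash);
        }
    }
}
```

The resulting key has a fixed length regardless of query size, which suits caches that limit key length, and it carries no meaning a cache implementation could come to depend on.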


Interface


To be able to work with EF, a cache must implement the following interface:


public interface ICache
{
    bool TryGetEntryByKey(string key, out object value);
    void Add(string key, object value, 
             IEnumerable<string> dependentEntitySets);
    void Invalidate(IEnumerable<string> entitySets);
}

 


As you can see, the values are passed as objects instead of DbQueryResults.

·         TryGetEntryByKey(key, out value) tries to get the cache entry for a given key. If the entry is found, it is returned in value and the function returns true. If the entry is not found, the function returns false and the contents of value are undefined.


·         Add(key, value, dependentEntitySets) adds the specified query results to the cache and sets up dependencies on the given entity sets.


·         Invalidate(entitySets) will be invoked after changes have been committed to the database. The entitySets argument is the list of entity sets that have been modified. The cache should evict ALL queries whose dependencies include ANY of the modified sets.
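
The contract above can be illustrated with a minimal in-memory implementation. This is a sketch only: InMemoryCache is a hypothetical name, the ICache declaration is repeated just to keep the example self-contained, and there is no size limit, expiration or distribution.

```csharp
using System.Collections.Generic;

public interface ICache
{
    bool TryGetEntryByKey(string key, out object value);
    void Add(string key, object value, IEnumerable<string> dependentEntitySets);
    void Invalidate(IEnumerable<string> entitySets);
}

// Hypothetical in-memory cache that tracks which keys depend on which entity sets.
public class InMemoryCache : ICache
{
    private readonly object _lock = new object();
    private readonly Dictionary<string, object> _entries =
        new Dictionary<string, object>();
    // entity set name -> keys of cached query results that depend on it
    private readonly Dictionary<string, HashSet<string>> _keysBySet =
        new Dictionary<string, HashSet<string>>();

    public bool TryGetEntryByKey(string key, out object value)
    {
        lock (_lock)
        {
            return _entries.TryGetValue(key, out value);
        }
    }

    public void Add(string key, object value, IEnumerable<string> dependentEntitySets)
    {
        lock (_lock)
        {
            _entries[key] = value;
            foreach (string set in dependentEntitySets)
            {
                HashSet<string> keys;
                if (!_keysBySet.TryGetValue(set, out keys))
                {
                    keys = new HashSet<string>();
                    _keysBySet[set] = keys;
                }
                keys.Add(key);
            }
        }
    }

    public void Invalidate(IEnumerable<string> entitySets)
    {
        lock (_lock)
        {
            foreach (string set in entitySets)
            {
                HashSet<string> keys;
                if (!_keysBySet.TryGetValue(set, out keys)) continue;
                // Evict every entry that depends on this set. Keys may remain
                // listed under other sets; removing them again later is a no-op.
                foreach (string key in keys) _entries.Remove(key);
                _keysBySet.Remove(set);
            }
        }
    }
}
```

With this sketch, a result cached with dependencies on Customers and Orders is evicted by Invalidate(new[] { "Orders" }), while a result that depends only on Customers, or on no entity sets at all, stays in the cache.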


Cache providers will typically define their own retention policies, limits and automatic eviction policies (using LRU, LFU or some other criteria). The ICache interface does not define any of that. Based on user feedback we may want to extend ICache to specify parameters such as a retention timeout for each item, or add a new interface for configuring cache behavior in an abstract way.


Using the interface defined above, the pseudo-C#-code implementation of query execution operation may look like this:


DbQueryResults GetResultsFromCache(DbCommandDefinition query)
{
    if (CanBeCached(query))
    {
        // calculate cache key
        string cacheKey = GetCacheKey(query);
        // try to look up the results in the cache;
        // ICache hands values back as object, so cast on a hit
        object cached;
        if (Cache.TryGetEntryByKey(cacheKey, out cached))
        {
            return (DbQueryResults)cached;
        }
        // results not found in the cache - perform a database query
        DbQueryResults results = GetResultsFromDatabase(query);
        // add results to the cache along with the entity sets they depend on
        Cache.Add(cacheKey, results, GetDependentEntitySets(query));
        return results;
    }
    else
    {
        // query is not cacheable - always go to the database
        return GetResultsFromDatabase(query);
    }
}

 


—-


As always we are keen to hear your comments.


Alex James
Program Manager,
Entity Framework Team


This post is part of the transparent design exercise in the Entity Framework Team. To understand how it works and how your feedback will be used please look at this post.

Comments (14)

  1. grahamdyson@hotmail.com says:

    I think adding caching support to the EF would be very beneficial.

    However, if your only implementation of cache invalidation is based entirely within the framework, then it won’t be able to cope with multiple servers fulfilling requests to the same database, or with changes to the database which are made without using the EF.

  2. omario says:

    I think caching is not the responsibility of the data access layer.

  3. jkowalski says:

    @grahamdyson:

    There is nothing that prevents cache implementation from evicting more cache entries than Entity Framework asks it to. Cache can use approach similar to SqlCacheDependency in ASP.NET and basically set up a watcher over a specific SQL table or query and do the eviction whenever data in that table changes. This would take care of 3rd parties modifying the database. EF will never rely on a particular entry being there in the cache, even right after it gets added.

    Regarding the other part of your question:

    Entity Framework will not try to solve the problem of synchronizing caches amongst nodes of a cluster, but there are distributed middle-tier caching products  that ensure proper cache consistency across multiple machines. All you need to do is to provide ICache implementation for a particular product.

    We believe that the proposed ICache interface can be used with a range of caching approaches, such as:

    – eternal in-memory cache without invalidation (for purely-read-only objects, this can be as simple as IDictionary<string,object>)

    – in-memory cache with invalidation

    – distributed caches

    – on-disk caches

    – using SQL Server CE or other lightweight/embedded database

    Jarek Kowalski [MSFT]

  4. We’ve just posted a one-pager on transparent caching support in Entity Framework on our EFDesign blog

  5. KooKiz says:

    It could be really powerful if someone wrote an implementation to use it with Velocity. A transparent database cache handler usable from a cluster. It would make things soooo much easier 🙂

  6. grahamdyson@hotmail.com says:

    Thanks for the reply Jarek.

    I like the idea of being able to choose a SqlCacheDependency-like implementation, or even how about the possibility of optionally using SQL 2008 Change Tracking like they’re doing with the Sync Framework?

    Do you see the ICache interface being used for client-side caching, as your post seems to be focused on caching on the server-side?

    From the projects I’ve worked on, I find that client-side caching does the most to increase the user’s perceived responsiveness of the system.

  7. GregYoung says:

    I believe there is a violation of SOC here.

    EF should be setting the key (not a hash of it). The reasoning for this is 2-fold.

    1) How keys get hashed are a concern of the caching mechanism not of the EF

    2) If there is a collision how would the caching mechanism detect it?

    On a side note @KooKiz, writing an abstraction for Velocity/memcached would be a fairly trivial exercise.

    Cheers,

    Greg

  8. GregYoung says:

    btw: previous comment was in reply to:

    "In the first implementation we will likely use query text or some derivative of it (such as cryptographic hash) as a cache key, but cache implementations should not rely upon cache keys being meaningful."

  9. Am says:

    One question regarding the provider model itself rather than the caching:

    – is there any way to fill entities using query expressed in dialect native to the underlying DBMS? Otherwise the "canonical, database agnostic" form would prevent anyone from using features like Oracle’s "connect by" etc, or some performance tuning otherwise impossible effectively becoming lowest common denominator.

    Of course it should be a matter of last resort to break through the abstraction layer, but having a possibility to always fall back to something proven (functionally or performance wise) would be crucial for people responsible for choosing a data layer solution in an enterprise context.

    Greetings

    Am.

    PS. I know that it can be achieved using stored procedures, but it would be hard to maintain in some cases.

  10. jkowalski says:

    Take a look at EFExtensions sample at http://code.msdn.microsoft.com/EFExtensions.

    It has a way to materialize arbitrary objects based on database query.

  11. Tanveer Badar says:

    Something like:

    public class CacheInvalidatedEventArgs<CacheKey> : EventArgs
    {
        public CacheInvalidatedEventArgs( CacheKey key , object oldValue )
        { /* blah blah blah */ }

        // more blah blah blah
    }

    public delegate void CacheInvalidatedEventHandler<CacheKey>( object sender , CacheInvalidatedEventArgs<CacheKey> arg );

    and the addition of

    public event CacheInvalidatedEventHandler ItemInvalidated;

    to the interface ICache will be a nice addition. I admit, propagating the generic type messes things up, but it’s just an idea. Some variation is welcome.

  12. Iqbal Khan says:

    Alex,

    Good article. In my opinion, a better place to add caching support is where you’re creating entity objects. You guys should add a caching provider model separate from the database provider. The caching provider is then called to cache entity objects or their collections instead of datasets. This way, those same entity objects can be directly fetched by applications if they know the key format. And you can also handle "cache dependencies" more accurately because you’re aware of relationships.

    In fact, you should allow people to specify caching attributes in your "mapping" so people can associate relationships etc.

    Although the current provider model gives us "a place" to add caching, this model was not intended for caching and is therefore not ideal. I would urge you to consider adding a caching provider model separately.

    We have a distributed caching product called NCache and we’ll be integrating NCache with EF right now at db-provider level but would ideally want to add it at a level above to make it more effective.

    Cheers,

    Iqbal Khan

    Alachisoft

    http://www.alachisoft.com

    NCache: Distributed Cache & ASP.NET Sessions

  13. Rajeshmr2000 says:

    Can you tell us the approximate release date of this cache library in EF?