Transparent Caching Support in the Entity Framework

07/09/2008

The Entity Framework's provider model makes it possible for the Entity Framework to work over different databases.

The idea is that someone, either the database vendor or a third party, can write a provider that allows the Entity Framework to work with that database.

The Entity Framework asks the provider to convert queries and update operations from a canonical, database-agnostic form into native commands, e.g. T-SQL for SQL Server.

This provider model, however, has an interesting side effect: it makes it possible to write wrapping providers, that is, providers that wrap another provider and layer in additional services.
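Conceptually, a wrapping provider is the decorator pattern applied to a provider: it exposes the same contract as the provider it wraps, forwards every call to it, and adds its own behavior around the call. The sketch below illustrates the idea with a deliberately simplified, hypothetical IQueryExecutor interface (LoggingQueryExecutor and FakeDatabaseExecutor are hypothetical too); the real Entity Framework provider model is built on richer ADO.NET classes such as DbProviderFactory and DbProviderServices.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical, simplified stand-in for a provider: executes a command
// text and returns rows. This is NOT an Entity Framework API.
public interface IQueryExecutor
{
    IList<object[]> Execute(string commandText);
}

// Wrapping provider: forwards every call to the inner executor and
// layers an extra service (here, logging) around it.
public class LoggingQueryExecutor : IQueryExecutor
{
    private readonly IQueryExecutor inner;
    private readonly List<string> log = new List<string>();

    public LoggingQueryExecutor(IQueryExecutor inner)
    {
        this.inner = inner ?? throw new ArgumentNullException(nameof(inner));
    }

    public IReadOnlyList<string> Log => log;

    public IList<object[]> Execute(string commandText)
    {
        log.Add("Executing: " + commandText);
        IList<object[]> rows = inner.Execute(commandText); // the wrapped provider does the real work
        log.Add("Returned " + rows.Count + " row(s)");
        return rows;
    }
}

// Hypothetical "real" provider that always returns one row.
public class FakeDatabaseExecutor : IQueryExecutor
{
    public IList<object[]> Execute(string commandText) =>
        new List<object[]> { new object[] { 1, "Poland" } };
}
```

Because the wrapper has the same shape as what it wraps, wrappers can be stacked (for example caching over logging over the real provider) without the caller noticing.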

Examples of possible wrapping providers include logging, auditing, security, optimizing and caching providers. The latter is the subject of the following one-pager put together by Jarek:

----

Introduction

Business applications use various kinds of reference data that does not change at all during the lifetime of an application, or changes very infrequently. Examples include countries, cities, regions and product categories.

Applications that present data (such as ASP.NET applications) tend to run the same or similar queries for reference data very often, resulting in a significant database load. Developers typically use the ASP.NET cache and/or custom caching techniques to minimize the number of queries, but implementing caching manually adds complexity and maintenance cost to an existing solution.

The Entity Framework can be extended to handle data caching transparently, so that any application using it can take advantage of caching with little or no modification. In Entity Framework V1 it is possible to implement transparent caching using a custom provider, as demonstrated in the EFCachingProvider sample (TBD). We are considering adding caching as a first-class concept in Entity Framework V2, so that a wrapping provider will no longer be necessary.

Requirements

We are designing a query caching layer for Entity Framework that:

· Will be transparent (existing code will automatically take advantage of caching without modification other than defining a caching policy).

· Will cache query results, not entity objects, so that arbitrary Entity SQL (ESQL) queries can also benefit from it.

· Will be optimized for read-only or mostly-read-only data. Caching of frequently changing data will also be supported, but we are not optimizing for that scenario.

· Will handle cache eviction automatically on updates.

· Will be extensible (it should be easy to use with the ASP.NET cache or third-party caching solutions, including local and distributed caches).

Implementation

To implement query caching we need to be able to intercept query and update execution.

All queries in the Entity Framework (regardless of their origin: Entity SQL queries, ObjectQuery<T>, LINQ queries or internal queries generated by the object layer) are processed by the Query Pipeline, which at some point passes a Canonical Query Tree (CQT) to the provider to get the query's result set. We will cache query results so that when the same query is executed over and over again (as determined by the CQT and the parameter values), the results are assembled from the cache instead of the database.

Updates are also centralized in the Entity Framework (in the Update Pipeline) and can be handled in a similar way: at some point, update commands (update CQTs) are sent to the provider. We can add cache maintenance routines at this point to ensure that the proper cache invalidation happens each time an update occurs.
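The update-side hook can be sketched as a wrapper around the step that sends update commands to the store. Everything here is hypothetical (UpdateCommand, CachingUpdateExecutor, and the three delegates standing in for the store, the transaction commit and the cache); the point is the ordering: the affected entity sets are collected while the commands run, and the cache is told about them only once the commit has succeeded.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical description of one update command produced by the update
// pipeline: the store-level entity set it modifies and its command text.
public sealed class UpdateCommand
{
    public UpdateCommand(string entitySet, string commandText)
    {
        EntitySet = entitySet;
        CommandText = commandText;
    }

    public string EntitySet { get; }
    public string CommandText { get; }
}

// Hypothetical wrapper around the update step. It remembers which entity
// sets the batch touches and notifies the cache only after the commit.
public class CachingUpdateExecutor
{
    private readonly Action<UpdateCommand> executeCommand;   // sends one command to the store
    private readonly Action commit;                          // commits the transaction
    private readonly Action<IEnumerable<string>> invalidate; // e.g. a cache's Invalidate method

    public CachingUpdateExecutor(
        Action<UpdateCommand> executeCommand,
        Action commit,
        Action<IEnumerable<string>> invalidate)
    {
        this.executeCommand = executeCommand;
        this.commit = commit;
        this.invalidate = invalidate;
    }

    public void ExecuteBatch(IEnumerable<UpdateCommand> commands)
    {
        // Each modified set is reported once, however many commands touch it.
        var modifiedSets = new HashSet<string>(StringComparer.Ordinal);
        foreach (UpdateCommand command in commands)
        {
            executeCommand(command);
            modifiedSets.Add(command.EntitySet);
        }

        // If commit throws, the exception propagates and no invalidation is
        // sent (assuming a failed commit rolls the transaction back).
        commit();
        invalidate(modifiedSets);
    }
}
```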

Cache entries and dependencies

Query results stored in the cache will be represented by opaque data structures, which are immutable and serializable, so that they can easily be passed over the wire. An example of such a structure:

using System;
using System.Collections.Generic;

[Serializable]
public class DbQueryResults
{
    // One object[] of column values per result row.
    public List<object[]> Rows = new List<object[]>();
}

When caching query results, care must be taken to make sure that the returned data is not stale. To detect staleness, we associate a list of dependencies with each query result. Whenever any of the dependencies changes, the query results should be evicted from the cache. In the proposed approach, dependencies are simply the store-level entity set names (tables or views) used in the query.

For example:

· SELECT c.Id FROM Customers AS c depends on the Customers entity set

· SELECT c.Id, c.Orders FROM Customers AS c depends on the Customers and Orders entity sets

· 1+2 does not depend on any entity set
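Collecting these dependencies amounts to walking the query tree and recording every entity set that is scanned. The sketch below uses a tiny, hypothetical tree (QueryNode, ScanNode, ConstantNode, OperatorNode and DependencyCollector are all illustrative names, not the real CQT classes) and reproduces the three examples above.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical, heavily simplified query tree (not the real CQT classes).
public abstract class QueryNode
{
    public abstract IEnumerable<QueryNode> Children { get; }
}

// Leaf that reads from a store-level entity set (table or view).
public sealed class ScanNode : QueryNode
{
    public ScanNode(string entitySet) { EntitySet = entitySet; }
    public string EntitySet { get; }
    public override IEnumerable<QueryNode> Children => Array.Empty<QueryNode>();
}

// Leaf holding a constant such as 1 or 2.
public sealed class ConstantNode : QueryNode
{
    public ConstantNode(object value) { Value = value; }
    public object Value { get; }
    public override IEnumerable<QueryNode> Children => Array.Empty<QueryNode>();
}

// Any operator (projection, join, arithmetic, ...) with child inputs.
public sealed class OperatorNode : QueryNode
{
    private readonly QueryNode[] children;

    public OperatorNode(string name, params QueryNode[] children)
    {
        Name = name;
        this.children = children;
    }

    public string Name { get; }
    public override IEnumerable<QueryNode> Children => children;
}

public static class DependencyCollector
{
    // Walks the whole tree and returns the distinct entity sets it scans.
    public static ISet<string> GetDependentEntitySets(QueryNode root)
    {
        var sets = new HashSet<string>(StringComparer.Ordinal);
        var pending = new Stack<QueryNode>();
        pending.Push(root);
        while (pending.Count > 0)
        {
            QueryNode node = pending.Pop();
            if (node is ScanNode scan)
                sets.Add(scan.EntitySet);
            foreach (QueryNode child in node.Children)
                pending.Push(child);
        }
        return sets;
    }
}
```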

When adding items to the cache, we will pass a list of dependent entity sets to the cache provider. After the EF makes changes to the database, it will notify the cache about the list of entity sets that have changed. All query results relying on any of those entity sets have to be removed from the cache. Dependency names will be represented as strings, and collections of dependent entity sets as IEnumerable<string>.

In the first implementation we will likely use the query text or some derivative of it (such as a cryptographic hash) as the cache key, but cache implementations should not rely on cache keys being meaningful.
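A hash-based key might be derived as below. This is a sketch under stated assumptions: CacheKeys.Compute is a hypothetical helper, SHA-256 is our choice of hash for illustration, and the parameter values are folded into the key because, as noted earlier, the same query text with different parameter values must not share a cache entry.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;
using System.Security.Cryptography;
using System.Text;

public static class CacheKeys
{
    // Hypothetical helper: hashes the query text together with the
    // parameter names, types and values into an opaque string key.
    public static string Compute(
        string queryText,
        IEnumerable<KeyValuePair<string, object>> parameters)
    {
        var builder = new StringBuilder(queryText);

        // Sort by name so the key does not depend on parameter order.
        foreach (KeyValuePair<string, object> parameter in
                 parameters.OrderBy(x => x.Key, StringComparer.Ordinal))
        {
            builder.Append('\n').Append(parameter.Key).Append('=');
            if (parameter.Value == null)
            {
                builder.Append("<null>");
            }
            else
            {
                // Include the type so that 1 (int) and "1" (string) differ.
                builder.Append(parameter.Value.GetType().FullName)
                       .Append(':')
                       .Append(Convert.ToString(parameter.Value, CultureInfo.InvariantCulture));
            }
        }

        using (SHA256 sha = SHA256.Create())
        {
            byte[] hash = sha.ComputeHash(Encoding.UTF8.GetBytes(builder.ToString()));
            return Convert.ToBase64String(hash); // 32-byte hash -> 44 characters
        }
    }
}
```

The resulting string is meaningless to the cache, which matches the requirement that cache implementations must not interpret keys.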

Interface

To be able to work with the EF, a cache must implement the following interface:

using System.Collections.Generic;

public interface ICache
{
    bool TryGetEntryByKey(string key, out object value);
    void Add(string key, object value,
             IEnumerable<string> dependentEntitySets);
    void Invalidate(IEnumerable<string> entitySets);
}

Note that values are passed as object rather than DbQueryResults: to the cache, they are opaque.

· TryGetEntryByKey(key, out value) tries to get the cache entry for a given key. If the entry is found, it is returned in value and the method returns true. If the entry is not found, the method returns false and the content of value is undefined.

· Add(key, value, dependentEntitySets) adds the specified query results to the cache and sets up dependencies on the given entity sets.

· Invalidate(entitySets) is invoked after changes have been committed to the database. entitySets is the list of sets that have been modified. The cache should evict ALL queries whose dependencies include ANY of the modified sets.
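To make this contract concrete, here is a minimal in-memory sketch of ICache, assuming a single process and no size limits or expiration (retention policies are deliberately left out). The class name InMemoryCache is hypothetical. Besides the entries themselves, it keeps two indexes: entity set name to dependent keys, used by Invalidate, and key to entity sets, used to clean up the first index when an entry is removed. A single lock makes it safe for concurrent callers; handing out shared values this way relies on cached results being immutable.

```csharp
using System;
using System.Collections.Generic;

public interface ICache
{
    bool TryGetEntryByKey(string key, out object value);
    void Add(string key, object value, IEnumerable<string> dependentEntitySets);
    void Invalidate(IEnumerable<string> entitySets);
}

// Hypothetical unbounded in-memory cache: entries live until invalidated.
public class InMemoryCache : ICache
{
    private readonly object sync = new object();
    private readonly Dictionary<string, object> entries =
        new Dictionary<string, object>(StringComparer.Ordinal);
    // entity set name -> keys of results that depend on it
    private readonly Dictionary<string, HashSet<string>> keysBySet =
        new Dictionary<string, HashSet<string>>(StringComparer.Ordinal);
    // key -> entity sets it depends on (used to clean up keysBySet)
    private readonly Dictionary<string, HashSet<string>> setsByKey =
        new Dictionary<string, HashSet<string>>(StringComparer.Ordinal);

    public bool TryGetEntryByKey(string key, out object value)
    {
        lock (sync) { return entries.TryGetValue(key, out value); }
    }

    public void Add(string key, object value, IEnumerable<string> dependentEntitySets)
    {
        var sets = new HashSet<string>(dependentEntitySets, StringComparer.Ordinal);
        lock (sync)
        {
            RemoveEntry(key); // replacing an entry drops its old dependencies
            entries[key] = value;
            setsByKey[key] = sets;
            foreach (string set in sets)
            {
                if (!keysBySet.TryGetValue(set, out HashSet<string> keys))
                {
                    keys = new HashSet<string>(StringComparer.Ordinal);
                    keysBySet[set] = keys;
                }
                keys.Add(key);
            }
        }
    }

    public void Invalidate(IEnumerable<string> entitySets)
    {
        lock (sync)
        {
            foreach (string set in entitySets)
            {
                if (keysBySet.TryGetValue(set, out HashSet<string> keys))
                {
                    // Iterate over a copy: RemoveEntry modifies the set.
                    foreach (string key in new List<string>(keys))
                        RemoveEntry(key);
                }
            }
        }
    }

    // Removes one entry and every index reference to it. Caller holds the lock.
    private void RemoveEntry(string key)
    {
        if (!setsByKey.TryGetValue(key, out HashSet<string> sets))
            return;
        foreach (string set in sets)
        {
            HashSet<string> keys = keysBySet[set];
            keys.Remove(key);
            if (keys.Count == 0)
                keysBySet.Remove(set);
        }
        setsByKey.Remove(key);
        entries.Remove(key);
    }
}
```

For example, after caching results that depend on Customers and on Customers plus Orders, invalidating Orders evicts only the second entry, while invalidating Customers evicts both.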

Cache providers will typically define their own retention policies, limits and automatic eviction policies (using LRU, LFU or some other criteria). The ICache interface does not define them. Based on user feedback, we may want to extend ICache to specify parameters such as a retention timeout for each item, or add a new interface for configuring cache behavior in an abstract way.

Using the interface defined above, a pseudo-C# implementation of the query execution operation may look like this:

DbQueryResults GetResultsFromCache(DbCommandDefinition query)
{
    if (CanBeCached(query))
    {
        // calculate cache key
        string cacheKey = GetCacheKey(query);
        // try to look up the results in the cache; ICache returns object,
        // so the cached value has to be cast back to DbQueryResults
        object cachedValue;
        if (Cache.TryGetEntryByKey(cacheKey, out cachedValue))
        {
            return (DbQueryResults)cachedValue;
        }
        // results not found in the cache - perform a database query
        DbQueryResults results = GetResultsFromDatabase(query);
        // add results to the cache together with their dependencies
        Cache.Add(cacheKey, results, GetDependentEntitySets(query));
        return results;
    }
    else
    {
        return GetResultsFromDatabase(query);
    }
}

----

As always, we are keen to hear your comments.

Alex James
Program Manager,
Entity Framework Team

This post is part of the transparent design exercise in the Entity Framework Team. To understand how it works and how your feedback will be used, please look at this post.