LINQ to SQL Tips 9: Understanding DataContext's internal caching

Back to the "tips" series after a little break ...

One common question I get is about caching of data in LINQ to SQL. It means a lot of different things so let's start with the following baseline: LINQ to SQL was designed to get you objects from the database and to submit the changes back to the database. It is not intended to be a mid-tier caching component. It is not a cache and it does not pretend to be one. However, it does have certain behaviors that are interesting in this regard:

Object identity

If you want to be able to update the objects you retrieve from the database, it helps to avoid multiple copies of the objects. LINQ to SQL maintains an internal "identity cache" to ensure that within the scope of a DataContext, there will be at most one instance of a given type for a given key value. Think of it like a dictionary indexed by the key value (whether single column or composite key) that stores entities. When you enumerate the results of a query, DataContext looks at each row returned by the underlying DataReader, checks if an instance with the same key value already exists in the dictionary. If an entity is found, it throws away the row and just returns you the existing object from the identity cache. (For those who are more curious, you can look at DataContext.cs, IdentityManager.cs in the sources). As described in another post, we don't stomp on your copy while materializing objects (unless you ask for that using Refresh())

Change tracking

After query execution, you are out of the relational world and into the brave world of objects. There the currency for identity is an object reference rather than a key value since APIs take in objects and not rows or key values. So we also maintain a dictionary that can be looked up by the object. Among many other things, it helps us Attach() a new object by starting to track it.

Enough of the internal view. What does it mean for you?

  1. LINQ to SQL is designed for retrieving and saving objects. Any caching it does is to that end and not for general purpose use you may have. There are other cache components you can look at.
  2. By and large, DataContext holds a reference to an entity that it materializes for the life of the DataContext and keeps it as is unless you explicitly modify it or Refresh() it.
    1. You can ask for it by the key value and retrieve it without remote query (this bug is getting fixed in SP1)
    2. DataContext feels like it has an obsessive possessive disorder. You can't really take away an object from it. It would rather be disposed than give up the precious entities it holds on to. If you don't like the fact that a reference is maitained (i.e. you want to have entities garbage collected), then use the disposable pattern. See this post for more information.
  3. Over time, the entities in the id cache get more and more stale. Fine for reference data; not OK for data that changes in the database more often.
  4. The internal caches make DataContext a write-back cache rather than a write-through cache. We write back on SubmitChanges().
  5. Attach() might look like an Add() on the cache (and it does behave like it in some ways), but it is designed for tracking the object for further changes.
  6. Non-entities don't show up in the internal cache - there is no key for lookup. Most common case of non-entity is a result from a sproc that does not match the shape of an entity.
  7. Turning ObjectTrackingEnabled off can give you a small perf boost - may be useful in a read-only app (only).

Dinesh