Transparent Lazy Loading for Entity Framework – part 3 – Anatomy of a Stub

This post is part of a series describing the EFLazyLoading library.

In the two previous articles I introduced EFLazyLoading, a framework for lazy loading of entities on top of Entity Framework. In this post I will explain what stubs are and how they work.

Let’s establish some terminology first:

  • A shell object is the public object that users interact with. It has the properties of an entity, but no backing fields except for the primary key.
  • A data object is an internal data structure that holds the backing fields for the object. It implements the ILazyEntityDataObject interface.
  • A stub object is a shell object that has no data object attached to it.
  • A fully loaded object is a shell that has a data object attached and populated.

Here is a typical pair of shell and data objects, NorthwindEF.Category (figure: Structure of the Category entity). Note a few things:

  • _CategoryID is the only field in the shell class (disregard the base class for now). All other fields are declared in the data class.
  • The only public properties on the shell class are those that correspond to the EntityType definition.
  • The data class is a nested type inside the shell class.
  • The data class is the backing store for all non-key properties.
  • Data objects must be able to deep-clone themselves. This is the purpose of the ILazyEntityDataObject.Copy() method.
  • Data properties (see the previous article) are also declared in the data class. This is purely for convenience, as it allows the fields to be private.
  • There are no public methods, only a protected CreateDataObject(), which takes care of creating the private data object.
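The layout described in the list above can be sketched roughly as follows. This is a simplified illustration, not the library's actual code: LazyEntityObjectSketch, IsStub and EnsureData() are hypothetical stand-ins for the real base class, the data fields are exposed directly instead of through data properties, and no database access or change tracking is modeled.

```csharp
// Stand-in for the library interface; only Copy() is taken from the article.
public interface ILazyEntityDataObject
{
    ILazyEntityDataObject Copy();
}

// Hypothetical, heavily simplified base class. The real LazyEntityObject
// also implements the IPOCO interfaces and fills the data from the store.
public abstract class LazyEntityObjectSketch
{
    private ILazyEntityDataObject _data;   // null while the shell is a stub

    public bool IsStub { get { return _data == null; } }

    protected abstract ILazyEntityDataObject CreateDataObject();

    // Attaches a data object on first access. A real implementation would
    // populate it from the database at this point.
    protected ILazyEntityDataObject EnsureData()
    {
        if (_data == null)
        {
            _data = CreateDataObject();
        }
        return _data;
    }
}

public class Category : LazyEntityObjectSketch
{
    private readonly int _CategoryID;      // the only field in the shell class

    public Category(int categoryId) { _CategoryID = categoryId; }

    public int CategoryID { get { return _CategoryID; } }

    public string CategoryName
    {
        get { return ((Data)EnsureData()).CategoryName; }
        set { ((Data)EnsureData()).CategoryName = value; }
    }

    public string Description
    {
        get { return ((Data)EnsureData()).Description; }
        set { ((Data)EnsureData()).Description = value; }
    }

    protected override ILazyEntityDataObject CreateDataObject()
    {
        return new Data();
    }

    // Nested data class: backing store for all non-key properties.
    private sealed class Data : ILazyEntityDataObject
    {
        public string CategoryName;
        public string Description;

        public ILazyEntityDataObject Copy()
        {
            // Strings are immutable, so a memberwise copy is a deep copy here.
            return (ILazyEntityDataObject)MemberwiseClone();
        }
    }
}
```

In this sketch a freshly constructed Category reports IsStub as true, and the first read or write of CategoryName or Description attaches the data object.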

Shell objects implement the ILazyEntityObject interface in addition to the three IPOCO interfaces: IEntityWithKey, IEntityWithChangeTracking and IEntityWithRelationships. In the current implementation those interfaces are implemented in a base class called LazyEntityObject.

In the current implementation the data object is a class with fields, but in theory it could be implemented as a hash table (to allow for types with a huge number of nullable columns that are often null) or in some other way.
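To illustrate the hash-table idea, here is a minimal sketch of a dictionary-backed data object. It is not part of EFLazyLoading: SparseDataObject, Get(), Set() and StoredCount are hypothetical names, and the interface is redeclared only to keep the example self-contained. Null values are simply not stored, which is where the memory saving for sparse rows would come from.

```csharp
using System.Collections.Generic;

// Stand-in for the library interface; only Copy() is taken from the article.
public interface ILazyEntityDataObject
{
    ILazyEntityDataObject Copy();
}

// Hypothetical data object that stores only non-null column values.
public sealed class SparseDataObject : ILazyEntityDataObject
{
    private readonly Dictionary<string, object> _values;

    public SparseDataObject()
    {
        _values = new Dictionary<string, object>();
    }

    private SparseDataObject(Dictionary<string, object> source)
    {
        _values = new Dictionary<string, object>(source);
    }

    public int StoredCount { get { return _values.Count; } }

    // Returns null for columns that were never set or were set to null.
    public object Get(string column)
    {
        object value;
        return _values.TryGetValue(column, out value) ? value : null;
    }

    // Setting a column to null removes its entry instead of storing it.
    public void Set(string column, object value)
    {
        if (value == null)
        {
            _values.Remove(column);
        }
        else
        {
            _values[column] = value;
        }
    }

    // Copies the dictionary itself. This is a true deep copy only if the
    // stored values are immutable (strings, boxed primitives and so on).
    public ILazyEntityDataObject Copy()
    {
        return new SparseDataObject(_values);
    }
}
```

The trade-off is boxing and a dictionary lookup on every property access, so this layout would only pay off for wide, mostly-null entities.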

How stubs are born

Stubs can come to life in four possible ways:

  1. Relationship navigation – when navigating a many-to-one relationship, a stub object is created to represent the related end (if the entity is not already being tracked by the ObjectStateManager).
  2. IQueryable<T>.AsStubs(). It is possible to construct a sequence of stubs (IEnumerable<T>) by calling AsStubs() on an IQueryable<T>. This converts the query into one that projects only primary keys (thus saving database connection bandwidth).
  3. IQueryable<T>.GetStub(), which returns a single result. This is the stub equivalent of calling First().
  4. It is also possible to populate a LazyEntityCollection with stub objects (instead of fully loaded objects) by calling the LoadStubs() method.

There is also a way for an unmodified, fully loaded object to become a stub again. All you have to do is discard its data object by calling LazyObjectContext.Reset(), or call LazyObjectContext.ResetAllUnchangedObjects(), which does the same thing for all unmodified objects in the context. This can help reduce the memory footprint of your unit of work when you are dealing with large objects and are done processing them. Instead of detaching an object from the context, you simply discard its data: object identity is preserved and the object can still be found in all relationships it belongs to, but most of the object's memory can be reclaimed by the GC.

Examples

// instantiate a fully loaded entity
var prod = entities.Products.First();

// stub gets created because of relationship navigation - no load from the database here
var cat = prod.Category; 

// category object gets fully loaded on first property access
Console.WriteLine("name: {0}", cat.CategoryName);

// once it is loaded we can access all properties - no database access here
Console.WriteLine("desc: {0}", cat.Description);

// iterate through details
// note that collection is populated with LoadStubs which only brings keys
// into memory
foreach (OrderDetail det in prod.OrderDetails.LoadStubs())
{
    // order can be Order or InternationalOrder so it will be eagerly loaded
    // because we don't know the concrete type (see below)
    // next time (even in a different ObjectContext) we'll use cached type information
    // so there's no server roundtrip
    var order = det.Order;

    Console.WriteLine("{0} {1}", det.Product.ProductName, order.OrderDate);
}

// execute a query and return collection of stub objects
var stubs = entities.Suppliers.Where(c => c.Products.Any(d=>d.Category.CategoryID == cat.CategoryID)).AsStubs();

// iterate over stubs - as we go through the collection, individual suppliers are loaded on-demand
// note how LoadStubs() is used to count Products without fully loading them
foreach (var p in stubs)
{
    Console.WriteLine("Supplier {0} - {1} - {2} products", p.CompanyName, p.Phone, p.Products.LoadStubs().Count);
}

// execute a query that returns a single stub object
var singleStub = entities.Suppliers.GetStub(c=>c.SupplierID == 4);
Console.WriteLine("Stub: {0}", singleStub.Phone);

Problem with polymorphic types

Despite our intention, we sometimes get fully loaded objects instead of stubs when calling one of the above methods. This is because of polymorphic types. For example, when your schema has a Customer base type and an InternationalCustomer type derived from it, and there is an association from Order to Customer, you can get either a Customer or an InternationalCustomer when you navigate the association.

We cannot possibly know the concrete type up front by examining the EntityKey alone. Unfortunately, to create a stub/shell object we need to know the CLR type. When doing an eager load, the Entity Framework materializer determines the concrete type by sending a specially crafted SQL query to the server. The query includes a special discriminator column, which is used to resolve the concrete type. In this case, however, we don’t want to send any query. Even if we wanted to, neither Entity SQL nor LINQ has a way to project the object type without loading the full object, so the store cannot really help us here.

Enter IObjectTypeCache

IObjectTypeCache is one proposed solution to this problem. It exploits the fact that (using normal methods) objects never change their type: there is no way to change the class of an entity stored in a database table, because Entity Framework does not allow inheritance discriminator columns (in TPH mapping) to be written to, and there is no way to achieve the same thing with TPT or TPC mappings.

IObjectTypeCache is a cache whose keys are EntityKey objects and whose values are CLR types (in fact they are factory methods that return objects). This gives us a low amortized cost of determining the CLR type for a given EntityKey.

Every time we create a stub (of type T), we check whether the EntityType has subclasses (defined in CSDL). If the type is known not to be polymorphic, we just create a new instance of T.

If the type can have subclasses, we check whether the mapping from EntityKey to type is in the cache. If it is, we just call the factory method and the stub is ready.

If the mapping is not found in the cache (which typically happens the first time a particular EntityKey is materialized in an application), we don’t try to create stubs at all – we fall back to running the fully materialized query which resolves the type for us. After this is done, we add newly discovered key-to-type mappings to our cache, so that the mappings are known next time.
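The three cases above (non-polymorphic type, cache hit, cache miss with fallback) can be sketched as follows. This is only an illustration under stated assumptions: ObjectTypeCacheSketch and GetOrLoad() are hypothetical names, an int stands in for EntityKey, the loadFully delegate stands in for the fully materialized query, and the real IObjectTypeCache interface may look quite different.

```csharp
using System;
using System.Collections.Generic;

// Toy entity hierarchy used only for this illustration.
public class Customer
{
    public int Id;
    public bool IsStub = true;
}

public class InternationalCustomer : Customer { }

// Hypothetical cache: key (int instead of EntityKey) -> factory method.
public sealed class ObjectTypeCacheSketch
{
    private readonly Dictionary<int, Func<Customer>> _factories =
        new Dictionary<int, Func<Customer>>();
    private readonly object _sync = new object();

    public Customer GetOrLoad(int key, bool typeIsPolymorphic,
                              Func<int, Customer> loadFully)
    {
        // Case 1: no subclasses, so the concrete type is already known.
        if (!typeIsPolymorphic)
        {
            return new Customer { Id = key };
        }

        // Case 2: the key has been seen before, so the factory builds the stub.
        Func<Customer> factory;
        bool found;
        lock (_sync)
        {
            found = _factories.TryGetValue(key, out factory);
        }
        if (found)
        {
            Customer stub = factory();
            stub.Id = key;
            return stub;
        }

        // Case 3: unknown key. Fall back to a full load, which resolves the
        // concrete type, then remember a factory for that type.
        Customer loaded = loadFully(key);
        loaded.IsStub = false;
        Type concrete = loaded.GetType();
        lock (_sync)
        {
            _factories[key] = () => (Customer)Activator.CreateInstance(concrete);
        }
        return loaded;
    }
}
```

In this sketch the first request for a polymorphic key costs one full load, and every later request for the same key produces a stub of the remembered concrete type without calling loadFully again.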

There is a singleton object that holds a reference to IObjectTypeCache that all LazyObjectContexts will use – it is currently held in a static property of LazyObjectContext called ObjectTypeCache.

WARNING: The default implementation of IObjectTypeCache (as of EFLazyLoading v0.5) does not do any cache eviction. This is typically not a problem for databases that have up to about one million polymorphic objects (the cache can grow to consume 30-50MB of RAM, which is usually not a problem nowadays). If your application has to scale to support more polymorphic objects than that, the sample has to be modified to add some automatic eviction (based on LRU, LFU or another strategy).
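As a starting point for such a modification, here is a minimal sketch of an LRU-bounded key-to-type map. It is not part of EFLazyLoading and does not implement the actual IObjectTypeCache interface: LruTypeCacheSketch, TryGetType() and Add() are hypothetical names, and the capacity value is left to the caller.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical bounded cache: evicts the least recently used key when full.
public sealed class LruTypeCacheSketch<TKey>
{
    private readonly int _capacity;
    private readonly Dictionary<TKey, LinkedListNode<KeyValuePair<TKey, Type>>> _map =
        new Dictionary<TKey, LinkedListNode<KeyValuePair<TKey, Type>>>();
    // Most recently used entries are kept at the front of the list.
    private readonly LinkedList<KeyValuePair<TKey, Type>> _order =
        new LinkedList<KeyValuePair<TKey, Type>>();
    private readonly object _sync = new object();

    public LruTypeCacheSketch(int capacity)
    {
        if (capacity < 1) throw new ArgumentOutOfRangeException("capacity");
        _capacity = capacity;
    }

    public int Count
    {
        get { lock (_sync) { return _map.Count; } }
    }

    public bool TryGetType(TKey key, out Type type)
    {
        lock (_sync)
        {
            LinkedListNode<KeyValuePair<TKey, Type>> node;
            if (!_map.TryGetValue(key, out node))
            {
                type = null;
                return false;
            }
            // A hit moves the entry to the front (most recently used).
            _order.Remove(node);
            _order.AddFirst(node);
            type = node.Value.Value;
            return true;
        }
    }

    public void Add(TKey key, Type type)
    {
        lock (_sync)
        {
            LinkedListNode<KeyValuePair<TKey, Type>> existing;
            if (_map.TryGetValue(key, out existing))
            {
                // Replacing an existing entry never requires eviction.
                _order.Remove(existing);
            }
            else if (_map.Count >= _capacity)
            {
                // Evict the entry at the back (least recently used).
                LinkedListNode<KeyValuePair<TKey, Type>> oldest = _order.Last;
                _order.RemoveLast();
                _map.Remove(oldest.Value.Key);
            }
            _map[key] = _order.AddFirst(new KeyValuePair<TKey, Type>(key, type));
        }
    }
}
```

An evicted key is not lost for good: the next time it is needed, the lookup misses and the normal fallback to a fully materialized query rediscovers the type, at the cost of one extra round trip.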