Transparent Lazy Loading for Entity Framework – part 2

This post is part of a series describing the EFLazyLoading library.

As I promised last time, I would like to present the result of a little experiment in implementing transparent lazy loading for Entity Framework. You can download the sample code here; the rest of this post explains how it all works.

Requirements

I set myself some goals:

a) Objects should be code-generated in a way similar to the standard Entity Framework code generation and the resulting code’s public surface should be similar. There will be some differences in the way collections and references are handled.

b) Collections should be represented by classes that implement ICollection<T> and should always be ready to use without “IsLoaded/Load”.

c) EntityReference<T> and EntityCollection<T> should be completely hidden from the user.

d) Each (N-to-0..1) reference should be represented solely by a property where the type is the target object type (no EntityReference<T> properties in the object layer).

e) We don’t want to materialize the object at the other end of the relationship just to see whether it is null or not:

Order o = ...;

if (o.Customer != null)
{
    Console.WriteLine("We have a customer!");
}

f) We don’t want to materialize the object if we don’t care about its properties. For example, changing the “Customer” navigation property on “o” does not require the Customer object to be loaded at all (today we can use EntityKeys to achieve a similar thing):

Order o = ...;
Order o2 = ...;
o.Customer = o2.Customer;

g) Each object must be able to live in two states, loaded and unloaded, and must be able to load itself on first access to a property. Unloaded objects that haven’t been accessed are really just wrappers for the EntityKey; objects that have been touched hold actual data:

Order o = ...;

if (o.Customer != null)
{
    // loads o.Customer on-demand
    Console.WriteLine(o.Customer.Address.City);
}

h) An object in the unloaded state should be as cheap to create as possible.

Implementation

Because each object has to be delay-loadable and cheap to create, we represent a single entity as a pair of objects. One is the “shell”, which has all the properties and navigation properties of the entity plus the EntityKey; the other holds the actual data (minus the key). Property getters and setters on the shell class delegate read/write operations to the data class, which is created lazily (to conserve memory when it is not needed).

The following pseudo-code demonstrates the idea (management of _data is not shown here; in the actual code the _data reference and the entity key are held in the base class):

// shell class - has no fields to hold actual data, just
// a reference to the lazily initialized data object (pseudo-code: this will not compile)
public class Order
{
    private EntityKey _key; // each shell has an identity 
    private OrderData _data; // reference to lazy-initialized data

    public int OrderID 
    {
        get { return _key.Something; }
        set { _key.Something = value; }
    }

    public DateTime OrderDate
    { 
        get { return _data.OrderDate; } 
        set { _data.OrderDate = value; }
    }

    public string ShipTo
    {
        get { return _data.ShipTo; }
        set { _data.ShipTo = value; }
    }

    public string BillTo
    {
        get { return _data.BillTo; }
        set { _data.BillTo = value; }
    }

    public Customer Customer { get; set; } // details not shown
    public ICollection<OrderLine> Lines { get; }
}

// data class - just a bunch of fields
internal class OrderData
{
    internal DateTime OrderDate;
    internal string ShipTo;
    internal string BillTo;
}

For objects in the “unloaded” state there is just one object (Order); for loaded objects “OrderData” is initialized, so property accesses actually work. The first time the user invokes a property getter or setter while _data is null, the data is brought in from the store.
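To make the shell/data split concrete, here is a minimal, self-contained sketch. It does not use Entity Framework at all: FakeStore, LoadCity and IsLoaded are hypothetical names invented for this illustration, and a plain int plays the role of the EntityKey. It only shows the shape of the idea, not the library's actual code.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for the database: maps a customer ID to its stored city.
public static class FakeStore
{
    public static int LoadCount; // counts round trips so the laziness is observable

    private static readonly Dictionary<int, string> Cities =
        new Dictionary<int, string> { { 1, "Seattle" }, { 2, "Warsaw" } };

    public static string LoadCity(int customerId)
    {
        LoadCount++;
        return Cities[customerId];
    }
}

// Shell: holds only the key until a non-key property is touched.
public class Customer
{
    // Data class: just a bunch of fields, created on demand.
    private sealed class CustomerData
    {
        internal string City;
    }

    private readonly int _id;    // plays the role of the EntityKey (assumption)
    private CustomerData _data;  // null while the object is unloaded

    public Customer(int id) { _id = id; }

    public int CustomerID { get { return _id; } }          // never triggers a load
    public bool IsLoaded { get { return _data != null; } } // for illustration only

    public string City
    {
        get { return EnsureLoaded().City; }
        set { EnsureLoaded().City = value; }
    }

    // Creates the data object on first access and fills it from the store.
    private CustomerData EnsureLoaded()
    {
        if (_data == null)
        {
            _data = new CustomerData { City = FakeStore.LoadCity(_id) };
        }
        return _data;
    }
}
```

Creating a Customer and reading CustomerID costs nothing beyond the shell itself; the store is hit once, on the first access to City.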

When the user navigates a {one,many}-to-one relationship, we create a shell object that has only its primary key initialized, attach it to the context and return it to the user. The data object is not created at all and the “_data” reference is null. When a property is accessed for the first time, the data gets initialized by calling ObjectContext.Refresh() with RefreshMode.StoreWins, which brings all properties and relationships into memory.

Collections are rather simple: all we have to do is return a wrapper over EntityCollection<T> that calls Load() under the hood when the data is actually needed (for example in a foreach loop).
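The wrapper idea can be sketched without Entity Framework. In the illustration below, LazyCollectionSketch<T> is a hypothetical class (it is not the library's LazyEntityCollection<T>), and the loader delegate stands in for the EntityCollection<T>.Load() call. As a simplification, every member, including Add, triggers the load; a real implementation could be smarter about that.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

// Hypothetical wrapper: the loader delegate stands in for EntityCollection<T>.Load().
public class LazyCollectionSketch<T> : ICollection<T>
{
    private readonly Func<IEnumerable<T>> _loader;
    private List<T> _items; // null until the contents are first needed

    public LazyCollectionSketch(Func<IEnumerable<T>> loader) { _loader = loader; }

    public bool IsLoaded { get { return _items != null; } }

    // Single choke point: every member goes through here, so the load happens once.
    private List<T> Items
    {
        get
        {
            if (_items == null) _items = new List<T>(_loader());
            return _items;
        }
    }

    public int Count { get { return Items.Count; } }
    public bool IsReadOnly { get { return false; } }
    public void Add(T item) { Items.Add(item); }
    public void Clear() { Items.Clear(); }
    public bool Contains(T item) { return Items.Contains(item); }
    public void CopyTo(T[] array, int arrayIndex) { Items.CopyTo(array, arrayIndex); }
    public bool Remove(T item) { return Items.Remove(item); }
    public IEnumerator<T> GetEnumerator() { return Items.GetEnumerator(); }
    IEnumerator IEnumerable.GetEnumerator() { return GetEnumerator(); }
}
```

From the caller's point of view the collection is always ready to use: there is no IsLoaded/Load dance, and the first foreach simply pays the cost of the load.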

Implementation details

The implementation takes advantage of the fact that Entity Framework supports IPOCO. We introduce a base class called LazyEntityObject that all code-generated objects derive from, and that implements all interfaces required by Entity Framework (IEntityWithKey, IEntityWithChangeTracking, IEntityWithRelationships) and a new interface ILazyEntityObject. These interfaces are implemented explicitly, which means that none of their members is exposed as public API on the actual entity objects (not even EntityKey).

In the actual implementation (compared to the pseudo-code) the data class is an inner private class of each entity class and property getters and setters are implemented through statically declared Data Properties – a concept similar to WPF dependency properties. They are statically initialized with delegates that get/set actual data but perform all the needed operations under the hood (such as change tracking and lazy initialization). As a result everything is type-safe and there is no need to use reflection. Thanks to Colin for the idea!

With this in place the code generated for each property getter/setter is a simple one-liner, whether it is a simple property, a reference or a collection:

[EdmScalarPropertyAttribute(EntityKeyProperty=false, IsNullable=false)]
public Single Discount
{
    get { return Data.DiscountProperty.Get(this); }
    set { Data.DiscountProperty.Set(this, value); }
}

[EdmRelationshipNavigationPropertyAttribute("NorthwindEFModel", "Order_Details_Order", "Order")]
public Order Order
{
    get { return Data.OrderProperty.Get(this); }
    set { Data.OrderProperty.Set(this, value); }
}

The Data class itself is also clean (just a bunch of fields + static data properties) and all the hard work is done in the implementation of Data Property classes.

private class Data : ILazyEntityObjectData
{
    private Decimal UnitPrice;
    private Int16 Quantity;
    private Single Discount;

    // primary key
    public static DataKeyProperty<OrderDetail,Int32> OrderIDProperty = 
                  new DataKeyProperty<OrderDetail,Int32>(c => c.OrderID, (c, v) => c.OrderID = v, "OrderID");
    public static DataKeyProperty<OrderDetail,Int32> ProductIDProperty = 
                  new DataKeyProperty<OrderDetail,Int32>(c => c.ProductID, (c, v) => c.ProductID = v, "ProductID");
    // non-key properties
    public static DataProperty<OrderDetail,Data,Decimal> UnitPriceProperty = 
                  new DataProperty<OrderDetail,Data,Decimal>(c => c.UnitPrice, (c, v) => c.UnitPrice = v, "UnitPrice");
    public static DataProperty<OrderDetail,Data,Int16> QuantityProperty = 
                  new DataProperty<OrderDetail,Data,Int16>(c => c.Quantity, (c, v) => c.Quantity = v, "Quantity");
    public static DataProperty<OrderDetail,Data,Single> DiscountProperty = 
                  new DataProperty<OrderDetail,Data,Single>(c => c.Discount, (c, v) => c.Discount = v, "Discount");
    // references
    public static DataRefProperty<OrderDetail,Data,Order> OrderProperty = 
                  new DataRefProperty<OrderDetail,Data,Order>("NorthwindEFModel.Order_Details_Order","Order","Order");
    public static DataRefProperty<OrderDetail,Data,Product> ProductProperty = 
                  new DataRefProperty<OrderDetail,Data,Product>("NorthwindEFModel.Order_Details_Product","Product","Product");
}

Data Properties Explained

Each data property is statically initialized in the data class and has two methods: Get() and Set().

  • Get() takes a single argument, the shell object, and returns the property value.
  • Set() takes two arguments, the shell object and the new property value, and sets the property to the value provided.

There are 4 types of data properties:

  1. Simple properties (DataProperty class) that are responsible for getting and setting non-key, non-navigation properties
  2. Key properties (DataKeyProperty) that are responsible for getting and setting properties that are part of the primary key (the values are stored in the shell class itself)
  3. Collection properties (DataCollectionProperty) that manage object collections
  4. Reference properties (DataRefProperty) that are responsible for getting and setting reference properties

A simple property (implemented in DataProperty.cs) makes sure that the data object is initialized on demand, delegating to ObjectContext.Refresh() to fetch object values and relationships. When setting a property value, it calls ReportPropertyChanging and ReportPropertyChanged so that object state is properly tracked.
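The following sketch shows roughly how such a statically declared, delegate-based property could look. It is an illustration under assumptions, not the code from DataProperty.cs: LazyShell<TData>, DataPropertySketch, Fill and ChangeLog are hypothetical names, the data class is a separate top-level class rather than a private inner one, and a simple string list marks the places where ReportPropertyChanging and ReportPropertyChanged would be called.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical base class: owns the lazily created data object and a change log.
public abstract class LazyShell<TData> where TData : class, new()
{
    private TData _data;
    public readonly List<string> ChangeLog = new List<string>();

    // Stand-in for fetching the values from the store on first access.
    protected abstract void Fill(TData data);

    internal TData EnsureData()
    {
        if (_data == null)
        {
            _data = new TData();
            Fill(_data);
        }
        return _data;
    }
}

// Statically declared property descriptor: typed delegates, no reflection.
public sealed class DataPropertySketch<TShell, TData, TValue>
    where TShell : LazyShell<TData>
    where TData : class, new()
{
    private readonly Func<TData, TValue> _getter;
    private readonly Action<TData, TValue> _setter;
    private readonly string _name;

    public DataPropertySketch(Func<TData, TValue> getter, Action<TData, TValue> setter, string name)
    {
        _getter = getter;
        _setter = setter;
        _name = name;
    }

    public TValue Get(TShell shell) { return _getter(shell.EnsureData()); }

    public void Set(TShell shell, TValue value)
    {
        TData data = shell.EnsureData();
        shell.ChangeLog.Add("changing " + _name); // where ReportPropertyChanging would go
        _setter(data, value);
        shell.ChangeLog.Add("changed " + _name);  // where ReportPropertyChanged would go
    }
}

// Data class: a field plus its static property descriptor.
public class OrderDetailData
{
    internal float Discount;

    public static readonly DataPropertySketch<OrderDetail, OrderDetailData, float> DiscountProperty =
        new DataPropertySketch<OrderDetail, OrderDetailData, float>(
            d => d.Discount, (d, v) => d.Discount = v, "Discount");
}

public class OrderDetail : LazyShell<OrderDetailData>
{
    // Pretend the store holds a discount of 0.25 for this row.
    protected override void Fill(OrderDetailData data) { data.Discount = 0.25f; }

    public float Discount
    {
        get { return OrderDetailData.DiscountProperty.Get(this); }
        set { OrderDetailData.DiscountProperty.Set(this, value); }
    }
}
```

The generated getter and setter stay one-liners, while lazy initialization and change notification live in a single place that every property shares.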

Key properties do nothing more than call ReportPropertyChanging/ReportPropertyChanged in addition to getting and setting the actual key values in the shell object.

Collection properties take care of initializing relationships in the RelationshipManager and wrapping the results with LazyEntityCollection<T> for load-on-demand functionality.

Reference properties are probably the most interesting ones, because they deal with stub objects. Whenever the user navigates a relationship that has not yet been initialized, a new stub object (that is just a shell without data) is created and attached to the object context. There is a little additional complication with handling polymorphic objects, because we need to know the concrete subtype to create based just on the EntityKey, but that is a story for a separate article.
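A stripped-down illustration of the stub mechanism is sketched below. Everything in it is hypothetical (StubContext, GetOrCreateStub, CustomerStub, OrderShell), the event only borrows its name from LazyObjectContext.StubCreated without claiming to match its signature, and the polymorphism problem mentioned above is ignored entirely. It shows two things: a null check never touches the store, and navigating to the same key twice yields the same stub.

```csharp
using System;
using System.Collections.Generic;

// A shell without data: just the key and a flag.
public class CustomerStub
{
    public readonly int Key;
    public bool IsLoaded; // would flip to true on the first non-key property access

    public CustomerStub(int key) { Key = key; }
}

// Hypothetical mini-context: hands out one stub per key and reports when it creates one.
public class StubContext
{
    private readonly Dictionary<int, CustomerStub> _attached =
        new Dictionary<int, CustomerStub>();

    public event Action<CustomerStub> StubCreated;

    public CustomerStub GetOrCreateStub(int key)
    {
        CustomerStub stub;
        if (!_attached.TryGetValue(key, out stub))
        {
            stub = new CustomerStub(key); // cheap: no store access here
            _attached.Add(key, stub);     // "attach" the stub to the context
            if (StubCreated != null) StubCreated(stub);
        }
        return stub;
    }
}

public class OrderShell
{
    private readonly StubContext _context;
    private readonly int? _customerKey; // key of the related end, if there is one

    public OrderShell(StubContext context, int? customerKey)
    {
        _context = context;
        _customerKey = customerKey;
    }

    // Navigating the reference returns null or a stub; neither requires a load.
    public CustomerStub Customer
    {
        get
        {
            return _customerKey.HasValue
                ? _context.GetOrCreateStub(_customerKey.Value)
                : null;
        }
    }
}
```

With this shape, a check such as o.Customer != null, or an assignment from one order's Customer to another's, works purely on keys.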

Usage

The code generation application (the EFLazyClassGen project in the sample solution) emits code that is meant to be a drop-in replacement for designer-generated code (namespaces and class names are the same). Invoke it with two parameters:

EFLazyClassGen input.[csdl,edmx] output.cs

Only simple code generation is supported at this point (multiple schemas, for example, are not), and I’ve only tested this against the NorthwindEF and AdventureWorksXLT schemas.

Generated classes have a public interface similar to the one generated by EdmGen. Some notable differences are:

  1. EntityKey and EntityState members are not publicly exposed (you can still get to them by casting to IEntityWithKey)
  2. Serialization is not supported (no serialization-specific attributes are generated). If you want to serialize lazy objects, you have to do this using DTOs (Data Transfer Objects).
  3. There is no *Reference property on many-to-one relationships. This means there is no way to control the "loaded" state of the related end, but that should not be a problem since everything appears to be loaded.

LazyObjectContext derives from ObjectContext and adds two new events, which can be used to trace the internal workings of EFLazyLoading:

  1. LazyObjectContext.StubCreated - occurs whenever a new stub object is created
  2. LazyObjectContext.ObjectLoaded - occurs whenever a delayed load occurs

See the samples for more information. There are also new LazyObjectContext methods:

  1. Reset(ILazyEntityObject) - detaches and releases the data object from a shell object, while keeping the shell attached to the context.
  2. ResetAllUnchangedObjects() - does the same thing for all unchanged objects in the context; the objects will be demand-loaded the next time any of their properties is accessed.
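The effect of a reset can be pictured with a tiny stand-alone sketch. ResettableShell is a hypothetical class and its Reset() is not the LazyObjectContext.Reset method; the loader delegate stands in for a round trip to the store. The point is only that dropping the data object leaves the shell usable and forces a fresh load on the next access.

```csharp
using System;

public class ResettableShell
{
    private readonly Func<string> _load; // hypothetical stand-in for a store round trip
    private string _data;                // null while unloaded

    public int Loads;                    // number of loads performed so far

    public ResettableShell(Func<string> load) { _load = load; }

    public string Value
    {
        get
        {
            if (_data == null)
            {
                _data = _load();
                Loads++;
            }
            return _data;
        }
    }

    // Drops the data but keeps the shell (and its identity) alive.
    public void Reset() { _data = null; }
}
```

After Reset(), existing references to the shell remain valid; the next read of Value simply loads again and picks up whatever the store holds at that moment.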

In the ZIP file there is a help file (CHM) which has auto-generated API documentation (using Sandcastle). I hope this will be useful.

Lessons learned

The first and foremost lesson learned is that it is quite possible to have transparent lazy loading working with Entity Framework. Being able to write your own entity classes (provided that they adhere to the IPOCO specification) that add functionality under the hood opens up a whole new world of possibilities.

Possible applications of this technique include cross-ObjectContext object sharing and caching (which may actually be very simple, because you can easily share “Data” objects if you make them read-only and copy on write).

In the next post I will explain the object type cache (for managing EntityKey to concrete type mapping) and introduce additional extension methods that make it possible to write LINQ and Entity SQL queries that return stubs of objects.