Transparent Lazy Loading for Entity Framework – part 2


This post is part of a series that describes the EFLazyLoading library.

As I promised last time, I would like to present the result of a little experiment in implementing transparent lazy loading for Entity Framework. You can download the sample code here; the rest of this post explains how it all works.

Requirements

I set myself some goals:

a) Objects should be code-generated in a way similar to the standard Entity Framework code generation and the resulting code’s public surface should be similar. There will be some differences in the way collections and references are handled.

b) Collections should be represented by classes that implement ICollection<T> and should always be ready to use without “IsLoaded/Load”.

c) EntityReference<T> and EntityCollection<T> should be completely hidden from the user.

d) Each (N-to-0..1) reference should be represented solely by a property where the type is the target object type (no EntityReference<T> properties in the object layer).

e) We don’t want to materialize the object at the other end of the relationship just to see whether it is null or not:

Order o = ...;

if (o.Customer != null)
{
    Console.WriteLine("We have a customer!");
}

f) We don’t want to materialize the object if we don’t care about its properties (for example, changing the “Customer” navigation property on “o” does not require the Customer object to be loaded at all – today we can use EntityKeys to achieve a similar thing):

Order o = ...;
Order o2 = ...;
o.Customer = o2.Customer;

g) Each object must be able to live in two states, loaded and unloaded, and must be able to load itself on first access to a property. Unloaded objects that haven’t been accessed are really just wrappers for the EntityKey; objects that have been touched hold actual data:

Order o = ...;

if (o.Customer != null)
{
    // loads o.Customer on-demand
    Console.WriteLine(o.Customer.Address.City);
}

h) An object in the unloaded state should be as cheap to create as possible.

Implementation

Because each object has to be delay-loadable and cheap to create, we represent a single entity as a pair of objects. One is the “shell”, which has all the properties and navigation properties of the entity plus the EntityKey; the other holds the actual data (minus the key). Property getters and setters on the shell class delegate read/write operations to the data class, which is created lazily (to conserve memory when it is not needed).

The following pseudo-code demonstrates the idea (_data management is not shown here; the actual _data reference and entity key are held in the base class):

// shell class - has no fields to hold actual data, just
// a reference to the lazy-initialized data object (pseudo-code, this will not compile)
public class Order
{
    private EntityKey _key; // each shell has an identity 
    private OrderData _data; // reference to lazy-initialized data

    public int OrderID 
    {
        get { return _key.Something; }
        set { _key.Something = value; }
    }

    public DateTime OrderDate
    { 
        get { return _data.OrderDate; } 
        set { _data.OrderDate = value; }
    }

    public string ShipTo
    {
        get { return _data.ShipTo; }
        set { _data.ShipTo = value; }
    }

    public string BillTo
    {
        get { return _data.BillTo; }
        set { _data.BillTo = value; }
    }

    public Customer Customer { get; set; } // details not shown
    public ICollection<OrderLine> Lines { get; }
}

// data class - just a bunch of fields
internal class OrderData
{
    internal DateTime OrderDate;
    internal string ShipTo;
    internal string BillTo;
}

For objects in the “unloaded” state there is just one object (Order); for loaded objects “OrderData” is initialized so that property accesses actually work. The first time the user accesses a property getter or setter while _data is null, the data is brought from the store.
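To make the shell/data split concrete, here is a minimal runnable sketch. It is not the library's code: FakeStore is a hypothetical in-memory stand-in for the database, and the IsLoaded and Loads members exist only so the demo can show when a load happens.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for the database: maps OrderID to its row.
static class FakeStore
{
    public static int Loads; // counts simulated round-trips, for illustration only

    static readonly Dictionary<int, string> ShipToById =
        new Dictionary<int, string> { { 1, "Seattle" } };

    public static OrderData LoadOrder(int id)
    {
        Loads++;
        return new OrderData { ShipTo = ShipToById[id] };
    }
}

// Data class: just fields.
class OrderData
{
    public string ShipTo;
}

// Shell class: holds only the key until a non-key property is touched.
class Order
{
    private readonly int _orderId;
    private OrderData _data; // null while the object is unloaded

    public Order(int orderId) { _orderId = orderId; }

    public int OrderID { get { return _orderId; } } // never triggers a load
    public bool IsLoaded { get { return _data != null; } }

    // Creates the data object on first use.
    private OrderData Data
    {
        get
        {
            if (_data == null) _data = FakeStore.LoadOrder(_orderId);
            return _data;
        }
    }

    public string ShipTo
    {
        get { return Data.ShipTo; }
        set { Data.ShipTo = value; }
    }
}

static class Demo
{
    static void Main()
    {
        Order o = new Order(1);
        Console.WriteLine(o.OrderID);       // key access, no load
        Console.WriteLine(o.IsLoaded);      // False
        Console.WriteLine(o.ShipTo);        // first access loads the data
        Console.WriteLine(FakeStore.Loads); // 1
    }
}
```

An unloaded shell here costs one object with a single key field, which is the point of requirement (h).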

When the user navigates a {one,many}-to-one relationship, we create a shell object that has only its primary key initialized, attach it to the context, and return it to the user. The data object is not created at all and the “_data” reference is null. When a property is accessed for the first time, the data gets initialized by calling ObjectContext.Refresh() with RefreshMode.StoreWins, which brings all properties and relationships into memory.

Collections are rather simple – all we have to do is return a wrapper over EntityCollection<T> that does Load() under the hood when the data is actually needed (for example in foreach()).
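The collection wrapper idea can be sketched without Entity Framework at all. In the sketch below, LazyCollection<T> is a hypothetical name (the library's own class is LazyEntityCollection<T>, whose internals are not shown in this post), and the loader delegate is an assumption that stands in for the Load() call on the underlying EntityCollection<T>.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

// Hypothetical wrapper: defers fetching until the contents are first needed.
class LazyCollection<T> : ICollection<T>
{
    private readonly Func<IEnumerable<T>> _loader; // stands in for Load()
    private List<T> _items;                        // null until loaded

    public LazyCollection(Func<IEnumerable<T>> loader) { _loader = loader; }

    public bool IsLoaded { get { return _items != null; } }

    // Every member goes through this property, so any access triggers the load.
    private List<T> Items
    {
        get
        {
            if (_items == null) _items = new List<T>(_loader());
            return _items;
        }
    }

    public int Count { get { return Items.Count; } }
    public bool IsReadOnly { get { return false; } }
    public void Add(T item) { Items.Add(item); }
    public void Clear() { Items.Clear(); }
    public bool Contains(T item) { return Items.Contains(item); }
    public void CopyTo(T[] array, int arrayIndex) { Items.CopyTo(array, arrayIndex); }
    public bool Remove(T item) { return Items.Remove(item); }
    public IEnumerator<T> GetEnumerator() { return Items.GetEnumerator(); }
    IEnumerator IEnumerable.GetEnumerator() { return GetEnumerator(); }
}

static class Demo
{
    static void Main()
    {
        LazyCollection<string> lines = new LazyCollection<string>(
            () => new[] { "Chai", "Chang" });

        Console.WriteLine(lines.IsLoaded); // False: nothing fetched yet
        foreach (string line in lines)     // enumeration triggers the load
            Console.WriteLine(line);
        Console.WriteLine(lines.IsLoaded); // True
    }
}
```

Because the wrapper implements ICollection<T>, callers never see an IsLoaded/Load pair unless they go looking for it; IsLoaded is exposed here only for the demo.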

Implementation details

The implementation takes advantage of the fact that Entity Framework supports IPOCO. We introduce a base class called LazyEntityObject that all code-generated objects derive from. It implements all the interfaces required by Entity Framework (IEntityWithKey, IEntityWithChangeTracking, IEntityWithRelationships) plus a new interface, ILazyEntityObject. These interfaces are implemented explicitly, which means that none of their members show up in the public API of the actual entity objects (not even EntityKey).

In the actual implementation (compared to the pseudo-code) the data class is an inner private class of each entity class and property getters and setters are implemented through statically declared Data Properties – a concept similar to WPF dependency properties. They are statically initialized with delegates that get/set actual data but perform all the needed operations under the hood (such as change tracking and lazy initialization). As a result everything is type-safe and there is no need to use reflection. Thanks to Colin for the idea!

With this in place the code generated for each property getter/setter is a simple one-liner, whether it is a simple property, a reference or a collection:

[EdmScalarPropertyAttribute(EntityKeyProperty=false, IsNullable=false)]
public Single Discount
{
    get { return Data.DiscountProperty.Get(this); }
    set { Data.DiscountProperty.Set(this, value); }
}

[EdmRelationshipNavigationPropertyAttribute("NorthwindEFModel", "Order_Details_Order", "Order")]
public Order Order
{
    get { return Data.OrderProperty.Get(this); }
    set { Data.OrderProperty.Set(this, value); }
}

The Data class itself is also clean (just a bunch of fields + static data properties) and all the hard work is done in the implementation of Data Property classes.

private class Data : ILazyEntityObjectData
{
    private Decimal UnitPrice;
    private Int16 Quantity;
    private Single Discount;

    // primary key
    public static DataKeyProperty<OrderDetail,Int32> OrderIDProperty = 
                  new DataKeyProperty<OrderDetail,Int32>(c => c.OrderID, (c, v) => c.OrderID = v, "OrderID");
    public static DataKeyProperty<OrderDetail,Int32> ProductIDProperty = 
                  new DataKeyProperty<OrderDetail,Int32>(c => c.ProductID, (c, v) => c.ProductID = v, "ProductID");
    // non-key properties
    public static DataProperty<OrderDetail,Data,Decimal> UnitPriceProperty = 
                  new DataProperty<OrderDetail,Data,Decimal>(c => c.UnitPrice, (c, v) => c.UnitPrice = v, "UnitPrice");
    public static DataProperty<OrderDetail,Data,Int16> QuantityProperty = 
                  new DataProperty<OrderDetail,Data,Int16>(c => c.Quantity, (c, v) => c.Quantity = v, "Quantity");
    public static DataProperty<OrderDetail,Data,Single> DiscountProperty = 
                  new DataProperty<OrderDetail,Data,Single>(c => c.Discount, (c, v) => c.Discount = v, "Discount");
    // references
    public static DataRefProperty<OrderDetail,Data,Order> OrderProperty = 
                  new DataRefProperty<OrderDetail,Data,Order>("NorthwindEFModel.Order_Details_Order","Order","Order");
    public static DataRefProperty<OrderDetail,Data,Product> ProductProperty = 
                  new DataRefProperty<OrderDetail,Data,Product>("NorthwindEFModel.Order_Details_Product","Product","Product");
}

Data Properties Explained

Each data property is statically initialized in the data class and has two methods: Get() and Set().

  • Get() takes a single argument – the shell object and returns the property value
  • Set() takes two arguments: shell object and new property value. It sets the property to the value provided.

There are 4 types of data properties:

  1. Simple properties (DataProperty class) that are responsible for getting and setting non-key, non-navigation properties
  2. Key properties (DataKeyProperty) that are responsible for getting and setting properties that are part of the primary key (the values are stored in the shell class itself)
  3. Collection properties (DataCollectionProperty) that manage object collections
  4. Reference properties (DataRefProperty) that are responsible for getting and setting reference properties

A simple property (implemented in DataProperty.cs) makes sure that the data object is initialized on demand, delegating to ObjectContext.Refresh() to fetch object values and relationships. When setting property values, it calls ReportPropertyChanging and ReportPropertyChanged so that object state is properly tracked.
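The following is a simplified sketch of how such a simple data property might be structured; it is not the code from DataProperty.cs. ShellBase, EnsureData and ChangeLog are hypothetical names: the list stands in for the change tracker, and EnsureData creates an empty data object where the library would fetch values from the store.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical base class; the real LazyEntityObject also implements the EF interfaces.
abstract class ShellBase<TData> where TData : class, new()
{
    private TData _data;
    public readonly List<string> ChangeLog = new List<string>(); // stands in for change tracking

    internal TData EnsureData()
    {
        if (_data == null) _data = new TData(); // the library would load from the store here
        return _data;
    }
}

// A statically declared property descriptor that wraps typed get/set delegates.
class DataProperty<TShell, TData, TValue>
    where TShell : ShellBase<TData>
    where TData : class, new()
{
    private readonly Func<TData, TValue> _getter;
    private readonly Action<TData, TValue> _setter;
    private readonly string _name;

    public DataProperty(Func<TData, TValue> getter, Action<TData, TValue> setter, string name)
    {
        _getter = getter;
        _setter = setter;
        _name = name;
    }

    public TValue Get(TShell shell) { return _getter(shell.EnsureData()); }

    public void Set(TShell shell, TValue value)
    {
        TData data = shell.EnsureData();
        shell.ChangeLog.Add("changing " + _name); // ReportPropertyChanging in the real code
        _setter(data, value);
        shell.ChangeLog.Add("changed " + _name);  // ReportPropertyChanged in the real code
    }
}

class OrderDetailData
{
    internal short Quantity;

    public static readonly DataProperty<OrderDetail, OrderDetailData, short> QuantityProperty =
        new DataProperty<OrderDetail, OrderDetailData, short>(
            d => d.Quantity, (d, v) => d.Quantity = v, "Quantity");
}

class OrderDetail : ShellBase<OrderDetailData>
{
    public short Quantity
    {
        get { return OrderDetailData.QuantityProperty.Get(this); }
        set { OrderDetailData.QuantityProperty.Set(this, value); }
    }
}

static class Demo
{
    static void Main()
    {
        OrderDetail d = new OrderDetail();
        d.Quantity = 5;
        Console.WriteLine(d.Quantity);                               // 5
        Console.WriteLine(string.Join(", ", d.ChangeLog.ToArray())); // changing Quantity, changed Quantity
    }
}
```

The delegates are strongly typed lambdas, so the property descriptor needs no reflection, and the generated accessor stays a one-liner.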

Key properties do nothing more than call ReportPropertyChanging/ReportPropertyChanged in addition to getting and setting the actual key values in the shell object.

Collection properties take care of initializing relationships in the RelationshipManager and wrapping the results with LazyEntityCollection<T> for load-on-demand functionality.

Reference properties are probably the most interesting ones, because they deal with stub objects. Whenever the user navigates a relationship that has not yet been initialized, a new stub object (that is just a shell without data) is created and attached to the object context. There is a little additional complication with handling polymorphic objects, because we need to know the concrete subtype to create based just on the EntityKey, but that is a story for a separate article.
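The stub behaviour can be illustrated with a small self-contained sketch. Everything in it is an assumption made for illustration: StubContext is a hypothetical dictionary-based identity map rather than the ObjectContext, keys are plain integers instead of EntityKey, and the "load" is simulated by a counter. Polymorphism is ignored entirely.

```csharp
using System;
using System.Collections.Generic;

class Customer
{
    public readonly int Id;
    private string _city;    // filled in on first property access
    public static int Loads; // counts simulated round-trips

    public Customer(int id) { Id = id; }

    public string City
    {
        get
        {
            if (_city == null) { Loads++; _city = "City #" + Id; } // simulated fetch
            return _city;
        }
    }
}

// Hypothetical context: hands out one stub per key so object identity is preserved.
class StubContext
{
    private readonly Dictionary<int, Customer> _customers = new Dictionary<int, Customer>();

    public Customer GetOrCreateStub(int id)
    {
        Customer c;
        if (!_customers.TryGetValue(id, out c))
        {
            c = new Customer(id); // shell only, no data yet
            _customers.Add(id, c);
        }
        return c;
    }
}

class Order
{
    private readonly StubContext _context;
    private int? _customerId; // foreign key value; null means no customer

    public Order(StubContext context, int? customerId)
    {
        _context = context;
        _customerId = customerId;
    }

    public Customer Customer
    {
        get { return _customerId.HasValue ? _context.GetOrCreateStub(_customerId.Value) : null; }
        set { _customerId = value == null ? (int?)null : value.Id; }
    }
}

static class Demo
{
    static void Main()
    {
        StubContext ctx = new StubContext();
        Order o = new Order(ctx, 42);
        Order o2 = new Order(ctx, null);

        Console.WriteLine(o.Customer != null); // True: null check needs no load
        o2.Customer = o.Customer;              // reassignment needs no load either
        Console.WriteLine(object.ReferenceEquals(o.Customer, o2.Customer)); // True
        Console.WriteLine(Customer.Loads);     // 0
        Console.WriteLine(o.Customer.City);    // first data access triggers the load
    }
}
```

Note that in this sketch the null check only reflects whether a key value is present; nothing is verified against the store until a data property is read.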

Usage

The code generation application (the EFLazyClassGen project in the sample solution) emits code that is meant to be a drop-in replacement for designer-generated code (namespaces and class names are the same). Just invoke it with two parameters:

EFLazyClassGen input.[csdl,edmx] output.cs

Only simple code generation is supported at this point (multiple schemas, for example, are not), and I’ve only tested this against the NorthwindEF and AdventureWorksXLT schemas.

Generated classes have a public interface similar to the one generated by EdmGen – some notable differences are:

  1. EntityKey and EntityState members are not publicly exposed (you can still get to them by casting to IEntityWithKey)
  2. Serialization is not supported (no serialization-specific attributes are generated). If you want to serialize lazy objects, you have to do this using DTOs (Data Transfer Objects)
  3. There is no *Reference property on many-to-one relationships. This means there is no way to control the "loaded" state of the related end, but that should not be a problem since everything appears to be loaded.

LazyObjectContext derives from ObjectContext and adds two new events, which can be used to trace the internal workings of EFLazyLoading:

  1. LazyObjectContext.StubCreated – occurs whenever a new stub object is created
  2. LazyObjectContext.ObjectLoaded – occurs whenever a delayed load occurs

See the samples for more information. There are also new LazyObjectContext methods:

  1. Reset(ILazyEntityObject) – detaches and releases the data object from a shell object while keeping the shell attached to the context.
  2. ResetAllUnchangedObjects() – does the same thing for all unchanged objects in the context; the objects will be demand-loaded the next time any of their properties is accessed.

In the ZIP file there is a help file (CHM) which has auto-generated API documentation (using Sandcastle). I hope this will be useful.

Lessons learned

The first and foremost lesson learned is that it is quite possible to have transparent lazy loading working with Entity Framework. Being able to write your own entity classes (provided that they adhere to the IPOCO specification) that add functionality under the hood opens up a whole new world of possibilities.

Possible applications of this technique include cross-ObjectContext object sharing and caching (which may actually be very simple, because you can easily share “Data” objects as long as you treat them as read-only and copy on write).
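As a rough illustration of the copy-on-write idea (purely a sketch of the speculation above, not something EFLazyLoading implements), the shell can borrow a shared data object and clone it just before the first write. ProductData, Product and the _ownsData flag are hypothetical names.

```csharp
using System;

class ProductData
{
    public string Name;

    // Shallow copy is enough here because the data class only holds simple fields.
    public ProductData Clone() { return (ProductData)MemberwiseClone(); }
}

// Hypothetical shell that borrows a shared snapshot and copies it before the first write.
class Product
{
    private ProductData _data;
    private bool _ownsData; // false while _data is shared with other shells

    public Product(ProductData shared) { _data = shared; }

    public string Name
    {
        get { return _data.Name; }
        set
        {
            if (!_ownsData)
            {
                _data = _data.Clone(); // copy on write
                _ownsData = true;
            }
            _data.Name = value;
        }
    }
}

static class Demo
{
    static void Main()
    {
        ProductData cached = new ProductData { Name = "Chai" };
        Product p1 = new Product(cached); // e.g. a shell in one context
        Product p2 = new Product(cached); // e.g. a shell in another context

        p1.Name = "Chai (renamed)";
        Console.WriteLine(p1.Name);     // Chai (renamed)
        Console.WriteLine(p2.Name);     // Chai
        Console.WriteLine(cached.Name); // Chai
    }
}
```

Reads stay cheap because both shells point at the same cached object until one of them is modified.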

In the next post I will explain the object type cache (for managing EntityKey to concrete type mapping) and introduce additional extension methods that make it possible to write LINQ and Entity SQL queries that return stubs of objects.

Comments (16)

  1. Whenever I talk about the Entity Framework, people very often complain to me about the lack of Lazy Loading.

  2. daveblack says:

    Thank you so much for providing this "extension" to EF!  It makes my code much more simple and eases my worries about performance.

  3. Hot Topics says:

    There has been a lot of discussions lately about Entity Framework and Lazy Loading as well as some solutions

  4. Jaroslaw Kowalski wrote some nice posts on how to "build" transparent lazy loading in EF and also prepared

  5. Brilliant job!

    I loved the way you mixed generics and lambda expressions. Just reminds me of those C++ template library designs, and yep, after a few years we’ve got the basis for "beautiful" coding!!!

    Jamal Mavadat

  6. In two previous articles ( part1 and part2 ) I have introduced EFLazyLoading – a framework for

  7. SiTox.NET says:

    With some new technologies it happens that they get released and show off some new "features", but in practice they

  8. Hot Topics says:

    Jarek Kowalski continues his series on implementing Transparent Lazy Loading in the EF. Part 1

  9. WardB says:

    I fear there are at least two problems with your approach, the first of which is most severe.

    Risk of wrong result: If the foreign key id for the customer of an order is 42 but there is no customer with id=42, anOrder.Customer reports that the customer exists when, in fact, it does not. The moment you write anOrder.Customer.Name you will blow up when you go to fetch the customer .. and get nothing.

    [aside: you and Entity Framework have chosen not to use the NULL OBJECT pattern so you have to litter your code with tests for null entities. In this case, you postpone the day of reckoning.]

    Inconsistency with EF: In Entity Framework, anOrder.Customer.Load returns null because it does the lookup .. and finds no match. So you are proposing post-load behavior inconsistent with EF’s. That’s a troublesome design choice.

    Inconsistency within your framework: You are offering a virtual proxy for "reference" properties (order.Customer) but not collection properties (order.Details).

    So what did I miss?

  10. Jarek Kowalski wrote some nice posts on how to "build" transparent lazy loading in EF and also prepared a small program

  11. eXcess says:

    Hi Jaroslaw, I just wondered what would be the drawback if I would implement lazy loading in our enterprise software through your framework and the Entity Framework version 2 comes out with lazy loading?

    I guess that there’ll be breaking changes anyway if we use standard ef v1 and upgrade to ef v2. But I just wanted your thoughts on that?

  12. In V2 we are making changes in the lazy loading area. It is too early to talk about details, but we are thinking of LL for both POCO objects and code-generated entities.

    In any case the API will be most likely very different from the one in EFLazyLoading.

    For POCO objects the plan is to support LL through proxies, we are still working on details for EntityObjects and IPOCO.

    If you are interested in POCOs and proxy-based LL, you may want to take a look at EFPocoAdapter sample: (http://code.msdn.com/EFPocoAdapter) which also implements proxies.

    Given that POCOs are really plain objects with no additional APIs on them, it should be pretty straightforward to migrate to the V2 solution when it becomes available.

  13. SvenAelterman says:

    Jaroslaw,

    It seems that most solutions that somehow hack EF v1 into lazy loading always require that there is an active ObjectContext. In architectures where you want to use short-lived ObjectContext instances, this isn’t feasible.

    For example, I use ObjectContext in my Data Access Layer as follows:

    using (Entities db = new Entities())
    {
        // Load, etc
    }

    so, any time I call .Load() or try to load anything, my ObjectContext has already been disposed.

    Is there any workaround that includes recreating an ObjectContext so it can actually go to the data store and retrieve the additional objects?

    Sven.

  14. Tanveer Badar says:

    @SvenAelterman

    It isn’t possible in nHibernate either, the ORM I am most familiar with. nHibernate also requires an active ISession to lazily load objects.

  15. kraeg says:

    I feel a bit stupid asking this question… but, how do I ‘use’ the EFLazyClassGen Code generation application?  Where do I put it and how do I get VS 2008 to use it when generating my entities?