Tip 28 – How to implement an Eager Loading strategy


Background:

Over the last 2 years lots of people have complained about the way Eager loading works in the Entity Framework, or rather the way you ask the Entity Framework to eagerly load.

Here is how you do it:

var results = from b in ctx.Blogs.Include("Posts")
              where b.Owner == "Alex"
              select b;

This snippet asks the EF to eager load each matching Blog’s Posts, and it works great.

The problem is the ‘Posts’ string. LINQ in general, and LINQ to SQL in particular, have spoilt us: we all now expect type safety everywhere, and a string is, well… not type safe.

Instead everyone wants something like this:

var results = from b in ctx.Blogs.Include(b => b.Posts)
              where b.Owner == "Alex"
              select b;

This is a lot safer. And a number of people have tried something like this before, including my mate Matthieu.

But even better would be something like this:

var strategy = new IncludeStrategy<Blog>();
strategy.Include(b => b.Owner);

var results = from b in strategy.ApplyTo(ctx.Blogs)
              where b.Owner == "Alex"
              select b;

Because here you can re-use strategies between queries.

Design Goals:

So I decided I wanted to have a play myself and extend this idea to support strategies.

Here are the types of things I wanted to support:

var strategy = Strategy.NewStrategy<Blog>();
strategy.Include(b => b.Owner)
        .Include(p => p.Comments); //sub includes
strategy.Include(b => b.Posts);    //multiple includes

And the ability to sub-class the strategy class:

public class BlogFetchStrategy: IncludeStrategy<Blog>
{
    public BlogFetchStrategy()
    {
        this.Include(b => b.Owner);
        this.Include(b => b.Posts);
    }
}

so you can do things like this:

var results = from b in new BlogFetchStrategy().ApplyTo(ctx.Blogs)
              where b.Owner == "Alex"
              select b;

Implementation:

Here is how I implemented this:

1) Create the IncludeStrategy<T> class:

public class IncludeStrategy<TEntity> 
   where TEntity : class, IEntityWithRelationships
{
    private List<string> _includes = new List<string>();

    public SubInclude<TNavProp> Include<TNavProp>(
      Expression<Func<TEntity, TNavProp>> expr
    ) where TNavProp : class, IEntityWithRelationships
    {
        return new SubInclude<TNavProp>(
             _includes.Add,
             new IncludeExpressionVisitor(expr).NavigationProperty
        );
    }

    public SubInclude<TNavProp> Include<TNavProp>(
      Expression<Func<TEntity, EntityCollection<TNavProp>>> expr
    ) where TNavProp : class, IEntityWithRelationships
    {
        return new SubInclude<TNavProp>(
            _includes.Add,
            new IncludeExpressionVisitor(expr).NavigationProperty
        );
    }

    public ObjectQuery<TEntity> ApplyTo(ObjectQuery<TEntity> query)
    {
        var localQuery = query;
        foreach (var include in _includes)
        {
            localQuery = localQuery.Include(include);
        }
        return localQuery;
    }
}

Notice that there is a list of strings that holds the Includes we want. And notice that the ApplyTo(…) method allows you to register the Includes with an ObjectQuery<T>, so long as the T’s match.

But of course the bulk of the work is in the two Include(..) methods.

There are two because I wanted to have one for including References and one for including Collections. These implementations are designed to work with .NET 3.5 SP1, so I can rely on classes that have relationships (the only type for which Include makes sense) implementing IEntityWithRelationships. Hence the use of generic constraints.

One interesting thing is that for the Collections Include method, even though the Expression is Expression<Func<TEntity, EntityCollection<TNavProp>>>, the object returned for creating sub-includes is typed to TNavProp. This allows us to neatly bypass needing to interpret expressions like this:

Include(b => b.Posts.SelectMany(p => p.Author));

or invent some sort of DSL like this:

Include(b => b.Posts.And().Author);

By instead doing this:

Include(b => b.Posts).Include(p => p.Author);

Which is much, much easier to implement and, I would argue, to use too.

This idea is central to the whole design.

2) The IncludeExpressionVisitor is a class derived from a copy of the ExpressionVisitor sample you can find here. It is very simple; in fact it is so simple that using a visitor here is probably overkill, but I wanted to bone up on the correct patterns etc:

public class IncludeExpressionVisitor : ExpressionVisitor
{
    private string _navigationProperty = null;

    public IncludeExpressionVisitor(Expression expr)
    {
        base.Visit(expr);
    }
    public string NavigationProperty
    {
        get { return _navigationProperty; }
    }

    protected override Expression VisitMemberAccess(
         MemberExpression m
    )
    {
        PropertyInfo pinfo = m.Member as PropertyInfo;

        if (pinfo == null)
            throw new Exception(
                 "You can only include Properties");

        // m.Expression is null for static members, so guard before reading NodeType
        if (m.Expression == null ||
            m.Expression.NodeType != ExpressionType.Parameter)
             throw new Exception(
                 "You can only include Properties of the Expression Parameter");

        _navigationProperty = pinfo.Name;

        return m;
    }

    protected override Expression Visit(Expression exp)
    {
        if (exp == null)
            return exp;
        switch (exp.NodeType)
        {
            case ExpressionType.MemberAccess:
                return this.VisitMemberAccess(
                        (MemberExpression)exp
                       );
            case ExpressionType.Lambda:
                return this.VisitLambda((LambdaExpression)exp);
            default:
                throw new InvalidOperationException(
                     "Unsupported Expression");
        }
    }
}

As you can see, this visitor is fairly constrained: it only recognizes LambdaExpressions and MemberExpressions. When visiting a MemberExpression it checks that the Member being accessed is a Property, and that the member is bound directly to the parameter (i.e. p.Property is okay but p.Property.SubProperty is not). Once it is happy it records the name of the NavigationProperty.
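
Since the visitor only ever accepts a lambda whose body is a single property access on its parameter, the same checks can be sketched without a visitor at all. The following is a minimal, standalone sketch and not part of the original sample: the NavigationName helper and the simplified Blog and Person classes are hypothetical names used only for illustration, and it relies on nothing beyond System.Linq.Expressions.

```csharp
using System;
using System.Linq.Expressions;
using System.Reflection;

// Simplified stand-ins for the model (hypothetical, no EF base classes).
public class Person
{
    public string Name { get; set; }
}

public class Blog
{
    public string Title { get; set; }
    public Person Owner { get; set; }
}

// Hypothetical helper: same two checks as the visitor, done inline.
public static class NavigationName
{
    public static string Of<TEntity, TProp>(Expression<Func<TEntity, TProp>> expr)
    {
        // The lambda body must be a member access such as b.Owner.
        MemberExpression member = expr.Body as MemberExpression;
        if (member == null || !(member.Member is PropertyInfo))
            throw new ArgumentException("You can only include Properties");

        // The property must hang directly off the lambda parameter,
        // so b.Owner is accepted but b.Owner.Name is rejected.
        if (!(member.Expression is ParameterExpression))
            throw new ArgumentException(
                "You can only include Properties of the Expression Parameter");

        return member.Member.Name;
    }
}

public static class NavigationNameDemo
{
    public static void Main()
    {
        // Prints "Owner"
        Console.WriteLine(NavigationName.Of((Blog b) => b.Owner));

        try
        {
            NavigationName.Of((Blog b) => b.Owner.Name);
        }
        catch (ArgumentException ex)
        {
            // Nested access is rejected, mirroring the visitor's second check.
            Console.WriteLine(ex.Message);
        }
    }
}
```

The visitor version remains the better starting point if more expression shapes need to be supported later; this form only shows how little is being inspected today.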

3) Once we know the NavigationProperty name the IncludeStrategy.Include methods create a SubInclude<T> object. This is responsible for registering our intent to include the NavigationProperty, and provides a mechanism for chaining more sub-includes.

The SubInclude<T> class looks like this:

public class SubInclude<TNavProp>
    where TNavProp : class, IEntityWithRelationships
{

    private Action<string> _callBack;
    private string[] _paths; 

    internal SubInclude(Action<string> callBack, params string[] path)
    {
        _callBack = callBack;
        _paths = path;
        _callBack(string.Join(".", _paths));
    }

    public SubInclude<TNextNavProp> Include<TNextNavProp>(
       Expression<Func<TNavProp, TNextNavProp>> expr
    ) where TNextNavProp : class, IEntityWithRelationships
    {
        string[] allpaths = _paths.Append(
           new IncludeExpressionVisitor(expr).NavigationProperty
        );

        return new SubInclude<TNextNavProp>(_callBack, allpaths);
    }

    public SubInclude<TNextNavProp> Include<TNextNavProp>(
       Expression<Func<TNavProp, EntityCollection<TNextNavProp>>> expr
    ) where TNextNavProp : class, IEntityWithRelationships
    {
        string[] allpaths = _paths.Append(
          new IncludeExpressionVisitor(expr).NavigationProperty
        );

        return new SubInclude<TNextNavProp>(_callBack, allpaths);
    }
}

4) Now the only thing missing is a little extension method I wrote to append another element to an array, which looks something like this:

// Extension methods must live in a static class; the class name here is arbitrary.
public static class ArrayExtensions
{
    public static T[] Append<T>(this T[] initial, T additional)
    {
        List<T> list = new List<T>(initial);
        list.Add(additional);
        return list.ToArray();
    }
}

With this code in place you can write your own eager loading strategy classes very easily, simply by deriving from IncludeStrategy<T>.
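
To make the chaining concrete, it helps to trace which path strings end up in the include list. The sketch below is a simplified, EF-free imitation of the classes above, written only so it can run on its own: PocoIncludeStrategy, PathInclude, Nav and the Blog, Post and Person classes are hypothetical names, the IEntityWithRelationships constraints are dropped, and ICollection<T> stands in for EntityCollection<T>. It is not the code from the download.

```csharp
using System;
using System.Collections.Generic;
using System.Linq.Expressions;

// Simplified stand-ins for the model (hypothetical, no EF base classes).
public class Person { public string Name { get; set; } }
public class Post { public Person Author { get; set; } }
public class Blog
{
    public Person Owner { get; set; }
    public ICollection<Post> Posts { get; set; }
}

public static class Nav
{
    // Assumes the lambda body is a plain property access such as b => b.Posts.
    public static string Name(LambdaExpression expr)
    {
        return ((MemberExpression)expr.Body).Member.Name;
    }
}

// Hypothetical counterpart of SubInclude<T>.
public class PathInclude<T>
{
    private readonly Action<string> _register;
    private readonly string _path;

    public PathInclude(Action<string> register, string path)
    {
        _register = register;
        _path = path;
        _register(path); // each level registers its own dotted path
    }

    // Reference navigation, e.g. p => p.Author
    public PathInclude<TNext> Include<TNext>(Expression<Func<T, TNext>> expr)
        where TNext : class
    {
        return new PathInclude<TNext>(_register, _path + "." + Nav.Name(expr));
    }

    // Collection navigation; the result is typed to the element, not the collection.
    public PathInclude<TNext> Include<TNext>(Expression<Func<T, ICollection<TNext>>> expr)
        where TNext : class
    {
        return new PathInclude<TNext>(_register, _path + "." + Nav.Name(expr));
    }
}

// Hypothetical counterpart of IncludeStrategy<T> that only collects paths.
public class PocoIncludeStrategy<TEntity>
{
    private readonly List<string> _includes = new List<string>();

    public IEnumerable<string> Paths { get { return _includes; } }

    public PathInclude<TProp> Include<TProp>(Expression<Func<TEntity, TProp>> expr)
        where TProp : class
    {
        return new PathInclude<TProp>(_includes.Add, Nav.Name(expr));
    }

    public PathInclude<TProp> Include<TProp>(Expression<Func<TEntity, ICollection<TProp>>> expr)
        where TProp : class
    {
        return new PathInclude<TProp>(_includes.Add, Nav.Name(expr));
    }
}

public static class PathTraceDemo
{
    public static void Main()
    {
        var strategy = new PocoIncludeStrategy<Blog>();
        strategy.Include(b => b.Posts).Include(p => p.Author); // sub-include
        strategy.Include(b => b.Owner);                        // second include

        // Prints: Posts, Posts.Author, Owner
        Console.WriteLine(string.Join(", ", strategy.Paths));
    }
}
```

Note that a chained call registers both "Posts" and "Posts.Author". ObjectQuery<T>.Include accepts dotted paths, and a longer path should already cover its prefix, so the extra entry looks redundant but harmless.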

All the code you need is in this post, but please bear in mind this is just a sample: it is NOT an official Microsoft release, and as such has not been rigorously tested etc.

If you accept that I’m just a Program Manager, and I’m eminently fallible, and you *still* want to try this out, you can download a copy of the source here.

Enjoy.

EagerLoading.zip

Comments (20)

  1. Anonymous says:

    Nice! I gave something similar a try a while ago but I must admit I failed big time on the SubIncludes! Going to try yours out. Many have tried and semi-failed before you 🙂

  2. Anonymous says:

    Interesting implementation! I will have a deeper look at what you provided here. Nice idea by the way.

  3. Anonymous says:

    I would love to see something along these lines baked into the framework.  Avoiding "magic strings" and allowing re-use is definitely a Good Thing.

  4. Alex D James says:

    @Shawn couldn’t agree more.

    @Michael, well I don’t think I semi-failed, but you guys are the real judges, so let me know what you think.

    @Muhammad, glad you like the idea, any feedback would be appreciated.

  5. Robert90048 says:

    Please keep up the cool tips. EF is going great, no matter what some peeps say. I haven’t used anything this powerful since I learned to count SQL group by havings. Thanks!

  6. Anonymous says:

    Wow,

    this is perfect. This is the most elegant solution I have seen so far.

    Any clue if this will work in next version of Entity framework?

    Thanks

  7. Alex D James says:

    @Daniel,

    If you use default code-gen (i.e. classes that derive from EntityObject) this will work in 4.0 too.

    However if you write POCO classes, you’ll have to make a few changes to the API, but the principle will still work.

    The key will be to remove IEntityWithRelationships and EntityCollection from the generic method constraints.

    Alex

  8. Anonymous says:

    Hi Alex,

    I was looking for something similar, but can you please help me with how I can do something like

    DataEntities de = new DataEntities();

    var query = de[‘TableName’].Select(col1,Table2.Col1);

    This is required because the entity name and the field will be known at runtime.

    I tried doing this……

    I have added this Indexer Property to the main EntityContext Class which returns me the table which I want to query.

    public global::System.Data.Objects.ObjectQuery<System.Data.Objects.DataClasses.EntityObject> this[string indexer]
    {
        get
        {
            string entity = "[" + indexer + "]";
            return base.CreateQuery<System.Data.Objects.DataClasses.EntityObject>(entity);
        }
    }

    With this entity I can hard code the columns in Select Method, something like

    var query = dce["Table2"].Select("col2");        

    But I want that the Select Method accepts the entity objects which are having relationship so that Entity Framework can do join based query for me automatically.

    Please help me on this Alex.

    Thanks a lot.

  9. Alex D James says:

    Ashish,

    I was following your question right up to the end, but you lost me with this:

    "But I want that the Select Method accepts the entity objects which are having relationship so that Entity Framework can do join based query for me automatically."

    Can you give me an example of the code you want to write? It would really help me. Also if you can use an example model in your code snippets it will really help me understand what you want i.e. Person.Mother.

    -Alex

  10. Anonymous says:

    Great blog, man.

    I have a question on stackoverflow that nobody could answer, related to include and recursive hierarchies, maybe you could help me 😉

    http://stackoverflow.com/questions/1308158

    thanks!

  11. Anonymous says:

    This would be great, waiting for it

  12. Anonymous says:

    Hi,

    I only have the ID of the object to be deleted. I want to issue an UPDATE stmt on the DB so that the is_deleted field in the object gets marked as true, thereby soft-deleting the object. I could do :

    c = context.Customers.Where(Customer => Customer.id==123) ;
    context.Attach(c) ;
    c.is_deleted = true ;
    context.SaveChanges() ;

    But this would mean firing a SELECT stmt and then an UPDATE that updates ALL the columns.

    What I would like to do is fire an UPDATE on a single column.

    I use .NET 3.5 sp1.

    Regards,

    Yash

  13. Alex D James says:

    Yash,

    Well, it is occasionally possible to do updates without getting all the fields; check out Tip 15 for more.

    Alex

  14. Anonymous says:

    If we want to load multiple related entities we can use explicit loading of related objects:

    if (!order.SalesOrderDetail.IsLoaded)
    {
       order.SalesOrderDetail.Load();
    }

    But this will cause multiple round-trips to the DB.

    If I invoke something like:

    ObjectQuery<SalesOrderHeader> query =
       context.SalesOrderHeader.Include("SalesOrderDetail").Include("Address");

    the query that is generated has a number of joins. It generates a data table with a wide data row, which gets wider with every new related entity type we specify.

    1. Instead of a single data table, it would be more efficient to retrieve multiple data tables, each containing the data for a single entity type. EF can then link them based on the common keys between the data tables. This is NOT lazy-loading. This is upfront loading.
    2. Additionally the developer should be allowed to provide a custom stored proc if he wants to use something different than the generated dynamic sql for retrieving the related entities. Something like:

    ObjectQuery<SalesOrderHeader> query =
       context.SalesOrderHeader.Include("SalesOrderDetail").Include("Address").UseSproc("sp_LoadSalesOrder");

    Thanks,

    Yash

  15. Alex D James says:

    Yash,

    Yes, you are recommending a strategy used by some ORMs. It is definitely something we should consider for future versions of the Entity Framework.

    The complication is the streaming nature of LINQ, i.e. you would have to load all the data tables before yielding any entities, which is somewhat counter to the idea of IEnumerable<> and IQueryable<>.

    Cheers

    Alex

  16. Anonymous says:

    I tried something similar and saw it break when trying to use compiled queries. Can you advise where to dig to get it working in compiled LINQ as well?

  17. Alex D James says:

    nayato – sorry, there is nothing special you can do to make this work with CompiledQuery… short of adding surface to collect all the Include strings from the strategy and apply them when constructing the CompiledQuery.

    Alex

  18. Anonymous says:

    Thought I'd share two helpful extension methods I wrote. One wraps the "ApplyTo" method back into "Include". The other is to make it quick-n-simple when you only need to include one property in your query.

    public static ObjectQuery<TEntity> Include<TEntity>(this ObjectQuery<TEntity> query, IncludeStrategy<TEntity> includeStrategy)
       where TEntity : class, IEntityWithRelationships
    {
       return includeStrategy.ApplyTo(query);
    }

    public static ObjectQuery<TEntity> Include<TEntity, TNavProp>(this ObjectQuery<TEntity> query, Expression<Func<TEntity, EntityCollection<TNavProp>>> expr)
       where TEntity : class, IEntityWithRelationships
       where TNavProp : class, IEntityWithRelationships
    {
       var strategy = new IncludeStrategy<TEntity>();
       strategy.Include(expr);
       return query.Include(strategy);
    }

  19. Anonymous says:

    Hi

    How can i use your strategy to implement something like this?

    select customer.*, order.id, orderdetails.productid

    where customer.id = order.customerid

    and order.id = orderdetails.orderid

    and orderdetails.productid in

    (select products.id from supplier, products

    where supplier.id = products.supplierid

    and supplier.name = "SomeName")

    Also how to extend your strategy to implement paging functionality?

  20. Anonymous says:

    Thank you so much, I've been wanting to do just this very thing. It was tumbling up there in my subconscious today and damn, I just found exactly what I was looking for!