Improving ObjectQuery.Include

** UPDATE: There’s a bug in the code below – see this post for the update!

One of the great features of LINQ To SQL and LINQ To Entities is that the queries you write are checked by the compiler, which eliminates typing errors in your query. Unfortunately, the ObjectQuery<T>.Include function (which is used to eager-load data that isn’t directly included in the query) takes a string parameter, opening up opportunities for typos to creep back in. In this post I’ll present some sample code that illustrates one way that you can work round this. To start with, let’s take a quick look at a query against an entity model on the Northwind database.

var query = from customer in context.Customers
            where customer.Orders.Count > 0
            select customer;

This query simply retrieves customers that have at least one order. If the code that uses the query results then needs the order data, that data won’t have been loaded. We can use the Include function to ensure that it is loaded up front:

var query = from customer in context.Customers.Include("Orders")
            where customer.Orders.Count > 0
            select customer;

Notice the Include("Orders") call that we’ve inserted, which instructs Entity Framework to retrieve the Orders for each Customer. It would be much nicer if we could use a lambda expression to specify which property to load:

var query = from customer in context.Customers.Include(c => c.Orders)
            where customer.Orders.Count > 0
            select customer;

It turns out that this is very easy to achieve by using an extension method:

using System;
using System.Data.Objects;
using System.Linq.Expressions;
using System.Reflection;

public static class ObjectQueryExtensions
{
    public static ObjectQuery<TSource> Include<TSource, TPropType>(this ObjectQuery<TSource> source, Expression<Func<TSource, TPropType>> propertySelector)
    {
        // The lambda body must be a simple member access such as c => c.Orders
        MemberExpression memberExpression = propertySelector.Body as MemberExpression;
        if (memberExpression == null)
        {
            throw new InvalidOperationException("Expression must be a member expression: " + propertySelector);
        }
        // Pass the member's name to the original string-based Include
        MemberInfo propertyInfo = memberExpression.Member;
        return source.Include(propertyInfo.Name);
    }
}

This Include extension method allows the query syntax above with the lambda expression. When the Include method is called, it inspects the expression tree. If the method is used as intended, the tree will describe accessing a member of the TSource class. We can then use the name of the member to call the original Include function.
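
To see the mechanism in isolation, here is a minimal sketch that runs without Entity Framework. The Customer and Order classes and the GetMemberName helper are stand-ins invented for this illustration; they are not the generated Northwind entities or part of the extension class above.

```csharp
using System;
using System.Collections.Generic;
using System.Linq.Expressions;

// Stand-in types for illustration only (assumptions, not the real entity classes).
public class Order { }

public class Customer
{
    public List<Order> Orders { get; set; }
    public string CompanyName { get; set; }
}

public static class MemberNameDemo
{
    // Hypothetical helper: returns the name of the member that the lambda body accesses.
    public static string GetMemberName<TSource, TProp>(Expression<Func<TSource, TProp>> selector)
    {
        MemberExpression member = selector.Body as MemberExpression;
        if (member == null)
        {
            throw new InvalidOperationException("Expression must be a member expression: " + selector);
        }
        return member.Member.Name;
    }

    public static void Main()
    {
        // The body of this lambda is a member access, so the name can be read from it.
        Console.WriteLine(GetMemberName((Customer c) => c.Orders)); // prints "Orders"

        // The body of this lambda is a method call, not a member access, so the helper throws.
        try
        {
            GetMemberName((Customer c) => c.CompanyName.ToUpper());
        }
        catch (InvalidOperationException e)
        {
            Console.WriteLine(e.Message);
        }
    }
}
```

The lambda is never executed here. Because the parameter type is Expression<Func<...>>, the compiler hands the method a description of the lambda rather than a delegate, which is what makes reading the member name possible.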

Whilst this solves the problem as I described it above, what if the code consuming the query also needed the order details? With the original Include function we can write:

var query = from customer in context.Customers.Include("Orders.Order_Details")
            where customer.Orders.Count > 0
            select customer;

Notice that we can pass a path to the properties to include. The extension method we wrote doesn’t give us a way to handle this case. I imagine using something like the syntax below to describe this situation:

var query = from customer in context.Customers.Include(c => c.Orders.SubInclude(o => o.Order_Details))
            where customer.Orders.Count > 0
            select customer;

Here, we’ve specified that we want to include Orders, and then also that we want to include Order_Details. Adding this support is a bit more code, but not too bad:

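using System;
using System.Data.Objects;
using System.Data.Objects.DataClasses;
using System.Linq;
using System.Linq.Expressions;
using System.Reflection;

public static class ObjectQueryExtensions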
{
    public static ObjectQuery<TSource> Include<TSource, TPropType>(this ObjectQuery<TSource> source, Expression<Func<TSource, TPropType>> propertySelector)
    {
        string includeString = BuildString(propertySelector);
        return source.Include(includeString);
    }
    private static string BuildString(Expression propertySelector)
    {
        switch(propertySelector.NodeType)
        {
            case ExpressionType.Lambda:
                LambdaExpression lambdaExpression = (LambdaExpression)propertySelector;
                return BuildString(lambdaExpression.Body);

            case ExpressionType.Quote:
                UnaryExpression unaryExpression= (UnaryExpression)propertySelector;
                return BuildString(unaryExpression.Operand);

            case ExpressionType.MemberAccess:
                MemberInfo propertyInfo = ((MemberExpression) propertySelector).Member;
                return propertyInfo.Name;

            case ExpressionType.Call:
                MethodCallExpression methodCallExpression = (MethodCallExpression) propertySelector;
                if (IsSubInclude(methodCallExpression.Method)) // check that it's a SubInclude call
                {
                    // argument 0 is the expression to which the SubInclude is applied (this could be member access or another SubInclude)
                    // argument 1 is the expression to apply to get the included property
                    // Pass both to BuildString to get the full expression
                    return BuildString(methodCallExpression.Arguments[0]) + "." +
                           BuildString(methodCallExpression.Arguments[1]);
                }
                // else drop out and throw
                break;
        }
        throw new InvalidOperationException("Expression must be a member expression or a SubInclude call: " + propertySelector.ToString());

    }

    private static readonly MethodInfo[] SubIncludeMethods;
    static ObjectQueryExtensions()
    {
        Type type = typeof (ObjectQueryExtensions);
        SubIncludeMethods = type.GetMethods().Where(mi => mi.Name == "SubInclude").ToArray();
    }
    private static bool IsSubInclude(MethodInfo methodInfo)
    {
        if (methodInfo.IsGenericMethod)
        {
            if (!methodInfo.IsGenericMethodDefinition)
            {
                methodInfo = methodInfo.GetGenericMethodDefinition();
            }
        }
        return SubIncludeMethods.Contains(methodInfo);
    }

    public static TPropType SubInclude<TSource, TPropType>(this EntityCollection<TSource> source, Expression<Func<TSource, TPropType>> propertySelector)
        where TSource : class, IEntityWithRelationships
        where TPropType : class
    {
        throw new InvalidOperationException("This method is only intended for use with ObjectQueryExtensions.Include to generate expression trees"); // not actually using this - just want the expression!
    }
    public static TPropType SubInclude<TSource, TPropType>(this TSource source, Expression<Func<TSource, TPropType>> propertySelector)
        where TSource : class, IEntityWithRelationships
        where TPropType : class
    {
        throw new InvalidOperationException("This method is only intended for use with ObjectQueryExtensions.Include to generate expression trees"); // not actually using this - just want the expression!
    }
}

This code still has the Include method with the original signature, and adds a couple of SubInclude extension methods. You can see that the code to extract the property name has been pulled out into a separate method (BuildString). This method now also handles some additional NodeTypes so that we can process the SubInclude calls inside the Include call. The IsSubInclude method checks that any method call we encounter really is one of the SubInclude methods. With this code, we can write the previous query as well as:
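
To make it clearer why BuildString needs cases for Lambda, Quote, MemberAccess and Call, the following sketch prints the node types that appear in a one-level SubInclude expression. It runs without Entity Framework: the entity classes and the Marker.SubInclude method are stand-ins made up for this illustration (the real SubInclude overloads are the ones in the listing above).

```csharp
using System;
using System.Collections.Generic;
using System.Linq.Expressions;

// Stand-in types for illustration only (assumptions, not the real entity classes).
public class OrderDetail { }
public class Order { public List<OrderDetail> Order_Details { get; set; } }
public class Customer { public List<Order> Orders { get; set; } }

public static class Marker
{
    // Hypothetical stand-in for SubInclude. It is never executed;
    // it only needs to appear in the expression tree.
    public static TProp SubInclude<TSource, TProp>(
        this IEnumerable<TSource> source, Expression<Func<TSource, TProp>> selector)
    {
        throw new InvalidOperationException("Marker method; not meant to be called.");
    }
}

public static class TreeShapeDemo
{
    // Returns the node types a recursive walk meets for a one-level SubInclude.
    public static ExpressionType[] Describe()
    {
        Expression<Func<Customer, List<OrderDetail>>> selector =
            c => c.Orders.SubInclude(o => o.Order_Details);

        // The extension method call becomes a static call with two arguments.
        MethodCallExpression call = (MethodCallExpression)selector.Body;
        // A lambda passed to an Expression<...> parameter is wrapped in a Quote node.
        UnaryExpression quote = (UnaryExpression)call.Arguments[1];

        return new[]
        {
            selector.NodeType,          // Lambda
            call.NodeType,              // Call
            call.Arguments[0].NodeType, // MemberAccess (c.Orders)
            quote.NodeType,             // Quote
            quote.Operand.NodeType      // Lambda (o => o.Order_Details)
        };
    }

    public static void Main()
    {
        foreach (ExpressionType nodeType in Describe())
        {
            Console.WriteLine(nodeType);
        }
    }
}
```

The Quote node is the reason for the extra case in the switch: the inner lambda does not appear directly as an argument, so BuildString has to unwrap it before it can reach the member access inside.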

var query = from customer in context.Customers.Include(c => c.Orders.SubInclude(o => o.Order_Details).SubInclude(od => od.Products))
            where customer.Orders.Count > 0
            select customer;

This query includes the Orders, Order Details, and Products as if we’d called Include("Orders.Order_Details.Products"), and in fact this is what the code will do! Additionally, it doesn’t matter whether you chain the SubInclude calls (as above) or nest them:

var query = from customer in context.Customers.Include(c => c.Orders.SubInclude(o => o.Order_Details.SubInclude(od => od.Products)))
            where customer.Orders.Count > 0
            select customer;

Both of these queries have the same effect, so it’s up to you which style you prefer.

The code isn’t production ready (and as always, is subject to the standard disclaimer: “These postings are provided "AS IS" with no warranties, and confer no rights. Use of included script samples are subject to the terms specified at https://www.microsoft.com/info/cpyright.htm”). However, there are a couple of other features that I think are worth briefly mentioning:

  • The IsSubInclude function works against cached MethodInfo objects for the SubInclude methods. Because these methods are generic, we have to get the generic method definition before making the comparison.
  • The SubInclude functions are not intended to be called at runtime - they are purely there to get the compiler to generate the necessary expression tree!
  • Generic type inference is at work when we write the queries. Notice that we could write Include(c => c.Orders) rather than Include<Customer, EntityCollection<Order>>(c => c.Orders). Imagine how much less readable the code would become if every type argument had to be spelled out, especially when including multiple levels.
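
The first point, about comparing against the generic method definition, can be sketched on its own. The Markers.Echo method and the IsEcho helper below are hypothetical names used only for this illustration; they play the roles of SubInclude and IsSubInclude.

```csharp
using System;
using System.Reflection;

public static class Markers
{
    // Hypothetical generic method, standing in for SubInclude.
    public static T Echo<T>(T value) { return value; }
}

public static class GenericDefinitionDemo
{
    // The open definition Echo<T>, looked up once and cached.
    private static readonly MethodInfo EchoDefinition = typeof(Markers).GetMethod("Echo");

    public static bool IsEcho(MethodInfo method)
    {
        // Reduce a constructed method such as Echo<int> to Echo<T> before comparing.
        if (method.IsGenericMethod && !method.IsGenericMethodDefinition)
        {
            method = method.GetGenericMethodDefinition();
        }
        return method.Equals(EchoDefinition);
    }

    public static void Main()
    {
        // An expression tree refers to the constructed method, not the open definition.
        MethodInfo closed = EchoDefinition.MakeGenericMethod(typeof(int));

        Console.WriteLine(closed.Equals(EchoDefinition)); // False: Echo<int> is not Echo<T>
        Console.WriteLine(IsEcho(closed));                // True once reduced to the definition
    }
}
```

Without the call to GetGenericMethodDefinition, every distinct combination of type arguments would look like a different method, and the cached lookup would never match.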

I found it quite interesting putting this code together as it pulls in Extension Methods, LINQ To Entities and Expression Trees. The inspiration for this came from the LinkExtensions.ActionLink methods in the ASP.NET MVC framework, which do the same sort of thing for ActionLinks. I’m not sure if I’m entirely satisfied with the syntax for including multiple levels, but it is the best I’ve come up with so far. If you’ve any suggestions for how to improve it then let me know!