Using LINQ Expressions to Generate Dynamic Methods

This week at DevConnections in Orlando, I gave a “deep-dive” talk on LINQ. I wanted to give people a feel for what's possible with the new language features and core APIs in .NET 3.5. I spent most of the talk discussing a single example: taking an ADO.NET 2.0 code sample and simplifying it. Instead of using an existing library like LINQ to SQL or LINQ to Entities, I built a (limited) LINQ provider from scratch. As boilerplate code is moved into helper methods and a proto-LINQ-to-SQL API evolves, the sample is boiled down to something much more compact:

static List<Customer> GetCustomers()
{
    using (Table<Customer> table = new Table<Customer>(GetConnectionString(), "Customers"))
    {
        IEnumerable<Customer> query = from customer in table
                                      where customer.City == "London"
                                      select customer;
        return query.ToList();
    }
}
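The query in GetCustomers works because the C# compiler rewrites the query expression into a call to Where. When the source implements IQueryable<T>, the lambda passed to Where is captured as an expression tree rather than compiled to IL, and that tree is what a provider translates into SQL. The sketch below shows this with Queryable.AsQueryable over an in-memory list, standing in for the talk's Table<Customer>. It assumes Table<T> is an IQueryable<T>, as LINQ providers typically are; its implementation isn't shown. The Customer shape, QueryDemo, and its methods are illustrative only.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Linq.Expressions;

// Illustrative Customer type; the talk's real class isn't shown.
public class Customer
{
    public string CustomerID { get; set; }
    public string City { get; set; }
}

public static class QueryDemo
{
    // AsQueryable stands in for Table<Customer> (assumed to be an IQueryable<Customer>).
    public static IQueryable<Customer> LondonCustomers(IEnumerable<Customer> source)
    {
        IQueryable<Customer> table = source.AsQueryable();

        // The compiler turns this into table.Where(customer => customer.City == "London").
        // Because table is IQueryable<Customer>, Queryable.Where receives the lambda as an
        // Expression<Func<Customer, bool>> (a data structure), not as a compiled delegate.
        return from customer in table
               where customer.City == "London"
               select customer;
    }

    // Pulls out the lambda a provider would translate, e.g. into a SQL WHERE clause.
    public static LambdaExpression GetPredicate(IQueryable<Customer> query)
    {
        var whereCall = (MethodCallExpression)query.Expression;
        // Queryable.Where wraps the lambda in a Quote node.
        return (LambdaExpression)((UnaryExpression)whereCall.Arguments[1]).Operand;
    }
}
```

A provider never runs the predicate as code; it walks the Body of that lambda (here an Equal node comparing a member access with a constant) and emits the equivalent SQL.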

In this post, I’ll drill down on one component developed in the talk, which was also included in the EFExtensions library. This component takes rows from a data reader and transforms (or “shapes”) them into typed results. For instance, I may want to transform records into categories. Using a couple of extension methods (Field<T> and Materialize<T>), we can leverage a “shaper” delegate to do just this:

SqlCommand command = …;
return command.Materialize<Category>(r =>
    new Category
    {
        CategoryID = r.Field<int>("CategoryID"),
        CategoryName = r.Field<string>("CategoryName"),
        …
    });

The shaper delegate (the lambda passed to Materialize above) is still annoying, though: for every property of the category, I’m retrieving a column of the same name. The code is mechanical and repetitive, the sort of thing you want a machine to do rather than a programmer.
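Field<T> and Materialize<T> themselves aren't shown here. The following is a minimal sketch of what they might look like; the signatures are assumptions, not the EFExtensions API. To keep it runnable without a database, this Materialize<T> takes an open IDataReader instead of a SqlCommand, and Field<T> maps DBNull to default(T).

```csharp
using System;
using System.Collections.Generic;
using System.Data;

// Hypothetical sketch of the two helpers; EFExtensions' actual signatures may differ.
public static class DataRecordExtensions
{
    // Reads the named column as T, mapping DBNull to default(T).
    public static T Field<T>(this IDataRecord record, string name)
    {
        object value = record[name];
        return value is DBNull ? default(T) : (T)value;
    }

    // Assumption: works over an open reader rather than a command, so no database is
    // needed. Applies the shaper to each row as the reader advances.
    public static IEnumerable<T> Materialize<T>(this IDataReader reader, Func<IDataRecord, T> shaper)
    {
        while (reader.Read())
        {
            yield return shaper(reader);
        }
    }
}

public class Category
{
    public int CategoryID { get; set; }
    public string CategoryName { get; set; }
}
```

Because Materialize is a lazy iterator, the caller has to enumerate the results before the reader is disposed.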

Here are three strategies you can use to automate this pattern in .NET 2.0:

1. Reflection: Write a general-purpose delegate that uses reflection to construct an instance of T and then dynamically invoke its property setters. While this code is relatively easy to write, its performance is sub-optimal. For information on the performance of various method dispatch patterns, take a look at the great talk by Joel Pobar and Joe Duffy.

2. Automatically generate the code: You can generate wrapper classes that encapsulate the shaping logic. Code generation has its challenges, however, in particular integration with build systems and the development environment. It’s also a lot of code to maintain.

3. Create a DynamicMethod implementing the pattern: .NET allows you to compile a delegate at runtime using dynamic methods. This resolves the performance and maintenance problems of solutions 1 and 2. Unfortunately, MSIL generation is hard to get right, and the emitted IL does not reflect the intent of the generated delegate.
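For comparison, strategy 1 might look like the following minimal sketch. ReflectionShaper.Shape<T> is a hypothetical name, and like the hand-written shaper it assumes every public settable property has a column of the same name. Every assignment goes through the late-bound PropertyInfo.SetValue, which is where the performance cost comes from.

```csharp
using System;
using System.Data;
using System.Reflection;

public static class ReflectionShaper
{
    // Hypothetical general-purpose shaper: creates T and assigns each public settable
    // property from the column with the same name, entirely through reflection.
    public static T Shape<T>(IDataRecord record) where T : class, new()
    {
        T result = new T();
        foreach (PropertyInfo property in typeof(T).GetProperties(BindingFlags.Public | BindingFlags.Instance))
        {
            if (property.GetSetMethod() == null || property.GetIndexParameters().Length != 0)
            {
                continue; // skip read-only properties and indexers
            }

            object value = record[property.Name];
            if (value is DBNull)
            {
                continue; // leave the property at its default for NULL columns
            }

            // Late-bound setter invocation on every property of every row.
            property.SetValue(result, value, null);
        }
        return result;
    }
}

public class Customer
{
    public string CustomerID { get; set; }
    public string City { get; set; }
}
```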

Digression: random thoughts on APIs

I recently heard the expression “Swiss Army Knife API”. These are interfaces that handle all kinds of little (possibly unrelated) problems. They can be useful, but they are also hard to package and discover. The Zip method in EFExtensions illustrates these problems. It’s a handy method, but what is it doing in an EF library? It has nothing to do with the EF or with the scenarios addressed by the library (it’s poorly packaged), and no one trying to pair the elements of two iterators would think to look in that particular library (it’s not discoverable). If you need to install a PCI card, fillet a fish and hand-stitch a saddle, you might need a Swiss Army Knife API.

At the other extreme, there are narrowly targeted APIs that can solve complex problems but most often require specialized knowledge or training. To make matters worse, once you’ve mastered them, you can rarely apply your knowledge to different domains. You can probably think of a few examples of this pattern.

LINQ achieves a useful balance. While the LINQ project was motivated by a specific requirement – seamless support for non-object data within .NET applications – all components of the solution are generically useful. Consider…

The System.Linq.Expressions API serves a specific need for integrated queries: it allows the compiler to describe the user’s code as a data structure that can then be translated to targets other than MSIL at runtime, like SQL, Web Services, etc. Expressions can also be compiled into delegates at runtime, which brings me to a .NET 3.5 solution to the default shaper problem… If the compiler can use expressions to describe code, so can we!
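To make that concrete, here is a small sketch that builds the tree for x => x * 2 + 1 node by node with the System.Linq.Expressions factory methods, next to the same lambda written in source, and compiles both into delegates. The ExpressionDemo class is purely illustrative.

```csharp
using System;
using System.Linq.Expressions;

public static class ExpressionDemo
{
    // Builds x => x * 2 + 1 by hand with the Expression factory methods.
    public static Expression<Func<int, int>> BuildByHand()
    {
        ParameterExpression x = Expression.Parameter(typeof(int), "x");
        Expression body = Expression.Add(
            Expression.Multiply(x, Expression.Constant(2)),
            Expression.Constant(1));
        return Expression.Lambda<Func<int, int>>(body, x);
    }

    // The same lambda written in source; the compiler emits equivalent factory calls.
    public static readonly Expression<Func<int, int>> WrittenInSource = x => x * 2 + 1;
}
```

Both trees have an Add node at the root; calling Compile() on either yields an ordinary Func<int, int>.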

Here’s the code pattern we want to generate:

r => new T
{
    Property1 = r.Field<Type[Property1]>("Property1"),
    Property2 = r.Field<Type[Property2]>("Property2"),
    …
}

Shortcut: learning how to build an expression programmatically

If you want to figure out how to build expressions programmatically, a simple trick will probably save you some time. Just follow the compiler’s lead. First, write an example of the pattern, e.g.:

Expression<Func<IDataRecord, Customer>> example =
    r => new Customer { City = r.Field<string>("City") };

After compiling your program, use a tool like .NET Reflector to see how the compiler builds the expression tree. In its default mode, this tool is actually a little too smart, disassembling the code into precisely what we wrote to begin with. If you change the disassembler settings (View > Options > Disassembler) to use “.NET 2.0” optimizations, it gives more useful output, e.g.:

ParameterExpression CS$0$0000;
Expression<Func<IDataRecord, Customer>> example =
    Expression.Lambda<Func<IDataRecord, Customer>>(
        Expression.MemberInit(
            Expression.New((ConstructorInfo) methodof(Customer..ctor), new Expression[0]),
            new MemberBinding[] {
                Expression.Bind(
                    (MethodInfo) methodof(Customer.set_City),
                    Expression.Call(null, (MethodInfo) methodof(Utility.Field), new Expression[] {
                        CS$0$0000 = Expression.Parameter(typeof(IDataRecord), "r"),
                        Expression.Constant("City", typeof(string)) })) }),
        new ParameterExpression[] { CS$0$0000 });
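If you don't have a decompiler handy, you can also inspect the tree at runtime, since it's just a data structure. The sketch below walks the MemberInit node of the example expression and describes each binding. The Utility.Field<T> helper and the TreeInspector class are hypothetical stand-ins written for this sketch.

```csharp
using System;
using System.Collections.Generic;
using System.Data;
using System.Linq.Expressions;

public class Customer
{
    public string City { get; set; }
}

public static class Utility
{
    // Hypothetical stand-in for the Field<T> helper used in the example.
    public static T Field<T>(this IDataRecord record, string name)
    {
        object value = record[name];
        return value is DBNull ? default(T) : (T)value;
    }
}

public static class TreeInspector
{
    public static readonly Expression<Func<IDataRecord, Customer>> Example =
        r => new Customer { City = r.Field<string>("City") };

    // Describes each member binding as "Property <- Method(arguments)".
    public static List<string> DescribeBindings<T>(Expression<Func<IDataRecord, T>> shaper)
    {
        var descriptions = new List<string>();
        var init = (MemberInitExpression)shaper.Body;
        foreach (MemberBinding binding in init.Bindings)
        {
            var assignment = (MemberAssignment)binding;
            var call = (MethodCallExpression)assignment.Expression;
            descriptions.Add(binding.Member.Name + " <- " + call.Method.Name +
                             "(" + string.Join(", ", call.Arguments) + ")");
        }
        return descriptions;
    }
}
```

The node types you see here (MemberInit, MemberAssignment, MethodCall) are exactly the factory methods you'll call when building the tree yourself.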


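Once you’ve figured out the expression pattern, you can implement a general solution. (In the Reflector output, methodof is the tool’s notation for a method token; it isn’t valid C#, so in your own code you look up the MethodInfo or PropertyInfo through reflection instead.) For example: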

private static Expression<Func<IDataRecord, T>> CreateDefaultShaper<T>()
{
    // Builds an expression of the form (IDataRecord r) => new T { Prop1 = r.Field<Prop1Type>("Prop1"), ... }
    ParameterExpression r = Expression.Parameter(typeof(IDataRecord), "r");

    // Create property bindings for all writable properties
    // (GetWritableProperties<T> is a helper not shown here)
    List<MemberBinding> bindings = new List<MemberBinding>();
    foreach (PropertyInfo property in GetWritableProperties<T>())
    {
        // Create expression representing r.Field<property.PropertyType>(property.Name)
        MethodCallExpression propertyValue = Expression.Call(
            typeof(Utility).GetMethod("Field").MakeGenericMethod(property.PropertyType),
            r, Expression.Constant(property.Name));

        // Assign the value to the property through a member binding
        MemberBinding binding = Expression.Bind(property, propertyValue);
        bindings.Add(binding);
    }

    // Create the initializer, which instantiates T (through its parameterless constructor)
    // and sets property values using the member bindings we just created
    Expression initializer = Expression.MemberInit(Expression.New(typeof(T)), bindings);

    // Create the lambda expression, which represents the complete delegate (r => initializer)
    Expression<Func<IDataRecord, T>> lambda = Expression.Lambda<Func<IDataRecord, T>>(
        initializer, r);
    return lambda;
}
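CreateDefaultShaper depends on two helpers that aren't shown, Utility.Field and GetWritableProperties<T>. The sketch below fills them in with hypothetical versions (Field<T> maps DBNull to default(T); GetWritableProperties<T> returns public instance properties with a public setter and no index parameters), so the method can be compiled and run against an in-memory DataTable. It adds a new() constraint so that a type without a parameterless constructor is rejected at compile time instead of making Expression.New throw at runtime.

```csharp
using System;
using System.Collections.Generic;
using System.Data;
using System.Linq;
using System.Linq.Expressions;
using System.Reflection;

public static class Utility
{
    // Hypothetical Field<T>: the only public method named "Field", so GetMethod("Field") is unambiguous.
    public static T Field<T>(this IDataRecord record, string name)
    {
        object value = record[name];
        return value is DBNull ? default(T) : (T)value;
    }
}

public static class Shapers
{
    // Hypothetical helper: public instance properties with a public setter, excluding indexers.
    private static IEnumerable<PropertyInfo> GetWritableProperties<T>()
    {
        return typeof(T).GetProperties(BindingFlags.Public | BindingFlags.Instance)
            .Where(p => p.GetSetMethod() != null && p.GetIndexParameters().Length == 0);
    }

    // Same construction as above: r => new T { Prop = r.Field<PropType>("Prop"), ... }
    public static Expression<Func<IDataRecord, T>> CreateDefaultShaper<T>() where T : new()
    {
        ParameterExpression r = Expression.Parameter(typeof(IDataRecord), "r");
        MethodInfo field = typeof(Utility).GetMethod("Field");
        List<MemberBinding> bindings = GetWritableProperties<T>()
            .Select(p => (MemberBinding)Expression.Bind(p,
                Expression.Call(field.MakeGenericMethod(p.PropertyType), r, Expression.Constant(p.Name))))
            .ToList();
        return Expression.Lambda<Func<IDataRecord, T>>(
            Expression.MemberInit(Expression.New(typeof(T)), bindings), r);
    }
}

public class Customer
{
    public string CustomerID { get; set; }
    public string City { get; set; }
    public int Orders { get; set; }
}
```

Usage: call Shapers.CreateDefaultShaper<Customer>().Compile() once, then invoke the resulting Func<IDataRecord, Customer> on each row of a reader.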


Expression<TDelegate> has a Compile method that produces an instance of TDelegate. In this case, TDelegate is our delegate type, Func<IDataRecord, T>, so calling Compile on the lambda returned by CreateDefaultShaper gives us the shaper, and we’re done! We build up a description of the logic using LINQ expressions and then compile that description to produce a delegate. This code is an order of magnitude easier to write using the expressions API than using custom IL generation (if you don’t believe me, give it a try!).
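Compile does real code-generation work each time it is called, so it pays to compile each shaper once and reuse it. One way to do that, sketched below under the assumption that a static generic class is an acceptable cache, relies on the CLR running a static field initializer once per closed generic type. ShaperCache<T> is a hypothetical name, and the Utility.Field<T> helper is a stand-in; the shaper is built the same way as in CreateDefaultShaper, with the property filter inlined.

```csharp
using System;
using System.Data;
using System.Linq;
using System.Linq.Expressions;
using System.Reflection;

public static class Utility
{
    // Hypothetical Field<T> helper, as used by the shaper.
    public static T Field<T>(this IDataRecord record, string name)
    {
        object value = record[name];
        return value is DBNull ? default(T) : (T)value;
    }
}

// The static initializer runs once per closed type (ShaperCache<Customer>, ShaperCache<Order>, ...),
// so each shaper is built and compiled exactly once, and the CLR makes that initialization thread-safe.
public static class ShaperCache<T> where T : new()
{
    public static readonly Func<IDataRecord, T> Shape = CreateDefaultShaper().Compile();

    private static Expression<Func<IDataRecord, T>> CreateDefaultShaper()
    {
        ParameterExpression r = Expression.Parameter(typeof(IDataRecord), "r");
        MethodInfo field = typeof(Utility).GetMethod("Field");
        return Expression.Lambda<Func<IDataRecord, T>>(
            Expression.MemberInit(
                Expression.New(typeof(T)),
                from property in typeof(T).GetProperties(BindingFlags.Public | BindingFlags.Instance)
                where property.GetSetMethod() != null && property.GetIndexParameters().Length == 0
                select (MemberBinding)Expression.Bind(property,
                    Expression.Call(field.MakeGenericMethod(property.PropertyType), r,
                        Expression.Constant(property.Name)))),
            r);
    }
}

public class Customer
{
    public string CustomerID { get; set; }
    public string City { get; set; }
}
```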

I can’t resist offering another implementation of the CreateDefaultShaper method that uses query expressions. This example nicely closes the loop, using LINQ to create LINQ expressions:

private static Expression<Func<IDataRecord, T>> CreateDefaultShaper<T>()
{
    ParameterExpression r = Expression.Parameter(typeof(IDataRecord), "r");
    return
        Expression.Lambda<Func<IDataRecord, T>>(
            Expression.MemberInit(
                Expression.New(typeof(T)),
                from property in GetWritableProperties<T>()
                let fieldMethod = typeof(Utility).GetMethod("Field")
                    .MakeGenericMethod(property.PropertyType)
                let propertyValue = Expression.Call(
                    fieldMethod, r, Expression.Constant(property.Name))
                select (MemberBinding)Expression.Bind(property, propertyValue)),
            r);
}

Problem:

Before signing off, I’d like to leave you with a simple problem: the CreateDefaultShaper method currently assumes that each record has a column whose name exactly matches every property. Try adding support for column renames (e.g., the “customer_id” column maps to the “CustomerID” property). The easiest approach is probably to put custom attributes on the properties of the materialized type, but it’s better to decouple the declaration of the mapping from its usage. That way the mapping can be retrieved from an arbitrary location (for example, from CLR custom attributes or from a separate configuration file).
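Here is one possible shape for a solution, offered as a sketch rather than the answer. All of the names (ColumnAttribute, IColumnMap, AttributeColumnMap, MappedShapers.CreateShaper) are hypothetical, as is the Utility.Field<T> stand-in. The shaper builder only asks an IColumnMap for each property's column name, so an attribute-based map and, say, a configuration-file-based map are interchangeable.

```csharp
using System;
using System.Data;
using System.Linq;
using System.Linq.Expressions;
using System.Reflection;

// Hypothetical attribute naming the source column for a property.
[AttributeUsage(AttributeTargets.Property)]
public sealed class ColumnAttribute : Attribute
{
    public ColumnAttribute(string name) { Name = name; }
    public string Name { get; private set; }
}

// Hypothetical mapping abstraction: the shaper doesn't care where the answer comes from.
public interface IColumnMap
{
    string GetColumnName(PropertyInfo property);
}

// Reads [Column] when present, otherwise falls back to the property name.
public sealed class AttributeColumnMap : IColumnMap
{
    public string GetColumnName(PropertyInfo property)
    {
        var column = (ColumnAttribute)Attribute.GetCustomAttribute(property, typeof(ColumnAttribute));
        return column != null ? column.Name : property.Name;
    }
}

public static class Utility
{
    public static T Field<T>(this IDataRecord record, string name)
    {
        object value = record[name];
        return value is DBNull ? default(T) : (T)value;
    }
}

public static class MappedShapers
{
    public static Expression<Func<IDataRecord, T>> CreateShaper<T>(IColumnMap map) where T : new()
    {
        ParameterExpression r = Expression.Parameter(typeof(IDataRecord), "r");
        MethodInfo field = typeof(Utility).GetMethod("Field");
        var bindings =
            from property in typeof(T).GetProperties(BindingFlags.Public | BindingFlags.Instance)
            where property.GetSetMethod() != null && property.GetIndexParameters().Length == 0
            // The only change from CreateDefaultShaper: ask the map for the column name.
            let column = map.GetColumnName(property)
            select (MemberBinding)Expression.Bind(property,
                Expression.Call(field.MakeGenericMethod(property.PropertyType), r, Expression.Constant(column)));
        return Expression.Lambda<Func<IDataRecord, T>>(
            Expression.MemberInit(Expression.New(typeof(T)), bindings), r);
    }
}

public class Customer
{
    [Column("customer_id")]
    public string CustomerID { get; set; }
    public string City { get; set; }
}
```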