This week at DevConnections in Orlando, I gave a “deep-dive” talk on LINQ. I wanted to give people a feel for what's possible with the new language features and core APIs in .NET 3.5. I spent most of the talk discussing a single example: take an ADO.NET 2.0 code sample and simplify it. Instead of using an existing library like LINQ to SQL or LINQ to Entities, I built a (limited) LINQ provider from scratch. As boilerplate code is moved into helper methods and a proto-LINQ-to-SQL API evolves, the sample is boiled down to something much more compact.
In this post, I’ll drill down on one component developed in the talk, which was also included in the EFExtensions library. This component takes rows from a data reader and transforms them (or “shapes” them) into typed results. For instance, I may want to transform records into customers. Using a couple of extension methods (Field<T> and Materialize<T>), we can leverage a “shaper” delegate to do just this:
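As a minimal sketch of such a shaper: the Customer type, its ContactName property, and the bodies of Field<T> and Materialize<T> below are illustrative assumptions, not the EFExtensions source.

```csharp
using System;
using System.Collections.Generic;
using System.Data;

// Hypothetical entity used for illustration.
public class Customer
{
    public string CustomerID { get; set; }
    public string ContactName { get; set; }
}

public static class ShaperExtensions
{
    // Assumed helper: reads a column by name and casts it to T,
    // mapping DBNull to default(T).
    public static T Field<T>(this IDataRecord record, string name)
    {
        object value = record[name];
        return (value is DBNull) ? default(T) : (T)value;
    }

    // Assumed helper: passes each row of the reader to the shaper delegate
    // and yields the shaped results lazily.
    public static IEnumerable<T> Materialize<T>(
        this IDataReader reader, Func<IDataRecord, T> shaper)
    {
        while (reader.Read())
        {
            yield return shaper(reader);
        }
    }
}

public static class Demo
{
    public static List<Customer> LoadCustomers(IDataReader reader)
    {
        // The shaper names every column by hand, once per property.
        return new List<Customer>(reader.Materialize(r => new Customer
        {
            CustomerID = r.Field<string>("CustomerID"),
            ContactName = r.Field<string>("ContactName"),
        }));
    }
}
```

Any IDataReader will do as input; for experimentation, DataTable.CreateDataReader() is a convenient source that needs no database.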
The shaper delegate code is still annoying though: for every property of the customer, I’m retrieving a column of the same name. The code is mechanical and repetitive, the sort of thing you want a machine to do rather than a programmer.
Here are three strategies you can use to automate this pattern in .NET 2.0:
1. Reflection: Write a general purpose delegate that uses reflection to construct an instance of T and then dynamically invokes property setters. While this code is relatively easy to write, the performance is sub-optimal. For information on the performance of various method dispatch patterns, take a look at this great talk by Joel Pobar and Joe Duffy.
2. Automatically generate the code: You can automatically generate wrapper classes encapsulating shaping logic. Code generation has its challenges however, in particular integration with build systems and the development environment. It’s also a lot of code to maintain.
3. Create a DynamicMethod implementing the pattern: .NET allows you to compile a delegate at runtime using dynamic methods. This resolves the performance and maintenance problems of solutions 1 and 2. Unfortunately, MSIL generation is hard to get right, and the code that emits the IL does not reflect the intent of the generated delegate.
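To make the first strategy concrete, here is a rough sketch of a reflection-based shaper. The ReflectionShaper and Customer names are made up for illustration, and the code uses Func<> and a lambda for brevity even though those arrived after .NET 2.0. It assumes every writable public property has a column of the same name with a compatible type.

```csharp
using System;
using System.Data;
using System.Reflection;

// Hypothetical entity used for illustration.
public class Customer
{
    public string CustomerID { get; set; }
    public string ContactName { get; set; }
}

public static class ReflectionShaper
{
    public static Func<IDataRecord, T> Create<T>() where T : class, new()
    {
        // Look the properties up once; the per-row cost is the dynamic
        // invocation of each property setter.
        PropertyInfo[] properties =
            typeof(T).GetProperties(BindingFlags.Public | BindingFlags.Instance);

        return record =>
        {
            T result = new T();
            foreach (PropertyInfo property in properties)
            {
                if (!property.CanWrite) continue;

                object value = record[property.Name];
                if (value is DBNull) continue; // leave the property at its default

                property.SetValue(result, value, null);
            }
            return result;
        };
    }
}
```

The delegate is easy to write and read, but every row pays for a reflective call per property, which is the performance concern raised in the first item.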
Digression: random thoughts on APIs
I recently heard the expression “Swiss Army Knife API”. These are interfaces that handle all kinds of little (possibly unrelated) problems. They can be useful but they are also hard to package and discover. The Zip method in EFExtensions illustrates these problems. It’s a handy method, but what is it doing in an EF library? It has nothing to do with the EF or with the scenarios addressed by the library (it’s poorly packaged), and no one trying to pair the elements of two iterators would think to look in that particular library (it’s not discoverable). If you need to install a PCI card, fillet a fish and hand-stitch a saddle, you might need a Swiss Army API.
At the other extreme, there are narrowly targeted APIs that can solve complex problems but most often require specialized knowledge or training. To make matters worse, once you’ve mastered them, you can rarely apply your knowledge to different domains. You can probably think of a few examples of this pattern.
LINQ achieves a useful balance. While the LINQ project was motivated by a specific requirement – seamless support for non-object data within .NET applications – all components of the solution are generically useful. Consider…
The System.Linq.Expressions API serves a specific need for integrated queries: it allows the compiler to describe the user’s code as a data structure that can then be translated to targets other than MSIL at runtime, like SQL, Web Services, etc. Expressions can also be compiled into delegates at runtime, which brings me to a .NET 3.5 solution to the default shaper problem… If the compiler can use expressions to describe code, so can we!
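The code pattern we want to generate is the shaper itself: a lambda that takes a record and returns a new instance of T, initializing each property from the column of the same name.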
Shortcut: learning how to build an expression programmatically
If you want to figure out how to build expressions programmatically, a simple trick will probably save you some time. Just follow the compiler’s lead. First, write an example of the pattern, e.g.:
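One way to write such an example, sketched under assumptions: Customer, ContactName and the Field<T> stand-in are hypothetical. The important detail is the declared type. Assigning the lambda to Expression<Func<...>> instead of Func<...> makes the compiler emit code that builds an expression tree.

```csharp
using System;
using System.Data;
using System.Linq.Expressions;

// Hypothetical entity used for illustration.
public class Customer
{
    public string CustomerID { get; set; }
    public string ContactName { get; set; }
}

public static class RecordExtensions
{
    // Assumed stand-in for the Field<T> helper.
    public static T Field<T>(this IDataRecord record, string name)
    {
        object value = record[name];
        return (value is DBNull) ? default(T) : (T)value;
    }
}

public static class PatternExample
{
    // Because the target type is Expression<Func<...>>, the compiler
    // describes this lambda as data rather than compiling it to IL.
    public static readonly Expression<Func<IDataRecord, Customer>> Shaper =
        r => new Customer
        {
            CustomerID = r.Field<string>("CustomerID"),
            ContactName = r.Field<string>("ContactName"),
        };
}
```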
After compiling your program, use a tool like .NET Reflector to figure out how the compiler builds the expression tree. In its default mode, this tool is actually a little bit too smart, disassembling the code into precisely what we wrote to begin with. If you change the disassembler settings (View → Options → Disassembler) to use “.NET 2.0” optimizations, it will give more useful output, e.g.:
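As a hand-written approximation of what that disassembly boils down to (this is not actual Reflector output), the factory calls look roughly like the sketch below. Customer, ContactName and the Field<T> stand-in are assumptions. The compiler resolves members through metadata tokens, whereas this sketch uses by-name reflection lookups.

```csharp
using System;
using System.Data;
using System.Linq.Expressions;
using System.Reflection;

// Hypothetical entity used for illustration.
public class Customer
{
    public string CustomerID { get; set; }
    public string ContactName { get; set; }
}

public static class RecordExtensions
{
    // Assumed stand-in for the Field<T> helper.
    public static T Field<T>(this IDataRecord record, string name)
    {
        object value = record[name];
        return (value is DBNull) ? default(T) : (T)value;
    }
}

public static class HandBuiltShaper
{
    public static Expression<Func<IDataRecord, Customer>> Build()
    {
        // The lambda parameter "r".
        ParameterExpression r = Expression.Parameter(typeof(IDataRecord), "r");

        // Field<string>, called as a static method with r as its first argument.
        MethodInfo fieldOfString = typeof(RecordExtensions)
            .GetMethod("Field")
            .MakeGenericMethod(typeof(string));

        // r => new Customer { CustomerID = ..., ContactName = ... }
        return Expression.Lambda<Func<IDataRecord, Customer>>(
            Expression.MemberInit(
                Expression.New(typeof(Customer)),
                Expression.Bind(
                    typeof(Customer).GetProperty("CustomerID"),
                    Expression.Call(fieldOfString, r, Expression.Constant("CustomerID"))),
                Expression.Bind(
                    typeof(Customer).GetProperty("ContactName"),
                    Expression.Call(fieldOfString, r, Expression.Constant("ContactName")))),
            r);
    }
}
```

The shape to notice is Lambda around MemberInit around New plus one Bind per property, each Bind wrapping a Call to Field<T>.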
Once you’ve figured out the expression pattern, you can implement a general solution, e.g.:
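A sketch of what a general CreateDefaultShaper might look like follows. It is an approximation rather than the EFExtensions implementation: the DefaultShaper class name, the Customer type and the Field<T> stand-in are assumptions, and it expects every public settable property to have a column of the same name.

```csharp
using System;
using System.Collections.Generic;
using System.Data;
using System.Linq.Expressions;
using System.Reflection;

// Hypothetical entity used for illustration.
public class Customer
{
    public string CustomerID { get; set; }
    public string ContactName { get; set; }
}

public static class RecordExtensions
{
    // Assumed stand-in for the Field<T> helper.
    public static T Field<T>(this IDataRecord record, string name)
    {
        object value = record[name];
        return (value is DBNull) ? default(T) : (T)value;
    }
}

public static class DefaultShaper
{
    public static Func<IDataRecord, T> CreateDefaultShaper<T>() where T : new()
    {
        ParameterExpression r = Expression.Parameter(typeof(IDataRecord), "r");
        MethodInfo field = typeof(RecordExtensions).GetMethod("Field");

        // One binding per settable property: Property = r.Field<PropertyType>("Property")
        List<MemberBinding> bindings = new List<MemberBinding>();
        foreach (PropertyInfo property in
                 typeof(T).GetProperties(BindingFlags.Public | BindingFlags.Instance))
        {
            if (property.GetSetMethod() == null) continue;

            MethodCallExpression read = Expression.Call(
                field.MakeGenericMethod(property.PropertyType),
                r,
                Expression.Constant(property.Name));
            bindings.Add(Expression.Bind(property, read));
        }

        // r => new T { ...bindings... }
        Expression<Func<IDataRecord, T>> lambda =
            Expression.Lambda<Func<IDataRecord, T>>(
                Expression.MemberInit(Expression.New(typeof(T)), bindings),
                r);

        return lambda.Compile();
    }
}
```

Compiling has a one-time cost, so in practice you would probably build the delegate once per type and reuse it for every row.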
Expression<T> has a Compile method which produces an instance of T. In this case, T is our delegate type, Func<IDataRecord, T>, so we’re done! We build up a description of the logic using LINQ expressions, and then compile the logic to produce a delegate. This code is an order of magnitude easier to write using the expressions API than using custom IL generation (if you don’t believe me, give it a try!)
I can’t resist offering another implementation of the CreateDefaultShaper method that uses query expressions. This example nicely closes the loop, using LINQ to create LINQ expressions:
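A possible query-expression version is sketched below, under the same assumptions as before: QueryShaper, Customer and the Field<T> stand-in are hypothetical names, and each settable property is expected to match a column of the same name.

```csharp
using System;
using System.Collections.Generic;
using System.Data;
using System.Linq;
using System.Linq.Expressions;
using System.Reflection;

// Hypothetical entity used for illustration.
public class Customer
{
    public string CustomerID { get; set; }
    public string ContactName { get; set; }
}

public static class RecordExtensions
{
    // Assumed stand-in for the Field<T> helper.
    public static T Field<T>(this IDataRecord record, string name)
    {
        object value = record[name];
        return (value is DBNull) ? default(T) : (T)value;
    }
}

public static class QueryShaper
{
    public static Func<IDataRecord, T> CreateDefaultShaper<T>() where T : new()
    {
        ParameterExpression r = Expression.Parameter(typeof(IDataRecord), "r");
        MethodInfo field = typeof(RecordExtensions).GetMethod("Field");

        // A LINQ query over the properties of T that produces the bindings.
        IEnumerable<MemberBinding> bindings =
            from property in typeof(T).GetProperties(
                BindingFlags.Public | BindingFlags.Instance)
            where property.GetSetMethod() != null
            let read = Expression.Call(
                field.MakeGenericMethod(property.PropertyType),
                r,
                Expression.Constant(property.Name))
            select (MemberBinding)Expression.Bind(property, read);

        return Expression.Lambda<Func<IDataRecord, T>>(
            Expression.MemberInit(Expression.New(typeof(T)), bindings),
            r).Compile();
    }
}
```

The explicit cast to MemberBinding keeps the query's element type aligned with what Expression.MemberInit expects, which matters on compilers without generic covariance.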
Before signing off, I’d like to leave you with a simple problem: the CreateDefaultShaper method currently assumes that record instances have columns exactly corresponding to every property. Try adding support for column renames (e.g. the “customer_id” column maps to the “CustomerID” property). It’s probably easiest to implement this pattern with custom attributes on the materialized type properties, but it’s probably best to decouple the declaration of the mapping from its usage. This allows the mapping to be retrieved from an arbitrary location (e.g. either from CLR custom type attributes or from a separate configuration file).