Performance Quiz #13 — Linq to SQL compiled query cost — solution


Well, is there really a “solution” at all in general?  I think I constrained this particular case enough that you can claim an answer, but does it generalize?  Let’s look at what I got first; the raw results are pretty easy to understand.


The experiment I conducted was to run a fixed number of queries (5000 in this case) but to break them up so that each compiled query was reused a decreasing number of times.  The first run is the “best” case: 1 batch of 5000 selects, all using the same compiled query.  Then 2 batches of 2500, and so on down to 5000 batches of 1.  As a control I also ran the uncompiled case at each step, expecting of course that the batching makes no difference there.  Note the output indicates we selected a total of 25000 rows of data, that is 5 per select, as expected.  Here are the raw results:



Testing 1 batches of 5000 selects
5000 selects uncompiled 9200.0ms 25000 records total 543.48 selects/sec
5000 selects compiled 5401.0ms 25000 records total 925.75 selects/sec


Testing 2 batches of 2500 selects
5000 selects uncompiled 9181.0ms 25000 records total 544.60 selects/sec
5000 selects compiled 5402.0ms 25000 records total 925.58 selects/sec


Testing 5 batches of 1000 selects
5000 selects uncompiled 9169.0ms 25000 records total 545.32 selects/sec
5000 selects compiled 5432.0ms 25000 records total 920.47 selects/sec


Testing 100 batches of 50 selects
5000 selects uncompiled 9184.0ms 25000 records total 544.43 selects/sec
5000 selects compiled 5511.0ms 25000 records total 907.28 selects/sec


Testing 1000 batches of 5 selects
5000 selects uncompiled 9166.0ms 25000 records total 545.49 selects/sec
5000 selects compiled 6526.0ms 25000 records total 766.17 selects/sec


Testing 2500 batches of 2 selects
5000 selects uncompiled 9165.0ms 25000 records total 545.55 selects/sec
5000 selects compiled 7892.0ms 25000 records total 633.55 selects/sec


Testing 5000 batches of 1 selects
5000 selects uncompiled 9157.0ms 25000 records total 546.03 selects/sec
5000 selects compiled 10825.0ms 25000 records total 461.89 selects/sec


And there you have it.  Even at 2 uses the compiled query still wins, but at 1 use it loses.  In fact, the magic number for this particular query is about 1.5 average uses to break even.  But why?  And how might it change?
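That break-even figure can be sanity-checked from the raw numbers above.  A back-of-envelope calculation using the 1×5000 run (pure reuse) and the 5000×1 run (compile every time):

```
uncompiled cost per select          ≈ 9165ms  / 5000 ≈ 1.83ms
compiled cost per select (reused)   ≈ 5401ms  / 5000 ≈ 1.08ms
compile + single use (5000×1 run)   ≈ 10825ms / 5000 ≈ 2.17ms
one-time compilation cost           ≈ 2.17ms − 1.08ms ≈ 1.09ms
break-even uses                     ≈ 1.09 / (1.83 − 1.08) ≈ 1.45
```

Compiling pays for itself once the per-use savings cover that one-time cost, hence roughly 1.5 average uses.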


Well, as has been observed in the comments, Linq query compilation isn’t like regular expression compilation.  Compiling the query doesn’t do anything that isn’t going to have to happen anyway.  In fact, actually creating the compiled query with CompiledQuery.Compile hardly does anything at all; it’s all deferred until the query is run, just as it would have been had the query not been compiled.  So what is the overhead?  Why is it slower at all?  And what’s the point of it?


Well, the main purpose of that compiled query object is to have an object, of the correct type, that also has the correct lifetime.  The compiled query can live across DataContexts; in fact it could potentially live for the entire life of your program.  And since it has no mutable shared state, it’s thread-safe and so forth.  It exists to:


1) Give the Linq to SQL system a place to store the results of analyzing the query (i.e. the actual SQL plus the delegate that will be used to extract data from the result set)


2) Allow the user to specify the “variable parts” of the query.  The most common case isn’t that the query is exactly the same from run to run; usually it’s “nearly” the same.  That is, it’s the same except that perhaps the search string is different in the where clause, or the ID being fetched is different.  The shape is the same.  Creating a delegate with parameters allows you to specify which things are fixed and which things are variable.
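As a concrete illustration of point 2, here is a minimal sketch of a parameterized compiled query.  Northwind, Orders, and CustomerID are assumed to be the usual LINQ to SQL mapped types from the sample database, and the customer IDs are just illustrative; only the customer ID varies from run to run.

```csharp
using System;
using System.Data.Linq;
using System.Linq;

class Example
{
    static void Demo(string connectionString)
    {
        // The shape of the query is fixed; the customer ID is the variable part.
        var ordersFor = CompiledQuery.Compile(
            (Northwind db, string customerId) =>
                from o in db.Orders
                where o.CustomerID == customerId
                select o);

        // The same compiled delegate can be invoked with different arguments
        // (and, as discussed below, across different DataContexts).
        using (var db = new Northwind(connectionString))
        {
            var first = ordersFor(db, "ALFKI");   // illustrative IDs
            var second = ordersFor(db, "ANATR");
        }
    }
}
```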


Now there was some debate about how to make compiled queries durable; automatically caching them was considered, but this was something I was strongly against, largely because of the object lifetime issues it would cause.  First, you would have to do complicated matching of a newly created query against something that was already in the cache, something I’d like to avoid.  Second, you have to decide where to store the cache.  If you associate it with the DataContext then you get much less query re-use, because you only get a benefit if you run the same query twice in the same data context.  To get the most benefit you want to be able to re-use the query across DataContexts.  But then, do you make the cache global?  If you do, you have threading issues accessing it, and you have the terrible problem that you don’t know when is a good time to discard items from the cache.

Ultimately this was my strongest point: at the Linq data level we do not know enough about the query patterns to choose a good caching policy, and, as I’ve written many times before, when it comes to caching, good policy is crucial.  In fact, analogously, we had to make changes in the regular expression caching system back in Whidbey precisely because we were seeing cases where our caching assumptions were resulting in catastrophically bad performance (Mid-Life Crisis due to retained compiled regular expressions in our cache).  I didn’t want to make that mistake again.


So that’s roughly how we end up at our final design.  Any Linq to SQL user can choose how much or how little caching is done.  They control the lifetime, they can choose an easy mechanism (e.g. stuff it in a static variable forever) or a complicated recycling method depending on their needs.  Usually the simple choice is adequate.  And they can easily choose which queries to compile and which to just run in the usual manner.


Let’s get back to the overhead of compiled queries.  Besides the one-time cost of creating the delegate, there is a little extra delegate indirection on each run of the query, plus the more complicated thing we have to do: since the compiled query can span DataContexts, we have to make sure that the DataContext we are given in any particular execution of a compiled query is compatible with the DataContext that was provided when the query was first compiled.


Other than that the code path is basically the same, which means you come out ahead pretty quickly.  This test case was, as usual, designed to magnify the typical overheads so we can observe them.  The result set is a small number of rows, it is always the same rows, the database is local, and the query itself is a simple one.  All the usual costs of doing a query have been minimized.  In the wild you would expect the query to be more complicated, the database to be remote, and the actual data returned to be larger and not always the same.  That of course reduces the benefit of compilation in the first place, but also, as a consolation prize, reduces the marginal overhead.


In short, if you expect to reuse the query at all, there is no performance-related reason not to compile it.

Comments (18)

  1. Peter Ritchie says:

    I, for one, would be interested in seeing the code associated with this. — Thanks

  2. Peter Ritchie says:

    I think the results that others have posted in the previous post and their differences shows that it depends on the query, the database and how you implement it.  Whether or not to use (or not use) a compiled query should be based on observed metrics.

  3. ricom says:

    It’s just two nested loops around the queries I gave in the problem statement.  The timer goes around the whole thing.  Couldn’t be any dumber.

    start timer

    for (j=0;j<batches;j++)

      compile query

      for (i=0;i<runs;i++)

        create new data context

        run query in new context

    stop timer

    print number of selects and time etc.

    batches*runs always = 5000

    In the uncompiled case it really doesn’t matter how many in each batch because the same code runs the same number of times plus or minus some loop overhead.

    start timer

    for (j=0;j<batches;j++)

      for (i=0;i<runs;i++)

        create new data context

        run query in new context

    stop timer

    print number of selects and time etc.
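In C# terms (a sketch, not the actual harness; Northwind, the connection string, and the 5-row query body are all assumptions), the compiled loop above looks roughly like:

```csharp
using System;
using System.Data.Linq;
using System.Diagnostics;
using System.Linq;

class Harness
{
    static void RunCompiled(string connectionString, int batches, int runs)
    {
        int rows = 0;
        var timer = Stopwatch.StartNew();
        for (int j = 0; j < batches; j++)
        {
            // compile query (once per batch)
            var query = CompiledQuery.Compile(
                (Northwind db) => db.Orders.Take(5));   // assumed 5-row query
            for (int i = 0; i < runs; i++)
            {
                // create new data context, run query in new context
                using (var db = new Northwind(connectionString))
                {
                    foreach (var order in query(db))
                        rows++;
                }
            }
        }
        timer.Stop();
        Console.WriteLine("{0} selects compiled {1:F1}ms {2} records total",
            batches * runs, timer.Elapsed.TotalMilliseconds, rows);
    }
}
```

The uncompiled variant is the same minus the CompiledQuery.Compile call, with the query run directly against each new context.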

  4. ricom says:

    Peter writes: “I think the results that others have posted in the previous post and their differences shows that it depends on the query, the database and how you implement it.  Whether or not to use (or not use) a compiled query should be based on observed metrics.”

    Well you’ll never get me to disagree with “you should make your choice based on observed metrics.”  

    I think what you’ll find is that you break even very easily so you’ll end up making your choice on the basis of:  

    1) is there any re-use at all

    2) how much uglier is the compiled code pattern and is the benefit worth the hassle

    The more complicated the query, the greater the benefits of not having to re-analyze it.  

    On the other hand, slower connections and/or more data processing will tend to make all the Linq overheads smaller by proportion, to the point where it could all be moot.

But the point here is to illustrate that, unlike other compilation methods, this one has very modest overhead because the compiled path is nearly the same as the uncompiled path.  There isn’t a separate “compilation” vs. “interpretation” process like in, say, regular expressions.

    Or, said yet another way, you actually always compile the query.  The only question is, do you save the results of compilation or not?

  5. Brian says:

    Umm, what does the "In fact, the magic number for this particular query is about 1.5 average uses to break even." tell me? Isn’t that discrete, you can either run it once or twice. How do I run this query 1.5 times? The 1.5 average is pretty useless in this case.

  6. Paul says:

    It tells you that when your code is executed, if the average number of times a particular query is run is >1.5, then you should consider using compiled queries.

  7. Peter Ritchie says:

    RE: "…break even very easily…"

    This has to do with (I’m assuming) having to change the code from being used uncompiled to compiled (and/or vice versa)…

Unfortunately the complexity of anonymous types and the way LINQ is compiled into IL makes it really hard to separate the query into its own concern.  I.e. it’s hard, and not as efficient as compiling directly, to get a LINQ query into a delegate that you could then use either to compile/execute or just execute.  So it’s hard to push whether a LINQ statement is compiled or not into a policy that is chosen at runtime.

    Ideally you would do something like this:

    MyClass myClass = new MyClass(new CompileLinqStrategy());

    //…

    // somewhere in MyClass:

    Expression<Func<DataContext, SomeType>> expression = (NorthwindDataContext nw) => (from …);

    var results = linqStrategy.Invoke(expression);

    foreach(var result in results)

    {

    // do something with result

    }

    CompiledLinqStrategy and UncompiledLinqStrategy would simply execute the expression (CompiledLinqStrategy would compile it first, of course).

Whether or not the expression is compiled has been completely separated from the code that uses the LINQ statement, and therefore the LINQ statement need not change to switch from uncompiled to compiled; the problem of the code breaking due to this change is eliminated.

    Unfortunately the C# compiler compiles a LINQ statement into much more efficient IL code than you could write manually to execute the expression, so my guess is you’d end up eating most of the performance gains.

    …we need a memberof (and infoof) in C#…

  8. Chabster says:

    > the main purpose of that compiled query object is to have an object, of the correct type, that also has the correct lifetime

    If there was some mechanism to reference anonymous types like this:

    Expression<Func<?>> expr = from o in nw.Orders select new { OrderID = o.OrderID, CustomerID = o.CustomerID, EmployeeID = o.EmployeeID, ShippedDate = o.ShippedDate };

    there would be no need for compilation you think?

    And what is the difference between CompiledQuery.Compile and the following:

    public static Func<TArg0, TResult> MyCompile<TArg0, TResult>(Expression<Func<TArg0, TResult>> query) where TArg0 : DataContext {

    return(query.Compile());

    }

    var myQuery = MyCompile((Northwind nwDb) => (from o in nwDb.Orders select new { OrderID_C = o.OrderID, CustomerID = o.CustomerID, EmployeeID = o.EmployeeID, ShippedDate = o.ShippedDate }));

    I suppose where TArg0 : DataContext restriction might be very important! It allows to introspect TArg0 and emit direct code to manipulate it!

  9. Blog says:

    LINQ query versus compiled LINQ query

  10. LINQ query versus compiled LINQ query

  11. Paul says:

    >Any Linq to SQL user can choose how much or how little caching is done.  They control the lifetime, they can choose an easy mechanism (e.g. stuff it in a static variable forever) or a complicated recycling method depending on their needs.

    Rico, can you give us an example of how to control the lifetime of this particular compiled query? The resulting compiled query is an anonymous type. This generally limits its usage to the scope of the method that created it. It seems that the caveat is that there’s a lot less flexibility in controlling the lifetime of a compiled query based on anonymous types.

  12. ricom says:

    You have to use the more formal syntax and store the thing in a static or something like that.  Or store it as a member of some other helper type with suitable lifetime.

    In this case I was able to use the simpler ‘var’ syntax because I had a synthetic lifetime. Normally you don’t get such a luxury.

    I think you could stuff it in a local, using var, then use intellisense to see the exact type, and make something of that type for instance.
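Concretely, that might look like this (a sketch; Northwind and Order are assumed mapped types).  Because the result here is a named entity type rather than an anonymous type, the delegate can be given an explicit type and stored in a static for program-lifetime reuse:

```csharp
using System;
using System.Data.Linq;
using System.Linq;

static class Queries
{
    // Explicitly typed, so it can live outside the method that created it.
    // With an anonymous result type you would be forced to use 'var' and
    // therefore local scope; projecting to a named type avoids that.
    public static readonly Func<Northwind, int, IQueryable<Order>> OrdersById =
        CompiledQuery.Compile(
            (Northwind db, int id) =>
                from o in db.Orders
                where o.OrderID == id
                select o);
}
```

Any code with a Northwind context can then call Queries.OrdersById(db, someId); the compiled query lives for the life of the program, which is the simple lifetime choice described in the post.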

  13. michael.plavnik says:

Rico, I was interested in the performance of compiler-generated code against IQueryable<>.  So I wrote a simplistic provider that creates a new queryable every time.  The query buildup (no query translation) below takes about 70 microseconds on my Intel Core2 6600@2.4GHz machine.  It is roughly equivalent to the time it takes to query the first 10 orders from the Northwind database on the same machine with SqlDataReader.

    var q = from a in source

group a by a[0] into g  where g.Key == 'a'

       from x in g  select x;

10,000 queries tops per processor would seem like a very small number for high-end data solutions, wouldn’t it?!  There is also the issue of holding a transaction open for longer than necessary, which then needs to be carefully avoided in code.

    Better support for compiled queries looks like a must have.

  14. 10 Tips to Improve your LINQ to SQL Application Performance

  15. If you are using Linq to SQL, instead of writing regular Linq Queries, you should be using Compiled Queries

  16. If you are using Linq to SQL, instead of writing regular Linq Queries, you should be using Compiled Queries

  17. 吴明浩 says:

Hey there, back again. In my first post about LINQ I tried to provide a brief (okay, bit detailed) in…
