Potential Performance Issues with Compiled LINQ Query Re-Compiles

When calling a query multiple times with Entity Framework the recommended approach is to use compiled LINQ queries. Compiling a query results in a performance hit the first time you use the query but subsequent calls execute much faster. The CompiledQuery class has functionality to compile and assist in caching the resulting database command from a query.

An example of a compiled query is show below. The query returns a list of Customers from the Northwinds sample database whose CustomerID starts with a specific string that is passed in as a parameter:

 static Func<NorthwindEntities, string, IQueryable<Customer>> compiledCustQuery =
   CompiledQuery.Compile((NorthwindEntities ctx, string start) =>
   (from c in ctx.Customers
    where c.CustomerID.StartsWith(start)
    select c));

A few weeks ago while working with a partner on EF performance optimization, we ran into an interesting case where the EF code segment was executing much slower than expected. The Visual Studio Profiler showed that the code was spending an inordinate amount of time when executing a compiled query. Digging into the code showed that one particular operation on the compiled query was causing it to recompile every time it was executed. The code was checking to see if the result set returned any records. There are multiple ways to accomplish this. A common way is to use .Count()!=0 or Any() . In this case the compiled query returned an IQueryable. Calling Count() or Any() was causing the query to re-compile.

I wrote a small application to duplicate this issue using the Northwinds database. The code executes the compiled query to see if any records are returned. If there are records in the result set, it executes the query again and returns all the rows (the ToList().Count() makes sure all results from the second query are materialized in memory). Below is a snippet of code showing the two queries:

 var qryAnyCust = compiledCustQuery(ctx, "C");

if (qryAnyCust.Any())
{
    var qryCust = compiledCustQuery(ctx, "C");
    qryCust.ToList().Count();
}

Running the Visual Studio profiler shows the following Hot Path:

clip_image001

Over 83.33% of the time is spent executing the System.Linq.Queryable.Any() class and only 16.67% of the time was spent actually executing the query. Executing the sample application yielded an execution time of approximately 9ms.

Next the code was modified to use a second version of the compiled query that contained the .Any() method in the query and returned a Boolean instead of an IQueryable. Below is the second query:

 static Func<NorthwindEntities, string, bool> compiledAnyCustQuery =
   CompiledQuery.Compile((NorthwindEntities ctx, string start) =>
   (from c in ctx.Customers
    where c.CustomerID.StartsWith(start)
    select c).Any());

The query execution code was changed to call the above query to check if any records would be returned and then the query from the previous example to retrieve the actual records. This code is shown below:

 if (compiledAnyCustQuery(ctx, "C"))
{
    var qryCust = compiledCustQuery(ctx, "C");
    qryCust.ToList().Count();
}

Looking at the profiler output of the above code shows that the call to Any() is no longer the bottleneck and that actually invoking the query now takes the majority of the time as expected:

clip_image003

Executing this code displayed much better performance. The test executed in only 6ms almost a 33% performance increase from a small change to a single query.

An alternative way to prevent re-compilation of the compiled query is to return an IEnumerable instead of IQueryable for the cached query. Using IEnumerable causes the query data to be returned from the server and then allows for evaluation and further refinement of the results set locally. For instance, executing a Count() on an IEnumerable from a query returns the number or records contained in the in-memory result set.

Note: Using IEnumerable will cause all of the query results data to be transferred from the server to the client. This can be a drain on memory and performance on the client if the query returns a large number of records. This might not be the best approach if you have large result sets and simply need to get record counts.

Below is the code from the sample application which returns the compiled query as an IEnumerable instead of IQueryable:

 static Func<NorthwindEntities, string, IEnumerable<Customer>> compiledEnumCustQuery =
   CompiledQuery.Compile((NorthwindEntities ctx, string start) =>
   (from c in ctx.Customers
    where c.CustomerID.StartsWith(start)
    select c));

A third version of the query execution, is show below, that checks to see if any records would be returned and then the query from the previous example to retrieve the actual records:

 if (qryEnumAnyCust.Any())
{
    var qryCust = compiledCustQuery(ctx, "C");
    qryCust.ToList().Count();
}

Executing the above code displayed similar performance to the previous test that used the compiled queries containing the Any() . The test executed in only 6.7ms which was much better than the original code. Profiling the IEnumerable test shows that now most of the execution time is spent in the Any() method on the IEnumerable which should execute quickly because the values are in memory:

clip_image004

In the case of our partner, changing their code to use compiled queries to contain the Any() and return a Boolean boosted their performance 450% because their code checked if queries returned results in multiple places.

Query recompile is not limited to Any() or Count() operations. It can happen with any LINQ operator that is applied on top of a compiled query. So be on the lookout for query re-compilation in your code. It can unknowingly impact performance. In a future post I will cover in more detail the use the Visual Studio Profiler to uncover bottlenecks in the Entity Framework.

Author: Dana Kaufman
Reviewers: Daniel Simmons, Diego Vega, Jaime Alva Bravo,  Peter Scharlock