Improvements to the Generated SQL in .NET 4.0 Beta1

Article
08/06/2009

The information in this post is out of date.

Visit msdn.com/data/ef for the latest information on current and past releases of EF.

One of the things that we are continuously working on improving is the quality and the readability of the SQL generated when querying using the Entity Framework. We have already made some improvements in .NET 4.0 Beta 1, and we are working on more for .NET 4.0 post Beta 1.

Most of the improvements we have made are in the Entity Framework query pipeline as opposed to specific SqlClient changes. Because the changes were in the query pipeline, this results in simpler output command trees and thus would affect the SQL generated by any backend, not only by our SQL Server provider.

Below we’ve highlighted the biggest improvements available in Beta 1 and then the ones that will be in the next public release of .NET Framework 4.0. Unless otherwise noted, the examples are based on the NorthwindEF Entity Data Model provided with the ADO.NET Entity Framework Query Samples (https://code.msdn.microsoft.com/EFQuerySamples). Also, in all the examples below, the parts of the SQL query highlighted in red have been removed and those in yellow have been added.

1. Avoid projecting all columns in nested queries

In some nested queries in the generated SQL, we used to explicitly project all columns that are brought in scope by the corresponding FROM clause, like the fragments marked red in the sample generated SQL below. Now, we instead only project the columns that are later referenced. This improvement is specific to our T-Sql Generation in SqlClient.

Example:

var

  query = (

from

  p in context.Products

orderby

  p.ProductName

select

  p.ProductName)
                 .Skip(2).Take(2);

2. Avoiding Unnecessary IS NULL Checks in LINQ to Entities Queries

When translating joins in LINQ to Entities, we check whether the columns on which the join is specified are equal or both are null. In the cases when one (or both) of the columns is non-nullable, the “IS NULL” check is redundant and has been removed. This improvement is in the Entity Framework query pipeline.

Example:

var

  query = context.Products.Join(
context.Categories, 
p => p.ProductID,
c => c.CategoryID,
(p, c) =>

new

 { p.ProductID, c.CategoryID, p.ProductName, c.CategoryName });

3. Additional join elimination: Eliminate parent table in “Child Left Outer Join Parent”

In scenarios where a given table is explicitly or implicitly (e.g. via a navigation property) joined to a parent table (i.e. table to which it has a foreign key constraint) we were not eliminating the parent table even if the only columns referenced from that table were the columns comprising the primary key and could have been reused from the child table. (Note: The parent-child terminology assumes that there is foreign key relationship between these tables specified in the SSDL). This improvement is in the Entity Framework query pipeline.

Example:

Note: For this example the highlighted following fragments need to be added to the SSDL in the ADO.NET Entity Framework Query Samples noted above:

 string entitySQL = "SELECT p.Category.CategoryId FROM Products as p";

For LINQ to Entities in .NET Framework 3.5 SP1 the equivalent query resulted in the same generated query for Entity SQL shown in the After sample.

4. Using IN instead of nested OR’s

Comparing a column to multiple values, either as a result of the explicit use of Entity SQL’s IN expression or as result of internally generated checks over a type discriminator value used to result in nested OR expressions in the generated SQL. Now, we are instead producing an IN expression. This improvement is specific to our T-Sql Generation in SqlClient.

For example:

 string entitySQL = "SELECT p FROM Products as p where p.ProductId in {1, 2, 3, 4, 5}";

Starting with.NET 4.0 Beta 1 Entity Framework, a LINQ version of this query is also supported:

var

  query = context.Products.Where(p =>

new int

 [] { 1, 2, 3, 4, 5 }.Contains(p.ProductID));

The resulting generated TSQL also takes advantage of this improvement:

5. Translating more of LINQ’s GroupBy operator into GROUP BY

LINQ’s GroupBy operator is richer then SQL’s group by clause. For example, in addition to aggregates over a group it can return the entire group. When translating LINQ’s GroupBy, we try to recognize if it can be expressed by SQL’s group by (i.e. DbGroupByExpression without a group partition), otherwise we translate it into a more complex expression involving a join to the input table.

In Entity Framework 4.0 Beta1 we have expanded the cases that we are able to translate into SQL’s GroupBy. This improvement is in the Entity Framework query pipeline.

Example:

var

  query = context.Products.Where(p => p.ProductID < 4)
            .GroupBy(p => p.ProductID, p => p.ProductID, (key, group) => 
            new { Key = key, Max = group.Max() });

6. Avoiding “Cast (1 as bit)”, using “1” instead

In cases when we need an internally generated constant, like when translating EntitySQL Exists expression or IEnumerable.Count in LINQ to Entities, we previously used “true”, which translates to “cast(1 as bit)” on SQL Server. We now instead use the integer constant 1, which translates to “1”. This improvement is in the Entity Framework query pipeline.

var

  query = context.Categories.Select(c => new { c.CategoryID, count = c.Products.Count() });

7. Simplifying some queries that go against a subtype in a type hierarchy

In some cases when a query is only interested in entities of a particular subtype (specified via OfType or IsOf for example) we used to query using a view over the base type and thus possibly generate some unnecessary case statements to also check for the types that are not of the desired subtype. This improvement allows us to use the simplified query view that is only over the desired subtype and thereby generate a simpler provider query.

The following example that illustrates that is over a schema with a TPH mapping with the inheritance hierarchy as shown in the figure. This improvement is in the Entity Framework query pipeline.

var

  query = this.tph.Entities.OfType<Tph.

Derived1

 >().GroupBy(d2 => d2).Select(p => p.Max(g => g.Id));

Improvements in .NET 4.0 post Beta1

Here is a quick overview of the improvements that we are working on for this release but are not included in Beta1:

Elimination of null sentinels – In the output query there is often the constant 1 projected both in nested and in top-most queries. By avoiding it and reusing other columns for the same purpose in many cases we are able to avoid levels of nesting.
Additional Join Elimination
Use of Inner Joins instead of Left Outer Joins when possible
Provide mechanism for efficient queries on non-Unicode columns – We now generate non-unicode constants and parameters when these are used in LINQ to Entities queries in comparisons with non-unicode columns. This allows indexes to be used by SQL Server.
Improve the translation of the functions String.StartsWith, String.Contains and String.EndsWith in LINQ to Entites to use LIKE.
Improve the translation of the canonical functions StartsWith, Contains and EndsWith to use LIKE (these canonical functions became available in .NET 4.0 Beta1)
Collapse multiple Case statements into one

Kati Iceva
Software Development Engineer, Entity Framework

Improvements to the Generated SQL in .NET 4.0 Beta1

1. Avoid projecting all columns in nested queries

2. Avoiding Unnecessary IS NULL Checks in LINQ to Entities Queries

3. Additional join elimination: Eliminate parent table in “Child Left Outer Join Parent”

4. Using IN instead of nested OR’s

5. Translating more of LINQ’s GroupBy operator into GROUP BY

6. Avoiding “Cast (1 as bit)”, using “1” instead

7. Simplifying some queries that go against a subtype in a type hierarchy

Improvements in .NET 4.0 post Beta1

Additional resources