LINQ to SQL (and LINQ to Entities) performance improvements (Tim Ng)

Over the last few months, the VB and Data Programmability teams were working on fixing a performance problem with LINQ to SQL (which also manifested in LINQ to Entities). The issue was that LINQ to SQL was generating sub optimal T-SQL queries when the VB LINQ queries included filters over nullable columns.

For example, consider the following query:

Dim q = From o In db.Orders Where o.EmployeeID = 123 Select o.CustomerID 

In this scenario, the Orders.EmployeeID field is a nullable field (Integer?). In VB, the logical operators (<. <=, =, etc) are considered “three-value” operators, and thus, the result of the equality comparison is a “Boolean?” however, the LINQ operators expect the predict to return “Boolean” and not “Boolean?” therefore, VB must convert the “Boolean?” to a “Boolean” using the Coalesce operator. (For more details, please see my blog post on Expressions trees and Coalesce ).

Although LINQ to SQL could optimized out the Coalesce operator, in VS 2008 RTM it converted from a predicate logic to value logic (and vice versa) which in some cases blows the applicable index on the column.

Here’s an example for the SQL that is generated in VS 2008 RTM:


SELECT [t0].[CustomerID]
FROM [dbo].[Orders] AS [t0]
        WHEN [t0].[EmployeeID] = @p0 THEN 1
        WHEN NOT ([t0].[EmployeeID] = @p0) THEN 0
        ELSE NULL
     END),@p1)) = 1
— @p0: Input Int (Size = 0; Prec = 0; Scale = 0) [123]
— @p1: Input Int (Size = 0; Prec = 0; Scale = 0) [0]

The Good News

The good news is that the LINQ to SQL team has a fix to their code generation to recognize this pattern, and now they emit the SQL code that you would have expected to emit:

SELECT [t0].[CustomerID]
FROM [dbo].[Orders] AS [t0]
WHERE [t0].[EmployeeID] = @p0
— @p0: Input Int (Size = 0; Prec = 0; Scale = 0) [123]

In other words, LINQ to SQL (and LINQ to Entities – we made the same fix there) is now smart enough to pass three-value logic from VB to SQL.

In this particular case, both VB and SQL employs three-value boolean logic, but the intermediate layer (the LINQ operators), employed two-value boolean logic, and the translation between the layers was “lost”, which resulted in the non-optimized code that we generated in VS2008 RTM. Look forward to this in an update coming later this year.