The DLinq Dilemma: SQL shall be SQL, and C# shall be C#, and never the twain shall meet

Article
07/17/2006

With a tip of a hat to Rudyard and his ballad, I would say

Oh SQL is SQL and C# is C#, and never the twain shall meet
Till rows and objects manually a dev has both to beat
But there is neither SQL nor C#, nor join nor dot
When LINQ stands to link the twin what magic have I got

Ballads aside, this is a real problem that we are grappling with. There is really nothing new about this problem - it is the problem of Object Relational Mapping (ORM). The two worlds are rules by languages and run-times that have different semantics. The problem is less acute when you are providing a window to the other through classic API and string query language based ORM components. It is more important when you are truly bridging the two with language integration - that is what LINQ does - with its LINQ to SQL (aka DLinq) component. Here is the first in the series and the most controversial one - null comparison.

In SQL, a = b is not true when a and b are both null (special settings aside).
In C#, a == b evaluates to true when a and b are both null.

(Asides:
1. I am talking about the behavior of shipped products here - not LINQ design choice so far. And I am not going to reopen the debate about null semantics in C# here. You can see posts in my blog and my colleagues' blog for background.
2. VB has a different semantics so this post is C#-specific)

So in the LINQ project, we have to pick our translation from C# to SQL. Consider a simple query:

from c in db.Customers
from s in db.Suppliers
where c.City == s.City
select new {c.CustomerID, s.SupplierID};

If we go with the C# behavior, SQL users moving to LINQ will be surprised.
If we pick SQL behavior then C# users may be surprised. Worse still, those who run the same query with data moved to in-memory collections will get different results if c.City and s.City happen to have null values.

In previous CTPs, we chose the SQL way. So the generated query looks like.

SELECT [t0].[CustomerID], [t1].[SupplierID]
FROM [Customers] AS [t0], [Suppliers] AS [t1]
WHERE [t0].[City] = [t1].[City]

Of course, we could have done a different translation for higher semantic fidelity with C# by changing the comparison to
WHERE t0.City = t1.City
OR (t0.City is null and t1.City is null)

Under this proposal, a user can still get SQL semantics by changing predicate in LINQ query to
c.City != null && c.City == s.City

So here are my questions:
1. Did you even notice the difference between C# and SQL semantics?
2. Which one do you prefer and why? (FWIW, we got no questions/comments about this as far as I remember - zip, zilch, nada)

There are more manifestations of null and other semantics but more about them and our reasons for the current choices in subsequent posts.

The DLinq Dilemma: SQL shall be SQL, and C# shall be C#, and never the twain shall meet

Additional resources