VB expression trees - coalesce operator

03/25/2008

Hey there! In this post, I'll continue describing some of the things that are interesting about the VB compiler, especially related to expression trees and the consumption of expression trees in your LINQ provider.

Again, this may not be too interesting if you aren't writing a LINQ provider, but I hope you read on anyway because I'll be discussing some of the subtle features of the language. I'm going to diverge far and wide, and I apologize up front if some of this is esoteric, but I want to give the appropriate background information.

Coalesce

One of the new operators introduced in VB is the coalesce operator. The coalesce operator looks like the ternary operator:

 REM Ternary operator
if(a, b, c)

REM Coalesce operator
if(a, b)

The coalesce operator allows you to return the value "a" if "a" is not nothing, and if "a" is nothing, then it will return the value "b". We typically use this operator if we don't know whether "a" has a value or not, and want to use a "default value" in the case where "a" has no value.

For example:

 Dim b = GetCityFromDatabase()
Dim city = if(b, "Seattle")

In this case, we get a city from a database using some API, and if there is no city in the database (the database contains a NULLABLE column and no value was stored there), then we'll use the default text "Seattle" as the city.

Ok, cool - but what does this have to do with expression trees?

The important thing to remember is this: in C#, the compiler generates the Coalesce operator node only if the user explicitly uses the coalesce operator (the ?? operator in C#). In VB, the compiler will generate a Coalesce operator node in some cases even if the user did not explicitly use the coalesce operator (if).

This means that even if you don't plan to support the coalesce operator, you still need to support it for the best VB experience possible.

Nullables

In order to explore this in the appropriate amount of detail, I have to diverge into VB's nullable semantics. In VB, the language design team decided to implement three-value nullable semantics rather than two-value nullable semantics (like C#).

What's the difference? Here's a table for VB for x OP y, where OP is any relational operator (<, <=, =, >=, >), and the types of x and y are not Boolean (VB has special semantics for these operators if x and y are Boolean).

x	y	RESULT
Not Nothing	Not Nothing	True/False
Not Nothing	Nothing	Nothing
Nothing	Not Nothing	Nothing
Nothing	Nothing	Nothing

In essence, if any of the operands are Nothing, then the result is nothing. In C#, you always get a true or false answer.

Let's just do a quick exercise:

 dim x as integer = 10
dim y as integer = 12
dim z as integer? = nothing
dim u as integer? = nothing

dim b1 = x < y
dim b2 = x < z
dim b3 = z < x
dim b4 = z < u

Here are the results:

b1 = true

b2 = nothing

b3 = nothing

b4 = nothing

What falls out of this is that when either operand is nullable, the type returned by the relational operators is Boolean?, and not Boolean.

Conversions

In addition, a conversion from Boolean? to Boolean is a narrowing conversion. That is, if the Boolean? value is Nothing, converting it to Boolean will throw an exception.

So what are the implications?

If you have a method that takes a Boolean, and you pass it a Boolean? argument:

 sub bar(b as boolean)
end sub

bar(x < z)

Then if you have option strict on, you will get a compiler error. But if option strict is off, the compiler, as usual, applies narrowing conversions implicitly. In this particular case, you'll get a runtime exception.

This may or may not surprise you; but it falls out logically from the facts discussed above. Keep this in mind, because you're going to see this pop up again later when we discuss queries.
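To make the narrowing conversion concrete, here is a minimal sketch. It assumes a console program compiled with Option Strict On, so the conversion has to be written out with CBool; that explicit call stands in for the conversion the compiler would insert implicitly with Option Strict Off. The module name and the output text are illustrative only.

```vbnet
Option Strict On
Option Infer On
Imports System

Module NarrowingDemo
    Sub Bar(b As Boolean)
        Console.WriteLine("Bar received " & b.ToString())
    End Sub

    Sub Main()
        Dim x As Integer = 10
        Dim z As Integer? = Nothing

        ' Bar(x < z) does not compile under Option Strict On, because
        ' Boolean? to Boolean is a narrowing conversion. CBool spells out
        ' the conversion that Option Strict Off would apply implicitly.
        Try
            Bar(CBool(x < z))
        Catch ex As InvalidOperationException
            ' x < z evaluates to Nothing, so there is no Boolean to extract.
            Console.WriteLine("Conversion failed: " & ex.Message)
        End Try
    End Sub
End Module
```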

Conditional context

Ok, here's a quick quiz. What happens in the following code, with option strict on/off?

 if x < z then
end if

If you said that this code is an error when option strict is on and throws an exception when option strict is off (just like how it does in the above example with the method call to "bar"), you would be correct given what I've told you.

However, VB has a special mode called the conditional context mode. In certain cases, the compiler tries really hard to get a boolean out of an expression. We may do things like look for a CBool cast, or look for an IsTrue operator on your type. Or we may apply the coalesce operator if the expression is a nullable-type result (I knew I would get back to the coalesce operator eventually :)

And yes, you guessed correctly, the if statement is one of those places where we do this special work.

In the above example, the compiler generates code that's equivalent to the following:

 if if(x < z, false) then
end if

This almost looks like a syntax error! But what you are really doing is applying the coalesce operator to the expression x < z, and if that expression is nothing, then use the RHS (that is, false) as the result.

I hope everything so far has been clear. The compiler, in spirit, applies a coalesce operator in order to remove the potential nothing-ness from an expression safely, without throwing a runtime error.
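Here is a small sketch of the conditional context in action, written as a console program (the module name and messages are my own). Both forms compile with Option Strict On, and both take the Else branch because x < z evaluates to Nothing.

```vbnet
Option Strict On
Option Infer On
Imports System

Module ConditionalContextDemo
    Sub Main()
        Dim x As Integer = 10
        Dim z As Integer? = Nothing

        ' Conditional context: the Boolean? result is accepted directly,
        ' and Nothing is treated as False.
        If x < z Then
            Console.WriteLine("implicit form: Then branch")
        Else
            Console.WriteLine("implicit form: Else branch")
        End If

        ' The same test with the coalesce operator written out by hand.
        If If(x < z, False) Then
            Console.WriteLine("explicit form: Then branch")
        Else
            Console.WriteLine("explicit form: Else branch")
        End If
    End Sub
End Module
```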

Queries

Another place where we apply the conditional context mode is in queries. In particular, the where clause has this mode applied. For example:

 dim q = from c in customers where x < z select c

This case is similar to the if; the compiler generates something like the coalesce operator in order to try to get a boolean result out of x < z, and so, this code does not throw an exception.

As a result, when the above where clause gets converted to an expression tree, we will generate a coalesce operator node to get the same semantics as in the non-expression tree case (I knew we were talking about expression trees!)

Aside: We recognize that generating a coalesce is inefficient for LINQ to SQL because it translates to a CASE statement in SQL. Essentially, this means that every row in the table needs to be run through the query; therefore, no indexes are used, and the performance of such a query will be very slow.

For the relational operators, we can optimize the coalesce away by using a different form of the relational operator, but this won't work in scenarios that don't have a relational operator (such as a scalar Boolean? or cases where you funcletize a function that returns Boolean?). We decided that being consistent and providing the full VB semantic understanding was important, because there are more scenarios for expression trees than just LINQ to SQL.

Update: I held off posting this blog long enough to report that the LINQ to SQL team fixed their query analyzer for VB and will now recognize this pattern and generate optimal SQL so that you get full index search speeds :) This fix should be available in an update coming later this year.

The query scenario is exactly the scenario I described at the beginning of the article. This means that if you write a LINQ provider, you will almost invariably have to handle the coalesce node. This is because if your source is a database, you'll almost always have nullable columns in a database, meaning we will generate a tree with the coalesce node.
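As a rough sketch of what "handling the coalesce node" might look like on the provider side, the visitor below simply detects Coalesce nodes in a predicate. It assumes the public System.Linq.Expressions.ExpressionVisitor class (available from .NET Framework 4 onward). The Customer class and the CoalesceFinder name are hypothetical, and the predicate uses an explicit if(..., False) so that the sketch does not depend on how a particular compiler version lowers a where clause. A real provider would translate the node rather than just count it.

```vbnet
Option Strict On
Option Infer On
Imports System
Imports System.Linq.Expressions

' Hypothetical entity with a nullable column.
Public Class Customer
    Public Property Id As Integer?
End Class

' Walks an expression tree and reports each Coalesce node it meets.
Public Class CoalesceFinder
    Inherits ExpressionVisitor

    Public Property Count As Integer

    Protected Overrides Function VisitBinary(node As BinaryExpression) As Expression
        If node.NodeType = ExpressionType.Coalesce Then
            Count += 1
            Console.WriteLine("Coalesce: left = " & node.Left.ToString() &
                              ", right = " & node.Right.ToString())
        End If
        ' Keep walking so nested nodes are visited as well.
        Return MyBase.VisitBinary(node)
    End Function
End Class

Module CoalesceFinderDemo
    Sub Main()
        ' Explicit coalesce over a lifted comparison on a nullable property.
        Dim predicate As Expression(Of Func(Of Customer, Boolean)) =
            Function(c) If(c.Id < 10, False)

        Dim finder As New CoalesceFinder()
        finder.Visit(predicate)
        Console.WriteLine("Coalesce nodes found: " & finder.Count.ToString())
    End Sub
End Module
```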

Extension methods

Before I close, I want to examine the case where you call the where operator in extension method format, rather than query format:

 dim q = customers.where(function(c) x < z)

Here's the final quiz: will this throw an exception when option strict is off? If you said yes, you totally rock :) Method calls, as I described above, do not fall under the conditional context group, and thus, this is as unsafe as before, when we called bar(x < z).

In this case, the expression tree that you get will not have a coalesce operator node, and thus, you'll get the unsafe, lifted version of the < operator, which results in a Boolean? result.

How do you get the same semantics in the extension method version? Use the coalesce operator:

 dim q = customers.where(function(c) if(x < z, false))

Now you know why we chose to generate the coalesce operator in the case of the where query operator: to make the code as consistent as if you, the user, wanted to apply the safe conversion from Boolean? to Boolean, mapping Nothing to False.

That is, the following query and extension method call are equivalent:

 dim q1 = from c in customers where c.id < 10 select c
dim q2 = customers.where(function(c) if(c.id < 10, false)).select(function(c) c)
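The equivalence can be checked against an in-memory list. This is only a sketch: the Customer class with a nullable Id and the sample data are assumptions of mine, and it exercises LINQ to Objects rather than an expression-tree provider.

```vbnet
Option Strict On
Option Infer On
Imports System
Imports System.Collections.Generic
Imports System.Linq

' Hypothetical entity with a nullable Id.
Public Class Customer
    Public Property Id As Integer?
End Class

Module QueryEquivalenceDemo
    Sub Main()
        Dim customers As New List(Of Customer) From {
            New Customer With {.Id = 5},
            New Customer With {.Id = Nothing},
            New Customer With {.Id = 20}
        }

        ' Query form: the Where clause is a conditional context.
        Dim q1 = From c In customers Where c.Id < 10 Select c

        ' Method form: the coalesce has to be written out by hand.
        Dim q2 = customers.Where(Function(c) If(c.Id < 10, False)).Select(Function(c) c)

        ' Both queries skip the customer whose Id is Nothing.
        Console.WriteLine("Same results: " & q1.SequenceEqual(q2).ToString())
    End Sub
End Module
```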

Summary

I think I said a lot more about this subject than I intended to. I hope this all makes sense; there are definitely a lot of subtle things when you start to look at how various language features interact.

The bottom line is that a good LINQ provider should handle the coalesce operator, and in the VB case, it's very important to handle the coalesce operator.

As always, if you have questions, feel free to send me email.
