Having Trouble with Queries

A Funny Joke and a Sad Joke

There is a joke that goes something like this:

Two men are hopelessly lost in hot air balloon.  Their condition is aggravated by the fact that they are enshrouded by a thicket of fog.  When it seems all is lost, suddenly the fog parts and miraculously a man is standing on the ground below.

They shout, "Hey sir!"  The man looks up a little bewildered by the sudden appearance of a hot air balloon.

Their hearts take courage and they ask, "Do you know where we are?"

The man on the ground thinks for a moment as the two men above wait in desperation.

Finally, the man looks up as the fog begins to engulf him again and says, "You are in a hot air balloon."

The two men slump into the hot air balloon as their last hope slips away.  One of them says to the other, "He must have been a professor."

The other says, "Why do you say that?"

The first responds, "Because what he said was totally correct but absolutely useless."

The joke has several variants with lawyer, teacher, engineer, etc. substituted for professor.  This joke's punch line also is a very good moral:  Don't annoy users by telling them something that is correct but useless.  By useless, I mean of no practical value which can include things that are incomprehensible to the average user.

Lets change the story a little bit:

Two programmers are hopelessly behind on an important software project.  Their condition is aggravated by the fact that they are having trouble learning to use the new query feature in C# 3.0.  They write the following query:

  var q = from c in Customers
where c.State == "WA"
select c.ToStrin(); // c.ToStrin() should be c.ToString()

It doesn't seem quite right but they don't know what they did wrong.

They say to each other, "Maybe the compiler's error message will be helpful and give a hint or two at how to use the language feature correctly."

Their hearts take courage and they look at the error message pane in Visual Studio in anticipation thinking to themselves, "What mistake did we make when writing our query?"

The compiler computes for a moment as the two programmers wait in desperation.

Finally, the compiler spits out an error message which says, "The type arguments for method 'System.Linq.Queryable.Select<TSource,TResult>(System.Linq.IQueryable<TSource>, System.Linq.Expressions.Expression<System.Linq.Func<TSource,TResult>>)' cannot be inferred from the usage. Try specifying the type arguments explicitly."

The two programmers slump into their chairs as their last hope slips away.  One of them says to the other, "Those C# compiler guys must be studying the C++ template error messages and the ML type inference error messages."

The other says, "Why do you say that?"

The first responds, "Because what it said was totally correct but absolutely useless."

Error messages are tricky to get right and for some reason compilers often have some of the most cryptic error messages.  To be fair, the primary purpose of a compiler is to correctly translate a correct program and to reject incorrect programs.  But to paraphrase a common saying, computer programming languages are primarily for humans and only incidentally for machines.  Maybe the error messages are cryptic because it is a tool written by programmers for programmers or possibly because the push to have more language features or better optimizations outweighs the demand for quality error messages.  Whatever the reason is, compilers tend not to be very helpful in providing good error messages.

The C# compiler team devotes a lot of thought and work to getting error messages right.  Eric Lippert, one of my teammates, has a great post about error messages (apparently GrantRi made one too).  Early on in C# 3.0, we focused on getting the language features complete (100% spec compliant) and sometimes we sacrificed performance or the quality of error messages to achieve this.  Now later in the product cycle we are focusing on things like the quality of the error messages that we generate, the performance of the C# compiler, and the performance of the generated IL.

The Challenges of Getting It Right

C# 3.0 presents a bunch of new challenges to our goal of getting our users the right product.  The language features are much further from the IL that is generated than they ever have been (try writing a small app with a handful of queries then open the compiled assembly in ILDASM or Reflector and you will see what I mean).  What all of this amounts to is that the distance from when an error introduced in compilation and when it detected has increased making it more difficult to produce very good error messages.  Furthermore, C# 3.0 uses type inference quite a bit which makes it even more difficult to produce good error messages.  A glance at a few languages that use type inference extensively will show you that there currently seems to be a tradeoff between the power of a type inference algorithm and its ability to produce good error messages.

Consider when an error occurs in a Linq to Objects query (c.Lastname should be c.LastName):

var q = from c in Customers
join o in Orders on c.ID equals o.CustomerID
where c.FirstName == "Wes"
orderby o.Date
select new { c.Lastname, o };

When the query is compiled using an early production compiler, an error message like the following would occur:

The type arguments for method 'System.Linq.Queryable.Select<TSource,TResult>(System.Linq.IQueryable<TSource>, System.Linq.Expressions.Expression<System.Linq.Func<TSource,TResult>>)' cannot be inferred from the usage. Try specifying the type arguments explicitly.

Which probably will only serve to evoke a scratch of the head and an indefinite, "Hmm...".  This error message is bad for a number of reasons.  First what is this talk about type arguments, I don't see any generics in sight.  Second, why is the error referring to Queryable when the query is clearly an Enumerable query.  Third, what is this Select method that is being referred to?  The query doesn't make a single method call.  Fourth, why does the error talk about expression trees when the query didn't even have a lambda in it?  It gets even worse if the span of the query is considered.  The span for this particular query is the "select" query contextual keyword.  Is that where the error occurred?  That doesn't seem likely.

It would be so much better if the error message was just

'Customer' does not have a member named 'Lastname'.

where the span for the error was "c.Lastname" in the select clause.  So why not just produce this error message and why produce the first error message?  Believe it or not, the first error message actually makes sense if you think about it really hard (no really, it's true!).  The error message refers to type arguments and the Select method because the select clause is translated into a call to a Select method which in this case is a generic method.  The reason why it refers to Queryable is because the methods that are candidates are all extension methods and one of those extension methods has a Queryable receiver.  Finally, the expression in the select clause becomes the body for a generated lambda that is passed to the Select method, but this lambda does not explicitly specify the types of its arguments so they must be inferred.  The problem is that no types can be inferred for the arguments such that the body does not have errors and hence we get the type inference errors.  I know, I know this still doesn't excuse the error message.  While the error message is correct, it is not very helpful.  Especially considering that the user should not need to know how queries are translated or how lambdas work or how type inference works and so on.

A Deeper Look

During our third milestone, we did work to make this error message better.  It was the worst of all of our error messages.  I can't find words adequate to describe how painful it was to fix very large queries that had simple errors.

The root of the problem lies in producing good error messages for lambdas even though the lambda may be bound several different ways through type inference.  Consider the following program:

using System.Linq;

class Program
{
static void Main(string[] args)
{
M(x => x.FooM()); // case 1
M(x => x.BarM()); // case 2
M(x => x.BazM()); // case 3
M(x => x.FooM() + x.BarM()); // case 4
}

  static void M(Func<Foo, Foo> f)
{
}

  static void M(Func<Bar, Bar> f)
{
}
}

class Foo
{
public Foo FooM() { return this; }
}

class Bar
{
public Bar BarM() { return this; }
}

In each call to M, a lambda is passed as the only parameter. In each lambda, x could have the type Foo or the type Bar since there are two methods named M which take a parameter type of Foo and a parameter type of Bar respectively. In case 1, if x has type Foo then the lambda binds correctly but if it has type Bar then an error occurs so x must have type Foo. In case 2, we can use the same logic to show that x must have type Bar. In case 3, the lambda has an error whether x is bound having type Foo or type Bar so case 3 is an error. In case 4, we have the same as case 3. So both case 3 and case 4 produce cryptic error messages.

Let's look at cases 3 and 4 closely.  In case 3, when we bind the lambda with type Foo we get one error:

test.cs(9,18): error CS1061: 'Foo' does not contain a definition for 'BazM' and
no extension method 'BazM' accepting a first argument of type 'Foo'
could be found (are you missing a using directive or an assembly
reference?)

When we bind the same lambda with type Bar we also get one error:

test.cs(9,18): error CS1061: 'Bar' does not contain a definition for 'BazM' and
no extension method 'BazM' accepting a first argument of type 'Bar'
could be found (are you missing a using directive or an assembly
reference?)

Notice the similarity between the two error message?  They both have the same error number and the same span.  Essentially, the same program text caused the error and both are the same error class.  They only differ in the parameters that are used to produce the error text ("Foo" vs "Bar").

In case 4, when we bind the lambda with type Foo we get one error:

test.cs(10,29): error CS1061: 'Foo' does not contain a definition for 'BarM' and
no extension method 'BarM' accepting a first argument of type 'Foo'
could be found (are you missing a using directive or an assembly
reference?)

And when we bind the lambda with type Bar we get one error:

test.cs(10,18): error CS1061: 'Bar' does not contain a definition for 'FooM' and
no extension method 'FooM' accepting a first argument of type 'Bar'
could be found (are you missing a using directive or an assembly
reference?)

This time we have two errors with the same error number but different spans.  The difficulty of dealing with the possibly multiple bindings of lambdas and the potentially (very) large number of error messages led to the incomprehensible error message.  The sad part is that those more helpful error messages are being hidden (maybe swallowed is a better term) by the more cryptic error.  But how can we let them shine through?

Fixing the Problem

We can solve this by introducing two sets of error messages: the union of error messages and the intersection of error messages.  Each binding of a lambda produces a set of error messages.  In the first such binding of error messages we initialize both the union and intersection sets to be the set of errors produced by the binding.  On each subsequent binding we refine the sets by either taking the union of the binding error message set with the union set or the intersection of the binding error message set with the intersection set.  After trying all the error messages, if the intersection set is not empty then we report only the error messages from that set.  Otherwise, we report all of the errors from the union set.  For the purposes of set semantics, we define two error messages to be the same iff they have both the same span and the same error number (not necessarily that they have the same text).

The intuition behind this is that the intersection set of errors are those errors that we are sure need to be fixed for the lambda to bind since they are errors under every binding of the lambda.  The union errors are possible errors depending on which binding of the lambda is under consideration.  So we don't report the union unless the intersection is empty since if they fix all errors in the intersection that may make the program compile correctly (and most often does).  In practice, the compiler rarely reports the union set since the intersection set is almost always not empty.  In both of the example queries above the intersection set is not empty and so it can be reported.

We are still working on getting the error messages right and would love to have your feedback on other problems that you notice.  Next time you see a good error message inside of a query you can now understand how it got there.  Or maybe not.  Maybe it will be helpful enough that you won't even think of it.  If that is the case then we have done our job right.