Reading and Writing Queries

12/25/2006

Now that we have discussed some of the underpinnings of query expressions, we can turn our attention to how to use them. As has already been discussed, most people who see query expressions immediately think of SQL, and often even mistake them for embedded SQL. While the syntax is similar, it has already been pointed out that the two are fundamentally different in important ways. This combination of semantic difference and syntactic similarity seems to cause something of a cognitive dissonance with respect to SQL: people see query expressions and assume that they behave exactly like SQL. With some providers (LINQ to SQL) this may be largely true, but with others (LINQ to Objects) the behavior is very different. In this post, we will examine the syntax and scoping rules of query expressions.

Query Syntax

First, let's turn our attention to the matter of syntax. A query is simply a sequence of clauses. There are only nine types of clauses: from, join, join...into, let, where, orderby, group...by, select, and into. A query must begin with a from clause and end with either a select or a group...by clause. Between the start and the end there may be any number of from, join, join...into, let, where, and orderby clauses. After a query has ended, it may be "continued" with an into clause, followed by any number of from, join, join...into, let, where, and orderby clauses, and then by either a select or a group...by clause. Here is the syntactic structure of each clause (in a modified EBNF in which the expression and identifier non-terminals are given more descriptive labels).

fromClause -> from variable in source

joinClause -> join variable in source on outerKey equals innerKey

joinIntoClause -> join variable in source on outerKey equals innerKey into groupingVariable

letClause -> let variable = expression

whereClause -> where condition

orderbyClause -> orderby key [ascending | descending] {, key [ascending | descending ]}

selectClause -> select projection

groupByClause -> group groupingResult by groupKey

intoClause -> into variable

Here is an example:

from c in customers

where c.State == "WA"

orderby c.Name

join o in orders on c.ID equals o.CustomerID

let OrderTotal = o.Price * o.Quantity

select new { c.Name, OrderTotal }

Unlike SQL, C# queries do not begin with a select clause. The primary motivation for having the from clause come first is that in C# variables are always declared before they are used. So when c is used in the where clause, it is obvious that this c was introduced before it (in the from clause). One benefit of a strongly typed language that declares variables before they are used is that IDEs can provide IntelliSense for these variables when they are used. Other reasons for the differing order of clauses will be discussed in the next post.
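To make the example query above concrete, here is a minimal sketch that runs it with LINQ to Objects. The Customer and Order record types and the sample data are assumptions made up for illustration; the post itself does not define them.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical types: the post does not say what customers and orders contain.
public record Customer(int ID, string Name, string State);
public record Order(int CustomerID, decimal Price, int Quantity);

public static class QueryExample
{
    public static List<string> Run()
    {
        // Made-up sample data.
        var customers = new List<Customer>
        {
            new Customer(1, "Zoe", "WA"),
            new Customer(2, "Adam", "WA"),
            new Customer(3, "Bea", "OR"),
        };
        var orders = new List<Order>
        {
            new Order(1, 10m, 2),
            new Order(2, 5m, 3),
            new Order(3, 7m, 1),
        };

        // The query from the text, unchanged.
        var q = from c in customers
                where c.State == "WA"
                orderby c.Name
                join o in orders on c.ID equals o.CustomerID
                let OrderTotal = o.Price * o.Quantity
                select new { c.Name, OrderTotal };

        return q.Select(r => r.Name + ": " + r.OrderTotal).ToList();
    }
}

public static class Program
{
    public static void Main()
    {
        foreach (var line in QueryExample.Run())
            Console.WriteLine(line);
    }
}
```

Note that c is declared by the from clause before the where, orderby, and join clauses use it, which is the declare-before-use ordering described above.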

Scoping Out Queries

Based on the syntax and translation rules of query expressions and those of transparent identifiers, it is possible to derive the scoping rules for queries. Mostly these scoping rules are intuitive, but there are a few places where you might not get what you expect. To discuss the scoping rules effectively, it is helpful to introduce a few formalisms. We can think of each query clause as a function that takes a scope as input and produces a scope as output. For example, let s represent the set of variables that were introduced in this query's previous clauses and are still in scope, and let o represent the set of variables that are in scope from outside of the query. Then after a from clause that introduces a variable x, all the variables in s and o are in scope, as well as x. In the source expression of the from clause, the variables from o and s are in scope, but x is not. We will denote this as:

[o,s] -> from x in [o,s] -> [o,s,x]

First we see in this notation that variables in o and s flow into the query. Then we see that in the source expression in the from clause, all of the variables in o and s are in scope. Finally, the notation indicates that after the from clause all of the variables in o and s as well as the variable x are in scope. The following are the scoping rules for each clause (based on the rewrite rules).

[o,s] -> from x in [o,s] -> [o,s,x]

[o,s] -> join x in [o] on [o,s] equals [o,x] -> [o,s,x]

[o,s] -> join x in [o] on [o,s] equals [o,x] into g -> [o,s,g]

[o,s] -> let x = [o,s] -> [o,s,x]

[o,s] -> where [o,s] -> [o,s]

[o,s] -> orderby [o,s] -> [o,s]

[o,s] -> group [o,s] by [o,s] -> [o,s]

[o,s] -> select [o,s] -> [o,s]

[o,s] -> into x -> [o,x]

Again, the spec does not call out these scoping rules. They are an emergent behavior given the rewrite rules and the rules for transparent identifiers.
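As a rough illustration of a few of these rules, the following sketch annotates a small query with what each expression can see. The data and the names numbers, words, and offset are made up for the example; they play the role of the outer scope o.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ScopeSketch
{
    public static List<string> Run()
    {
        // Outer scope "o": locals declared outside the query (hypothetical data).
        int[] numbers = { 1, 2, 3, 4 };
        string[] words = { "a", "bb", "cc", "ddd" };
        int offset = 10;

        var q = from n in numbers              // source sees o only; adds n
                let shifted = n + offset       // sees o and n; adds shifted
                join w in words                // join source sees o only
                    on n equals w.Length       // left key sees o, n, shifted; right key sees o, w
                    into g                     // adds g; w is no longer in scope
                where g.Any()                  // sees o, n, shifted, g (using w here would not compile)
                select shifted + ":" + string.Join(",", g);

        return q.ToList();
    }
}

public static class Program
{
    public static void Main()
    {
        foreach (var line in ScopeSketch.Run())
            Console.WriteLine(line);
    }
}
```

The number 4 has no word of matching length, so its group g is empty and the where clause filters it out.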

Occasionally, I talk to someone who wonders why the two key expressions in a join clause are not commutable. One reason becomes apparent after examining the scoping rules: the expressions are not commutable because they don't have the same variables in scope. The join query variable is not in scope in the outer key expression while the previous query variables are not in scope in the inner key expression.
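The point about join keys can be seen in a small sketch. The anonymous-typed customers and orders below are hypothetical sample data; the commented-out query shows the swapped form, which the compiler rejects because each key expression refers to a variable that is not in scope on its side of equals.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class JoinKeySketch
{
    public static List<string> Run()
    {
        // Made-up sample data.
        var customers = new[] { new { ID = 1, Name = "Ann" }, new { ID = 2, Name = "Bob" } };
        var orders = new[] { new { CustomerID = 2, Item = "tea" }, new { CustomerID = 1, Item = "ink" } };

        // Compiles: c is in scope to the left of equals, o to the right.
        var q = from c in customers
                join o in orders on c.ID equals o.CustomerID
                select c.Name + " bought " + o.Item;

        // Does not compile if uncommented: the keys are swapped, so o is used
        // where only c is in scope, and c is used where only o is in scope.
        // var bad = from c in customers
        //           join o in orders on o.CustomerID equals c.ID
        //           select c.Name;

        return q.ToList();
    }
}

public static class Program
{
    public static void Main()
    {
        foreach (var line in JoinKeySketch.Run())
            Console.WriteLine(line);
    }
}
```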

Using these scoping rules we can understand now which variables are in scope in each expression in a query. Try this query out. See if you can figure out which variables are in scope at each point (the answer is at the end of this post).

var src = ...;

var q = from x in [???]

        where [???]

        from y in [???]

        join z in [???] on [???] equals [???] into g

        orderby [???]

        let a = [???]

        select [???]

        into b

        join c in [???] on [???] equals [???]

        group [???] by [???];

Thinking About Query Continuations

In the previous section you may have noticed that query continuations (into variable ...) erase all of the variables that were in scope in the query. This is because a query continuation takes the result of a query and pipes it into the next query. Looking at the translation rules, we see that a query like:

from x in foo

select f(x)

into y

select f(y)

becomes:

from y in from x in foo

          select f(x)

select f(y)

The key here is that we can divide a query into a series of sections where each section ends in either a select or a group by. Each section can be considered independently. For example, instead of thinking of the query as the rewrite indicates, we could think of it as:

var q1 = from x in foo

         select f(x);

var q2 = from y in q1

         select f(y);

Notice how each section of the query becomes one query and then is used as the source for the next query section. The difference between this "rewrite" and the specified rewrite is that the specified rewrite produces a single expression while this "rewrite" produces several statements. So while this is not the actual translation, it is helpful in understanding how query continuations work.
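A quick way to convince yourself of this is to run both forms and compare the results. In this sketch, F is a hypothetical stand-in for the f in the text and foo is made-up data.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ContinuationSketch
{
    // Hypothetical stand-in for f.
    static int F(int value) => value * 2;

    public static (List<int> Continued, List<int> Split) Run()
    {
        int[] foo = { 1, 2, 3 };

        // One query with a continuation. After "into y", x is out of scope.
        var continued = from x in foo
                        select F(x)
                        into y
                        select F(y);

        // The same work written as two separate queries.
        var q1 = from x in foo
                 select F(x);
        var q2 = from y in q1
                 select F(y);

        return (continued.ToList(), q2.ToList());
    }
}

public static class Program
{
    public static void Main()
    {
        var (continued, split) = ContinuationSketch.Run();
        Console.WriteLine(string.Join(",", continued));
        Console.WriteLine(string.Join(",", split));
    }
}
```

Both lists contain the same elements in the same order, since each element is simply passed through F twice.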

Answer to Scoping Question

var src = ...;

var q = from x in [src]

        where [src, x]

        from y in [src, x]

        join z in [src] on [src,x,y] equals [src,z] into g

        orderby [src,x,y,g]

        let a = [src,x,y,g]

        select [src,x,y,g,a]

        into b

        join c in [src] on [src,b] equals [src,c]

        group [src,b,c] by [src,b,c];
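One way to check the answer is to fill in every placeholder with an expression that uses only the variables listed for that position and see that it compiles. The following is one such filling, with made-up data and arbitrary expressions; it is only a sketch, and any other expressions that respect the same scopes would do.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ScopingAnswerSketch
{
    public static List<string> Run()
    {
        int[] src = { 1, 2, 3 };                              // hypothetical data

        var q = from x in src                                 // [src]
                where x > src.Min()                           // [src, x]
                from y in src.Where(v => v <= x)              // [src, x]
                join z in src on x + y equals z into g        // [src] on [src,x,y] equals [src,z]
                orderby g.Count()                             // [src,x,y,g]
                let a = x * y + g.Sum()                       // [src,x,y,g]
                select new { x, y, a }                        // [src,x,y,g,a]
                into b
                join c in src on b.x equals c                 // [src] on [src,b] equals [src,c]
                group b.a by c;                               // [src,b,c] by [src,b,c]

        return q.Select(grp => grp.Key + ": " + string.Join(",", grp)).ToList();
    }
}

public static class Program
{
    public static void Main()
    {
        foreach (var line in ScopingAnswerSketch.Run())
            Console.WriteLine(line);
    }
}
```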
