Transparent Identifiers

My last post described in detail how query expressions in C# are translated, but I have a confession to make.  I did a little bit of hand waving through one part of the translation rules.  The astute reader (whoever that is; whenever I read such things I always wonder, "Am I that astute reader?") may have noticed that four of the rules introduced a decidedly foreign language feature into the rewritten queries.  For example, examine rule #14 closely:

14.  From followed by let

from x1 in e1 let x2 = e2 ...

from * in e1.Select(x1 => new { x1, x2 = e2 }) ...

What is that asterisk in the resulting transformation, and what does it mean?  It isn't a pointer dereferencing operator and it isn't a multiplication operator.  Hmmm... what could it be?  It looks like it is taking the place of an identifier in the from clause, but that is a strange name for an identifier.

In fact, the asterisk denotes a transparent identifier, and the asterisk itself is the name of that identifier.  Transparent identifiers are one of the strangest additions to C# 3.0.  The spec has precious little to say about them beyond the following:

1.  They are introduced to a program only by query rewriting

2.  Each transparent identifier has exactly one associated anonymous type (the anonymous type introduced during the same rewrite step as the transparent identifier)

3.  When a transparent identifier is in scope, its members are in scope as well (transitively)
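Rule 3 is what lets later clauses keep using earlier range variables.  As a minimal sketch (the words array and the query are made up for illustration, and the top-level statements assume C# 9 or later), both w and len stay usable after the let, even though the where, orderby, and select lambdas each receive only a single compiler-generated parameter:

```csharp
using System;
using System.Linq;

// Hypothetical sample data, not from the original post.
string[] words = { "apple", "fig", "banana" };

// After the let clause, w and len travel together inside one
// anonymous-type value, yet both can still be named directly.
var labels = (from w in words
              let len = w.Length
              where len > 3
              orderby len descending
              select w + ":" + len).ToArray();

Console.WriteLine(string.Join(", ", labels)); // prints "banana:6, apple:5"
```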

Rule 3 is kind of strange because it mentions scope in an otherwise syntactic rewrite.  Scope has to do with semantics and not syntax so we have to be very careful how we apply this rule.  Now consider the following query:

from x in foo

let y = f(x)

let z = g(x, y)

select h(x, y, z)
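To make the walkthrough concrete, here is a runnable version of this query.  The definitions of foo, f, g, and h are placeholders invented for illustration (the query above leaves them abstract), and the top-level statements assume C# 9 or later:

```csharp
using System;
using System.Linq;

// Hypothetical stand-ins for the abstract foo, f, g and h.
int[] foo = { 1, 2, 3 };
Func<int, int> f = x => x * 10;
Func<int, int, int> g = (x, y) => x + y;
Func<int, int, int, int> h = (x, y, z) => x + y + z;

// x, y and z are all usable in the select clause, even though
// each let clause is translated into its own Select call.
var query = from x in foo
            let y = f(x)
            let z = g(x, y)
            select h(x, y, z);

int[] results = query.ToArray();
Console.WriteLine(string.Join(", ", results)); // prints "22, 44, 66"
```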

Applying rule 14 the first time reduces the query to:

from * in foo.Select(x => new { x, y = f(x) })

let z = g(x, y)

select h(x, y, z)

Here we introduced a transparent identifier and the anonymous type that is associated with it is new { x, y = f(x) }.  Now we apply rule 14 again.

from * in foo.Select(x => new { x, y = f(x) }).Select(* => new { *, z = g(x, y) })

select h(x, y, z)

We have now introduced another transparent identifier, but this one is associated with a different anonymous type.  It is associated with new { *, z = g(x, y) }.  To tell the two apart, we will label them *1 and *2.

from *2 in foo.Select(x => new { x, y = f(x) }).Select(*1 => new { *1, z = g(x, y) })

select h(x, y, z)

Finally, we can apply rule 15.

foo.Select(x => new { x, y = f(x) }).Select(*1 => new { *1, z = g(x, y) }).Select(*2 => h(x, y, z))

Where *1 and *2 are associated with the following anonymous types:

*1 = new { x, y = f(x) }

*2 = new { *1, z = g(x, y) }

You may notice that the final Select call takes a lambda with only one parameter (*2), yet the body of this lambda references x, y, and z, none of which appear to be in scope.  Using rule 3 about transparent identifiers, we see that when *2 is in scope so are its members *1 and z, and since *1 is now in scope so are x and y (the transitive closure of the members of *2).  So when we refer to x in h(x, y, z) we are really referring to *2.*1.x.  Thus we have the following:

foo.Select(x => new { x, y = f(x) }).Select(*1 => new { *1, z = g(*1.x, *1.y) }).Select(*2 => h(*2.*1.x, *2.*1.y, *2.z))

Finally, once we realize that the compiler treats these transparent identifiers as nothing more than unspeakable compiler-generated names, all the magic disappears:

foo.Select(x => new { x, y = f(x) }).Select(t0 => new { t0, z = g(t0.x, t0.y) }).Select(t1 => h(t1.t0.x, t1.t0.y, t1.z))

Voila!  No more transparent identifiers.  Transparent identifiers are used in query rewriting to package up the intermediate results and pass them on to the next clause.  They are essentially a means of creating a little scope and passing it around.  It is a fantastic idea that allows variables to flow through the queries, providing the kind of behavior that users expect to see.
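The equivalence can be checked by writing both forms side by side.  This is only a sketch: foo, f, g, and h are hypothetical placeholders, t0 and t1 are ordinary names standing in for the compiler's unspeakable ones, and the top-level statements assume C# 9 or later.

```csharp
using System;
using System.Linq;

// Hypothetical stand-ins for the abstract foo, f, g and h.
int[] foo = { 1, 2, 3 };
Func<int, int> f = x => x * 10;
Func<int, int, int> g = (x, y) => x + y;
Func<int, int, int, int> h = (x, y, z) => x + y + z;

// The query expression, translated by the compiler.
var sugared = (from x in foo
               let y = f(x)
               let z = g(x, y)
               select h(x, y, z)).ToArray();

// The hand translation: t0 and t1 play the roles of *1 and *2.
var desugared = foo
    .Select(x => new { x, y = f(x) })
    .Select(t0 => new { t0, z = g(t0.x, t0.y) })
    .Select(t1 => h(t1.t0.x, t1.t0.y, t1.z))
    .ToArray();

bool same = sugared.SequenceEqual(desugared);
Console.WriteLine(same); // prints "True"
```

The only difference from what the compiler produces is that you can actually type t0 and t1, which is exactly what makes the real identifiers "transparent": the query author never has to.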