Linq Specifiqs - var

So this is the start of a series of posts that will dive a little deeper into the new C# 3.0 features.  My previous posts covered an overall view of what we were doing with the language as well as the Linq initiative, but didn't delve into the nitty-gritty behind everything.  Hopefully through these posts I can make each individual facet of the future work clearer.  I'll also bring up open questions we still have and my own personal thoughts on where we are now and where I hope to be.

It should be noted that what we have shown at the PDC is just a *preview*.  Nothing is set in stone, and it's quite likely that things will change before the final release.  It's my hope that by communicating with the developer community we can end up creating a better C# 3.0 for everyone.

So, I'm going to start with the "var" feature.  The full name is actually "Implicitly Typed Local Variables", but that's quite a mouthful, so we'll just be calling it the "var" feature for now.  So what is this feature?  Well, as the full name would imply, it's a feature that allows you to declare a local variable without having to explicitly declare its type.  For example, say you currently had the following code within some member:

 int i = 5;
string s = "Hello";
double d = 1.0;
int[] numbers = new int[] {1, 2, 3};
Dictionary<int,Order> orders = new Dictionary<int,Order>();

You could now write that as:

 var i = 5;
var s = "Hello";
var d = 1.0;
var numbers = new int[] {1, 2, 3};
var orders = new Dictionary<int,Order>();

The important thing to realize is that "var" does *not* mean "object".  In fact, both code samples above will compile to the *exact* same thing. 

So how does this work?  Well, unlike a regular local variable declaration, a "var" declaration is required to have not just a name but also an initializer.  The compiler will then figure out the type of the initializer expression (which is well defined as per the rules of the C# language) and treat the declaration as if you'd used that type as the local variable type.  So if you then type the following, each call is checked against that inferred type:

s.IndexOf('{'); // This will compile.  's' is a string, and string has an IndexOf(char) method.
s.FoobyBoob();  // This won't compile.  string doesn't have a 'FoobyBoob' method.

To be clear: "var" isn't some "variant" type (although the name is certainly unfortunate), and it doesn't imply any sort of "dynamic" typing system.  Your code is statically checked exactly as it would be if you had explicitly written down the type.
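As a rough illustration of that static typing, here is a minimal sketch.  The StaticTypeOf helper is hypothetical (it is not part of any library); it just uses generic type inference to report the compile-time type of whatever is passed to it.  The dictionary uses string values as a stand-in, since the Order type from the earlier sample isn't defined in this post.

```csharp
using System;
using System.Collections.Generic;

static class VarIsNotObject
{
    // Hypothetical helper: T is inferred from the compile-time type of the
    // argument, so typeof(T) reports the static type, not the runtime type.
    public static Type StaticTypeOf<T>(T value)
    {
        return typeof(T);
    }

    public static void Main()
    {
        var i = 5;
        var s = "Hello";
        var numbers = new int[] { 1, 2, 3 };
        var orders = new Dictionary<int, string>(); // string stands in for Order
        object boxed = 5;                           // explicitly typed as object

        orders.Add(1, "first");                     // Add(int, string) is checked statically

        Console.WriteLine(StaticTypeOf(i));         // System.Int32
        Console.WriteLine(StaticTypeOf(s));         // System.String
        Console.WriteLine(StaticTypeOf(numbers));   // System.Int32[]
        Console.WriteLine(StaticTypeOf(boxed));     // System.Object

        // boxed.IndexOf('H');  // would not compile: object has no IndexOf
        Console.WriteLine(s.IndexOf('H'));          // 0
    }
}
```

Note that only the variable declared as "object" reports System.Object; the "var" declarations carry the type of their initializers.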

For now, "var" is just for local variables.  Why not allow it for something like a method parameter declaration?  Well, say you had the following:

 interface IExample {
    void ShowOff(var parameter); //what type would this be?  It has no initializer to determine the type from
}

class OtherExample {
    void Demonstration(var parameter) { //what type would this be?
        parameter.Convert();            //we can't figure out what type it should be based on what we see here.
    }
}

In the first case, there's simply nothing we could do.  Without any sort of code in scope that uses "parameter" we couldn't hope to determine what its type was.  In the second case, it's possible we could try to figure out the type somehow, but it would probably be enormously complex and confusing (and often we'd still be unable to figure it out).  By limiting the feature to local variables that *have* to have an initializer, we ensure that it will be usable and available in pretty much all places where local variables are allowed.
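To make the initializer rule concrete, here is a small sketch.  The class and method names are made up for illustration; the commented-out line shows the kind of declaration the rule excludes.

```csharp
using System;

static class InitializerRequired
{
    public static int Sum(int[] values)
    {
        var total = 0;                 // inferred as int from the literal

        // var pending;                // would not compile: nothing to infer the type from

        // A for-loop variable is a local with an initializer, so var works here too.
        for (var i = 0; i < values.Length; i++)
        {
            var current = values[i];   // inferred as int from the element type
            total += current;
        }
        return total;
    }

    public static void Main()
    {
        Console.WriteLine(Sum(new int[] { 1, 2, 3 })); // 6
    }
}
```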

So... um... neat... but why would I want that?

That's a fantastic question.  Let's start by referring to some shortcomings/negatives first.  While implicitness can be quite handy for writing code, it can make things quite difficult when trying to just read code.  In all the above examples it's fairly easy to determine what the type of each of the variables is.  That's because you have either a nice primitive that you can look at, or the call to some constructor which you can then look directly at to figure out the type of variable.  But it's not always that simple.  What if you were to have:

 var lollerskates = GetSkates(these, parameters, will, affect, overload, resolution);

Now what do you do?  As I said before, the type of the variable will be statically determined by the compiler using the expression binding rules that are spelled out in detail in the C# specification.  But that in itself is a serious problem.  There is a huge number of binding rules, and some of them (like generic type inference) are quite complex.  Keeping all those rules in your head and correctly applying them on the fly just to comprehend what your code means sounds like a rather onerous burden to put on you.  In effect we'd be asking you to do the compiler's job just so you could answer the question "is lollerskates an IEnumerable or an IList?"
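Here is a small sketch of that readability problem.  The two GetSkates overloads and the Describe helper are hypothetical, invented purely for illustration: the declarations look identical at the call site, but overload resolution gives the two variables different static types.

```csharp
using System;
using System.Collections.Generic;

static class Skates
{
    // Hypothetical overloads: the argument type decides which one is called,
    // and therefore what type a "var" declaration ends up with.
    public static IEnumerable<string> GetSkates(object size)
    {
        return new[] { "generic pair" };
    }

    public static IList<string> GetSkates(int size)
    {
        return new List<string> { "size " + size };
    }

    // Hypothetical helper that reports the compile-time type of its argument.
    public static string Describe<T>(T value)
    {
        return typeof(T).Name;
    }

    public static void Main()
    {
        var a = GetSkates(42);     // binds to GetSkates(int):    IList<string>
        var b = GetSkates("42");   // binds to GetSkates(object): IEnumerable<string>

        a.Add("spare");            // fine: IList<string> has Add
        // b.Add("spare");         // would not compile: IEnumerable<string> has no Add

        Console.WriteLine(Describe(a)); // IList`1
        Console.WriteLine(Describe(b)); // IEnumerable`1
    }
}
```

Nothing in the two "var" lines tells the reader which type they got; you have to know the overloads and the binding rules.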

On the other hand, var does make other things nicer.  For one thing, it avoids some pretty ugly duplication that arises when you start to write heavily genericized code.  Instead of needing to write:

Dictionary<IOptional<IList<string>>,IEnumerable<ICollection<int>>> characterClasses = 
    new Dictionary<IOptional<IList<string>>,IEnumerable<ICollection<int>>>();

you can now write:

var characterClasses = new Dictionary<IOptional<IList<string>>,IEnumerable<ICollection<int>>>();

There's a heck of a lot of duplication that you can now cut out.  It cleans up your code *and* makes it more readable (IMO).  Two plusses that are always welcome in my book.  Another benefit is if you have code like this:

 object o = ExistingMethod();

...

object ExistingMethod() { ... }

If you then update ExistingMethod like so:

 string ExistingMethod() { ... }

Well, your code still compiles.  However, if you want to take advantage of the fact that ExistingMethod now returns "string" (e.g. to call a method like IndexOf on the result), you'll have to change every declaration that stores the result from "object" to "string".  If you had declared it as:

 var o = ExistingMethod();

then you would only have to update one location (the method itself) while still being able to take advantage of that redefinition everywhere.
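A minimal sketch of that scenario after the change, with a made-up method body and a hypothetical FindComma caller:

```csharp
using System;

static class Redefinition
{
    // After the change: the return type is now string rather than object.
    // The body is invented for illustration.
    public static string ExistingMethod()
    {
        return "Hello, world";
    }

    public static int FindComma()
    {
        var o = ExistingMethod();  // inferred as string; the call site needed no edit
        return o.IndexOf(',');     // string members are available immediately
    }

    public static void Main()
    {
        Console.WriteLine(FindComma()); // 5
    }
}
```

Had "o" been declared as "object", the IndexOf call would not compile until the declaration was edited by hand.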

Ok.  Well, those both seem somewhat *meh*'ish.  Convenient, sure, but worth the potential negatives in code readability?  With C# 2.0 as it exists today... probably not.  But with some of the Linq work we have coming up, the picture changes quite a bit.  Let's start by looking at a potential way you can write queries in C# 3.0:

    var q = 
      from c in customers
      where c.City == "Seattle"
      select new {
          c.Name,
          Orders = 
              from o in c.Orders
              where o.Cost > 1000
              select new { o.Cost, o.Date }
      };

In this case we're generating a hierarchical anonymous type (which I'll cover in depth in a future article).  Say we didn't have "var", but we *did* have some hypothetical syntax for expressing an anonymous type.  We'd end up having to write the following:

   IEnumerable<class { string Name; IEnumerable<class { int Cost; DateTime Date }> Orders }> q =
      from c in customers 
      where c.City == "Seattle" 
      select new { 
          c.Name, 
          Orders = 
              from o in c.Orders 
              where o.Cost > 1000 
              select new { o.Cost, o.Date } 
      };

Good golly!  That's a lot to type!  As with the "Dictionary" example above, you end up with a declaration with a lot of duplication in it.  Why should I have to fully declare this hierarchical type when the structure of the type I'm creating is fairly clear from the query initializing it?  You could make things easier for yourself by defining your own types ahead of time instead of using anonymous types, but that would make the act of projecting in a query far less simple than what you can do with anonymous types.  Of course, if you want to do this, you're completely able to do so while still working fully within the Linq system.

And, because this seems like a feature with its own plusses/minuses, and because it seems like people will want to move back and forth between implicit/explicit types depending on the code they're writing, it will make a lot of sense for us to provide user tools for this space.  Some sort of refactoring like "Reify variables".  Or... since people won't know what the heck that means: "make variable implicit/explicit."   :-)
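For anyone who wants to try the query from earlier, here is a self-contained sketch.  The Customer and Order classes and the sample data are assumptions made up for illustration, and the System.Linq namespace is the one the query operators live in as shipped; the preview bits may differ.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical data types, just enough to support the query.
class Order
{
    public int Cost;
    public DateTime Date;
}

class Customer
{
    public string Name;
    public string City;
    public List<Order> Orders = new List<Order>();
}

static class QueryDemo
{
    // Made-up sample data: one Seattle customer, one Portland customer.
    public static List<Customer> SampleCustomers()
    {
        var ann = new Customer { Name = "Ann", City = "Seattle" };
        ann.Orders.Add(new Order { Cost = 1500, Date = new DateTime(2005, 9, 1) });
        ann.Orders.Add(new Order { Cost = 200, Date = new DateTime(2005, 9, 2) });

        var bob = new Customer { Name = "Bob", City = "Portland" };
        bob.Orders.Add(new Order { Cost = 5000, Date = new DateTime(2005, 9, 3) });

        return new List<Customer> { ann, bob };
    }

    public static int CountBigSeattleOrders(List<Customer> customers)
    {
        // The element type of q is anonymous, so "var" is the only way to name it.
        var q =
            from c in customers
            where c.City == "Seattle"
            select new
            {
                c.Name,
                Orders =
                    from o in c.Orders
                    where o.Cost > 1000
                    select new { o.Cost, o.Date }
            };

        var count = 0;
        foreach (var customer in q)           // customer is also anonymously typed
        {
            count += customer.Orders.Count(); // still statically checked
        }
        return count;
    }

    public static void Main()
    {
        Console.WriteLine(CountBigSeattleOrders(SampleCustomers())); // 1
    }
}
```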

So what do you think?  Are there things you do/don't like about "var"?  Personally, I think the name is somewhat of a problem.  C++ is introducing a similar concept, while calling it "auto".  I'm partial to that, but leaning more toward making "infer" a keyword itself.  I think writing down "infer a = 5" reads very well and helps alleviate the confusion that might arise with "var".