Why no var on fields?


In my recent request for things that make you go hmmm, a reader notes that you cannot use “var” on fields. Boy, would I ever like that. I write this code all the time:

private static readonly Dictionary<TokenKind, string> niceNames =
  new Dictionary<TokenKind, string>()
  {
    {TokenKind.Integer, "int"}, …

Yuck. It would be much nicer to be able to write

private static readonly var niceNames =
  new Dictionary<TokenKind, string>()…

You’d think this would be straightforward; we could just take the code that we use to determine the type of a local variable declaration and use it on a field. Unfortunately, it is not nearly that easy. Doing so would actually require a deep re-architecture of the compiler.

Let me give you a quick oversimplification of how the C# compiler works. First we run through every source file and do a “top level only” parse. That is, we identify every namespace, class, struct, enum, interface, and delegate type declaration at all levels of nesting. We parse all field declarations, method declarations, and so on. In fact, we parse everything except method bodies; those we skip and come back to later.

Once we’ve done that first pass we have enough information to do a full static analysis to determine the type of everything that is not in a method body. We make sure that inheritance hierarchies are acyclic and whatnot. Only once everything is known to be in a consistent, valid state do we then attempt to parse and analyze method bodies. We can then do so with confidence because we know that the type of everything the method might access is well known.

There’s a subtlety there. The field declarations have two parts: the type declaration and the initializer. The type declaration that associates a type with the name of the field is analyzed during the initial top-level analysis so that we know the type of every field before method bodies are analyzed. But the initialization is actually treated as part of the constructor; we pretend that the initializations are lines that come before the first line of the appropriate constructor.
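The "initializers run as if they were the first lines of the constructor" behaviour is easy to observe. Here is a minimal sketch; `Widget`, `Log`, and `Note` are hypothetical names invented for this illustration and appear nowhere in the compiler.

```csharp
using System;
using System.Collections.Generic;

public class Widget
{
    // Records the order in which the initialization steps run.
    public static readonly List<string> Log = new List<string>();

    // The declared type (int) is known from the top-level pass.
    // The initializer expression runs as though it were the first
    // line of the constructor below.
    private int size = Note("field initializer", 42);

    public Widget()
    {
        Log.Add("constructor body");
    }

    public int Size { get { return size; } }

    // Hypothetical helper: logs the step name and passes the value through.
    private static int Note(string step, int value)
    {
        Log.Add(step);
        return value;
    }
}
```

Constructing one `Widget` should leave `Log` holding "field initializer" followed by "constructor body", which is the ordering described above.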

So immediately we have one problem; if we have “var” fields then the type of the field cannot be determined until the expression is analyzed, and that happens after we already need to know the type of the field.

But it gets worse. What if the field initializer in a “var” field refers to another (static) “var” field? What if there are long chains, or even cycles in those references? There can be arbitrary expressions in those initializers, expressions which contain lambdas which contain expressions which require method type inference or overload resolution. All of these algorithms that are in the compiler were written with the assumption that when they run, the type of every top-level program entity is already known. All of those algorithms would have to be rewritten and tested in a world where top-level type information is being determined from them rather than being consumed by them.
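To make the cycle problem concrete, here is a sketch with explicitly typed fields. The classes `A` and `B` are hypothetical, and the empty static constructors are there only so that each class's initializers run at a well-defined moment, namely on first access.

```csharp
using System;

public static class A
{
    // Legal today: the declared type of B.Y is already known
    // before this initializer expression is analyzed.
    public static int X = B.Y + 1;

    static A() { }
}

public static class B
{
    // Refers back to A.X, closing the cycle.
    public static int Y = A.X + 1;

    static B() { }
}
```

This compiles because the types come from the declarations rather than from the initializers. If `A.X` is read first, it should come out as 2 and `B.Y` as 1, since `B`'s initializer sees the default value of `A.X` while `A` is still being initialized. With a hypothetical `var` on both fields, the type of `X` would depend on the type of `B.Y + 1`, which would depend on the type of `A.X + 1`, and the analysis would have nowhere to start.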

It gets worse still. If you have “var” fields then the initializer could be of anonymous type. Suppose the field is public. There is not yet any standard in the CLR or the CLS about what the right way to expose a field of anonymous type is. We don’t have good policies for documenting them, versioning them, or interoperating with them across languages. Doing this feature would potentially cause huge costs across the division.

Inferred locals have none of these problems; inferred locals never have cycles or refer to things that haven’t been analyzed yet. Inferred locals never escape into public visibility.
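For contrast, here is a small sketch of an inferred local of anonymous type. The names `Inventory` and `Describe` are hypothetical. The anonymous type exists only inside the method body, so nothing outside it ever needs a name for that type.

```csharp
using System;

public static class Inventory
{
    public static string Describe()
    {
        // The anonymous type has no name that can be written in source,
        // so 'var' is the only way to declare this local.
        var item = new { Name = "bolt", Count = 12 };

        // By this line the local's type is fully known, and it never
        // appears in the signature of any member.
        return item.Name + ":" + item.Count;
    }
}
```

Only the `string` result crosses the method boundary; the anonymous type itself stays private to the body.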

So apparently this simple-seeming feature has the potential to cause really, really bad implementation issues in multiple ways, and all in order to avoid a small redundancy. This seems like it is possibly not worth the cost. If our goal is to remove the redundancy, I would therefore prefer to remove it the other way. Make this legal:

private static readonly Dictionary<TokenKind, string> niceNames =
  new()…

That is, state the type unambiguously in the declaration and then have the “new” operator be smart about figuring out what type it is constructing based on what type it is being assigned to. This would be much the same as how the lambda operator is smart about figuring out what its body means based on what it is being assigned to.
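The lambda analogy can be sketched as follows. The class name `TargetTyping` and its field names are hypothetical. The same lambda text is given two different meanings, and the only thing that differs is the declared type it is assigned to.

```csharp
using System;
using System.Linq.Expressions;

public static class TargetTyping
{
    // Assigned to a delegate type: the lambda is compiled to code.
    public static readonly Func<int, int> AsDelegate = x => x * 2;

    // Assigned to an expression tree type: the very same lambda text
    // becomes a data structure describing the multiplication.
    public static readonly Expression<Func<int, int>> AsTree = x => x * 2;
}
```

The proposed `new()` would work in the same spirit: the declaration supplies the type, and the right-hand side takes its meaning from it.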

Thoughts?

Comments (35)

  1. Frederik Siekmann says:

    I would say the ‘new’ operator is obviously a feature of the kind ‘nice to have but not really important’. In contrast to ‘var’ or the type inference for lambdas which can greatly improve the readability of a method, the ‘new’ operator would save you at most one second of parsing – I mean, a declaration of a field is generally so obvious that you don’t have to read it twice or spend several minutes to understand it. Therefore the gained benefit is probably not worth the effort to implement/test/… it.

    But there is something in your post which confused me a little bit: “What if there are long chains, or even cycles in those references?” First I wanted to respond: Hey it’s not possible to have cycles in a definition of a field because you can’t refer to other fields inside the definition, but I just found out that is not true for static fields. An example:

    public static class Foo1
    {
        public static List<int> Bar = new List<int>()
        {
            Foo2.Bar.Count,
        };
    }

    public static class Foo2
    {
        public static List<int> Bar = new List<int>()
        {
            Foo1.Bar.Count,
        };
    }

    The code compiles and throws a null-reference as expected because either Foo1.Bar is not constructed when Foo2.Bar is accessed or vice versa. Which brings me to my question: Why is it possible to refer to other static fields inside the definition of a static field? Since no order of compilation is guaranteed at all, I can hardly think of any possible use of this.

  2. Eric Lippert says:

    Order of compilation is irrelevant. What is relevant is the order in which the static field initializers run, and that is well-defined. See section 10.12 of the specification for details. (It is arguably a bad programming practice to rely upon these details, but it is legal.)

  3. Ilya Ryzhenkov says:

    Why not limit "var" fields to some well-defined number of constructs, like constants and object creation expressions. This would probably cover 80% of cases, allow for future expansion of the feature and doesn’t look too binding for future. As for performance of compiler, parsing should already be done at this point and you can easily detect if var is valid from AST. As for resolving the type for top-level structure: for constants you know it, and for object creation expression it is the same as resolving type specification to the left of the field’s name.

  4. Frederik Siekmann says:

    I know I am a nitpicking smartass (I’m really sorry about that 🙁 ) but in 10.12 the specification only makes a statement about classes with static constructors. Combine that with 10.5.5.1: "If a static constructor (§10.12) exists in the class, execution of the static field initializers occurs immediately prior to executing that static constructor. Otherwise, the static field initializers are executed at an implementation-dependent time prior to the first use of a static field of that class." Therefore I would say it’s not really defined which of the two Foos throws an exception, regardless of the access pattern of Foo1.Bar and Foo2.Bar.

    But I got your point: the behaviour is more or less defined and could be of use in some cases.

    Thanks for the clarification.

  5. MichaelGG says:

    The "<Type> Id = new()" doesn’t feel very good, not that that’s complete reasoning by itself. It’s a weird "value" that could never stand alone, so it doesn’t seem proper to have on the rhs. Of course, it also only solves the particular issue when constructing a type directly. It still wouldn’t work in the cases where you want to use a method to construct the value.

    Overall, this is only a small part of C#’s verbosity (although, having 300+ character fields is still quite insane). In the examples above, why do we even need to specify the types at all? Or when defining methods, why must we manually calculate each generic type parameter and all the constraints necessary? Or for that matter, even specify the types of the parameters? Even being able to partially specify type parameters at a callsite would be a good start. (Like being able to say Foo<Bar,?> and let the compiler figure out ?.) (And in general, yes, I know, overloading is a PITA, for starters.)

    I’m not sure there is a solution that keeps C# style and backcompat, and doesn’t mean completely re-implementing things. So, as to the " = new()" idea, I don’t see nearly enough benefit for that feature alone.

  6. Eric says:

    Not a big fan of the new() idea – as MichaelGG said, it doesn’t "feel" right.

    C# started off as an extremely clean language, but since C# 3.0 it feels as though a large number of kludges were added solely for LINQ.

    I recently did a demonstration of C# 3.0 for our development team and most of them said "Ughhh" to the language extensions before I showed them LINQ.

    The real draw of C# was that it was straightforward and clean. The C# 3.0 extensions feel forced and as though the language is heading down the wrong path – loading on unnecessary solutions for fringe cases. Overall the language will suffer.

    Languages don’t need to evolve with every product release; it really feels like at this point the C# language team is trying to justify its existence and not really improving the language (no offense!). C# 2.0 was as close to a "perfect" strongly typed language as you could get, and 3.0 really destroyed that. Let’s not go further down that path – I’d rather see C# stay the way it is (now a somewhat mature language) and focus put into the compiler, BCL, and CLR.

    Sorry!

  7. MichaelGG says:

    @Eric,

    The only reason C# 3.0 feels "dirty" is precisely because all these things were added only for specific cases, namely LINQ’s cases. Nothing feels like it was designed for the language as a whole. OTOH, it tries to be a C-ish syntax language, and by the time you finish cleaning it up and simplifying the syntax… not sure you’d end up with anything C-like.

    C# 3 was a major step forward, but it was only a start, and I was so hoping that C# 4 would follow through with the apparent path set out. But as Eric Lippert said before, too many users thought this was too hard, too much, too complex. With the recent announcement of C# and VB going to be "equal, just different looking", it’s clear what the future path for MS .NET languages is.

  8. Greg Beech says:

    I don’t like the idea of the "{type} {name} = new(…)" syntax much, mainly because of the inconsistency with the way that var works. It would feel very weird to be able to specify the type only on the RHS within a method body as we do now, and then be able to specify it only on the LHS when it’s a field.

    And then what happens in the future if you do re-architect the compiler so that declaring fields using var would be possible, and if the CLR/CLS does come up with a specification for anonymous types to be exposed and shared between languages? Then you’re left with an inconsistent syntactic wart which arose from technical issues rather than being designed into the language as the best way to do things.

    Sure, it’s redundant type information, but (a) there’s a fair bit of that in C# anyway, and (b) it’s not really that painful, especially if you have something like ReSharper which will fill in the RHS for you anyway by simply hitting TAB. I’d say either do it the way that would be ideal (var) or don’t do it at all (until technically possible).

  9. Craig Gidney says:

    Visual basic has a special syntax to avoid repetition of the type in the most common case:

    dim x as Object = new Object()

    becomes:

    dim x as new Object()

    Of course that syntax only works in VB because it fit naturally with how things were already declared. C# goes ‘type name’, which would naively mean something like "new List<Widget>(128) widgets". I would suggest the following syntax:

    var x: new Object()

  10. I don’t think that an alternate new syntax should be introduced.  Firstly, new syntax is extra mental weight, so it better be worth it.  Secondly, I don’t think C# should encourage the use of constructors at all – constructors already have weird semantics as is.  Many constructors have various implicit initializations – i.e. call one overload and you get a class loaded from DB, another and you get an "uninitialized" object, another and you get an inline-initialized object, etc.  These initializations are bad since they’re unnamed; that is, a reader of code (and by extension the intellisense-using writer) cannot easily determine which overload to use.

    Constructors are one of the few methods where people find it acceptable for the "same" method to have vastly different semantics amongst overloads.  That’s not a good habit.

    Adding such a syntax would seduce programmers into adding "handy" constructors and make a bad situation worse; even more functionality would be put into constructors.

    Constructors are already one of the most amorphous aspects of the language; they come across as a bag of various features only loosely coupled to a vague intent.  Why are constructors the only static methods able to be required for a generic type parameter?  Why are only constructors able to guarantee non-null return value?  Why are constructors not able to return null?  Why are only constructors unable to return a subclass of their normal return value (leading to overcomplicated factory methods)?  Why are only constructors able to require that a subclass call them?  Why is the collection and object initializer syntax only available for constructors?

    I’d much prefer the language evolve toward dissociating these many features and making them generally useful than to convolute the constructor even further.

    So, in the name of avoiding unnecessary syntax baggage, and in the name of making the language "general", I’d vote for not implementing such syntax.

  11. Michael Liu says:

    Array initializers state the type only once:

    int[] values = { 0, 1, 2 };

    Perhaps the syntax could be extended to collection initializers:

    List<int> values = { 0, 1, 2 };

    Dictionary<TokenKind, string> niceNames = { };

    And object initializers:

    Point point = { X = 0, Y = 1 };

  12. Stefan Wenig says:

    Eric,

    type name = new() does not only look very un-C#y, it also does not cover the case where type inference would actually save a lot more of typing (and reading!):

    var name = Foo(…);

    where Foo has a fixed return type or does some nice type inferencing itself. I did run into some situations where this would have helped a lot. (Also, calling constructors for anything but primitive objects is a bit old school anyway. Inferencing from arbitrary expressions would obviously not help IoC users, but it could be used by fields that get initialized by calls to factories or service repositories. a special ‘new’ syntax gives us nothing of this, but encourages users to keep using constructors directly, with all the downstream problems of testability and extensibility.)

    Do we care when the compiler tells us to be explicit whenever it runs into complex or even cyclic dependencies? Obviously, C# programmers are used to that already:

    error CS0411: The type arguments for XXXX cannot be inferred from the usage. Try specifying the type arguments explicitly.

    What’s so different about a field that needs its type to be specified explicitly?

    Referencing fields in field initializers is not good practice anyway, one could argue that it _should_ be punished. Personally, I would disallow it for public fields too, because I believe a change in an expression should not change the classes public interface. This could have too many trickle-down effects. Same for public fields of anonymous types. (What would be next? Guessing a method’s return type? I’m not saying that this would be a bad thing, other languages do that. But that wouldn’t be C# anymore. I’d rather have return type inferencing the other way – guessing a return statement’s type from the declared return type.)

    Would it be hard to change the compiler? Probably. But then again, you never accepted the argument that a proposed language feature would be relatively easy to implement either! 😉

    PS: This blog engine ate my post – two times now. (and only the second time was I smart enough to keep a copy.) This is ridiculous. I’ve had that before. I really believe you should take this to the admins. (We later engaged in a short email conversation, in case you want to give them the text to look for a reason, like some special character sequences. However, this time around the posting got confirmed, while last time I was just taken to the home page.) Few things are more frustrating than typing lots of stuff, seeing it confirmed (that’s when I stopped caring about the copy I had in the clipboard), but never seeing it appear.

  13. Stefan Wenig says:

    PPS: I posted using Firefox 3 this time (previous attempts using IE7 did not work). Don’t know if that made a difference but the posting appeared. Well, maybe that’s just Microsoft trying to make the EU happy 😉

  14. I agree with Ilya, except that I would expand the proposal further: you can use var fields, but your initialization expression cannot access other var fields. This would probably cover 95% of the useful cases.

    But I do appreciate that a change like this messes up the architecture of the compiler.

    Igor Ostrovsky

  15. LeopoldBushkin says:

    First off, I also had trouble posting with IE7 – this post was created with Firefox.

    As for the idea to introduce a new syntax for new, I think that it is a bad idea for several reasons. First, unlike var, it could only be used to initialize members that are concrete types – since there would be no way to infer the type to construct when the left hand side is an interface or abstract class. The compiler can certainly warn you, but it’s an awkward inconsistency that doesn’t buy you much.

    Second, and more important, this syntax may actually allow the semantics of a program to change subtly without the developer being aware. Take the following example:

    class Animal { public override string ToString() { return "Animal"; } }

    class Dog : Animal { public override string ToString() { return "Dog"; } }

    class Vet { public readonly Dog ThePatient = new(); }

    Now some brilliant developer comes along, and without thinking too deeply about it says: hey, we should expose ThePatient as a reference to the base type Animal. Well, as a result, the compiler infers that the type to create should now be an instance of Animal, rather than Dog. The developer may not have intended this … they just didn’t realize that this inference is taking place. (Yes, developers should pay attention to what they’re doing and understand the language, but it’s an easy thing to overlook). The compiler won’t complain … it will happily change the runtime type instantiated – potentially leading to subtle and difficult to track down bugs.

    The var keyword doesn’t have this issue because the compiler isn’t deciding what type to instantiate – just what type of reference to assign to. In other words, var never results in a different method than you expect getting invoked.

    I think that allowing the compiler to make inferences about what runtime types to instantiate is a bad idea – this is a case where C# should favor correctness rather than convenience. IMHO.

  16. I guess I’ll go out on a limb here and say that I like the idea of having a clean syntax that doesn’t force me to repeat things.

    List<KeyValuePair<string, LinkedList<TreeNode>>> list = new(…)

    is pretty clean I think.  I don’t get any bad feelings at all from it and the syntax seems entirely reasonable.  This also comes down to an issue of maintenance.  As code is being prototyped the internal structures that I am using get shuffled around and changed a lot and a syntax like this keeps me from having to constantly revisit places where the full type name would normally be required.  I must say that I would prefer this syntax over the var syntax currently being used.  I like the type to be textually tied to the identifier it is associated with and having the type declaration on the LHS makes this more clear in my mind.  It would also be nice to support something like this:

    List<KeyValuePair<string, LinkedList<TreeNode>>> list;

    list = new( … );

    So that the declaration and the new don’t need to appear as part of the same statement.

    All in all it seems like a nice mechanism to default to the declared type when creating an object which is probably only going to become a bigger issue going forward with generics becoming pervasive.  I tend not to worry about the corner cases tho but it seems like it would be useful in a number of not so corner cases.

  17. Joku says:

    If this was to be implemented in the very conservative manner, I like Rising’s syntax best:

    Dictionary<TokenKind, string> niceNames = { };

  18. Luke Breuer says:

    Isn’t Anders considering a fairly deep re-architecture of the C# compiler in modularizing it?  See the PDC 2008 talk on "The Future of C#" [1]; I took a few notes at [2].  I should think a major objective of you guys at MS should be to reduce syntactic cruft in situations where not much more complexity is added.  See the Sapir-Whorf hypothesis [3].  Less cruft == ability to focus on what matters.  Combine that with the limited amount of information the brain can actually process in short-term memory and reduction of syntactic cruft, IMHO, becomes extremely important.  Yes, -100 points; give us some transparency into some specifics and we can discuss. :-p

    [1] http://channel9.msdn.com/pdc2008/TL16/

    [2] http://luke.breuer.com/time/item/C_40/465.aspx

    [3] http://en.wikipedia.org/wiki/Sapir-Whorf_hypothesis

  19. Frederik Siekmann says:

    Rearchitecting the compiler is probably -10000 and not -100 😉

  20. Alex Stockton says:

    If you want to create an object of the same type as the member used to store it without having to repeat the type name, you can use "Stockton new" as an alternative to var (which may be useful for fields where var cannot be used) as discussed. The downside is that you have to repeat the member name in the initialization. Here’s how it looks:

    class Program
    {
        public static T New<T>(out T item) where T : new()
        {
            item = new T();
            return item;
        }

        static Dictionary<Int32, Int32> _member = New(out _member);

        static void Main(string[] args)
        {
            Dictionary<Int32, Int32> local = New(out local);
        }
    }

    In addition, we can extend this method to create concrete classes for corresponding interfaces with a couple of simple overloads:

    public static IDictionary<TKey, TValue> New<TKey, TValue>(out IDictionary<TKey, TValue> item)
    {
        item = new Dictionary<TKey, TValue>();
        return item;
    }

    public static IList<T> New<T>(out IList<T> item)
    {
        item = new List<T>();
        return item;
    }

    Now you can write this:

    IDictionary<Int32, Int32> local = New(out local);

    Not perfect but no compiler changes required.

    BTW I also had problems posting with IE8. This post was done with Safari.

  21. Avi Farah says:

    Eric,

    I would much rather have consistent syntax var x for fields as it is for variables.  If that means that I have to wait until the CLR / compiler matures further then so be it.  But I, personal feeling, would rather have consistency if the feature is introduced.

    Thanks for reading

    –Avi

  23. darren oakey says:

    had this big comment written, but it seems to have gotten lost, so I’ll go with the short version –

    var fields => bad, because they sacrifice too much in terms of documentation.  Local var is great, because it’s obvious from the scope what the type is, but it will be as hard for a human to work out what the type of a field is meant to be as you are saying it will be for the compiler  (yes, miracle of miracles, I’m not supporting a new feature)

    MyType x = new ( 1,2, "happy"); => good, or at least better than what we have.  I agree with the poster above, constructors are a bad thing and there are better ways, however given that people use them, this simplifies the code a bit, and I don’t see it as any different syntax from dropping the type in array initialization, which has already been done.

    more and more I _am_ ditching constructors (well, privatising them and only exposing a static factory method) – as the posts above mention, there are a whole bunch of benefits and no real downsides, however it doesn’t help very much, because you still have to duplicate the type name.. eg

    MyType x = MyType.New( 1, 2, "happy");

    however, this yet again can be solved if you allow type inference to work backwards – for instance if I typed:

    MyType x = Create();  

    where Create is defined as

         T Create<T>()  where T : new() {…}

    Then it happily proceeds and assumes T is MyType.  If we get that, then it’s the perfect solution, because it solves way more than just this problem, and allows us to define anything we want.

    Thanks,

    Darren

  24. MichaelGG says:

    "var fields => bad, because they sacrifice too much in terms of documentation"

    Um, so write a tool to print the type out if you really need it for documentation? Anyways, the IDE should show the type on hover, if it’s that important.

    Cluttering the source on purpose in name of "documentation" is a rather weak argument. For the few cases you might "need" it, you can simply write it out. The rest of the time, it’s just more junk getting in the way of understanding the code.

  25. ravenex says:

    Gee, that new() syntax proposal looks a lot like this one for Java:

    http://bugs.sun.com/view_bug.do?bug_id=4879776

    I’m glad C# 3 took the ‘var’ way instead of the way it’s proposed in the link above. Going from left to right fits the way people read and write, but somehow looks weird…

  26. Dave Reed says:

    I’d write this:

    var x = new();

    Just to see how creative you got with the error message 🙂 Something like, "You wanna do WHAT here?"

  27. courage_dog says:

    F# people somehow managed to handle that. Please go and ask them how to infer type for ‘var’, how to resolve long chains and cycles. No magic here IMO.

  28. tomtrias says:

    How about:

    private static readonly new Dictionary<TokenKind, string> niceNames() =

     …

    Oh, wait… that smacks of VB.Net.

  29. tomtrias says:

    Rising,

    What about calling non-default constructors?

  30. Roman says:

    Shocked by the negative response to the "new()" sugar. It makes perfect sense. It is cheap enough that the C# team may actually get around to implementing it, despite the comparatively small benefit. It is possibly the only realistic solution.

    It doesn’t help to say that "language X can do this". What does _that_ have to do with how much effort this would be to implement for _C#_?

    Still hoping that something like "new()" could be added for exactly the scenario mentioned in this blog post – which I somehow keep running into all the time myself.

  31. TheCPUWizard says:

    I ALMOST like the new() sugar. My concern is the potential for confusion with existing meanings of new.

    Using a different keyword would address this cleanly. One thing I would consider important is that the syntax for the RHS should be swappable between a field initializer and other contexts.

    I frequently will cut/paste [e.g. from/to a constructor body] between locations. If the syntax is not compatible, it will typically NOT be a viable shortcut (for my typical development style).

  32. daxfohl says:

    I don’t see how it’s that difficult to enable.  During the class-level parsing routine, simply leave all the var fields as type "var".  After that insert a step that recursively figures out what the "var" members are, and throw an error if anything is anonymous, a delegate, or circular.  Once that’s done you can proceed to compile the methods and everything else just as before.  Is that really a huge refactor?

  33. Jon Hanna says:

    There's no reason why the editor shouldn't provide this. Type ? foo = new Bar() and have it turned into Bar foo = new Bar() as you type, without any need to change the language. SharpDevelop used to have this, but removed it when var removed some (but clearly not all) of its value.

  34. Royi Namir says:

    Eric I love your posts…..:) but really…. you should change the purple color…..:)