Closing over the loop variable, part two


(This is part two of a two-part series on the loop-variable-closure problem. Part one is here.)


UPDATE: We are taking the breaking change. In C# 5, the loop variable of a foreach will be logically inside the loop, and therefore closures will close over a fresh copy of the variable each time. The “for” loop will not be changed. We return you now to our original article.


Thanks to everyone who left thoughtful and insightful comments on last week’s post.

More countries really ought to implement Instant Runoff Voting; it would certainly appeal to the geek crowd. Many people left complex opinions of the form “I’d prefer to make the change, but if you can’t do that then make it a warning”. Or “don’t make the change, do make it a warning”, and so on. But what I can deduce from reading the comments is that there is a general lack of consensus on what the right thing to do here is. In fact, I just did a quick tally:

Commenters who expressed support for a warning: 26
Commenters who expressed the sentiment “it’s better to not make the change”: 24
Commenters who expressed the sentiment “it’s better to make the change”: 25

Wow. I guess we’ll flip a coin. :-)    (*)

Four people suggested to actually make it an error to do this. That’s a pretty big breaking change, particularly since we would be breaking not just “already broken” code, but plenty of code that works perfectly well today — see below. That’s not likely to happen.

People also left a number of interesting suggestions. I thought I’d discuss some of those a little bit.

First off, I want to emphasize that what we’re attempting to address here is the problem that the language encourages people to write code that has different semantics than they think it has. The problem is NOT that the language has no way to express the desired semantics; clearly it does. Just introduce a new variable explicitly into the loop.

A number of suggestions were for ways that the language could more elegantly express that notion. Some of the suggestions:

foreach(var x in c) inner
foreachnew(var x in c)
foreach(new var x in c)
foreach(var x from c)
foreach(var x inside c)

Though we could do any of those, none of them by themselves solve the problem at hand. Today, you have to know to use a particular pattern with foreach to get the semantics you want: declare a variable inside the loop. With one of these changes, you still have to know to use a particular keyword to get the semantics you want, and it is still easy to accidentally do the wrong thing.

Furthermore, a change so small and so targetted at such a narrow scenario probably does not provide enough benefit to justify the large cost of creating a new syntax, particularly one which is still easily confused with an existing syntax.

C++ luminary Herb Sutter happened to be in town and was kind enough to stop by my office to describe to me how they are solving a related problem in C++. Apparently the next version of the C++ standard will include lambdas, and they’re doing this:

[q, &r] (int x) -> int { return M(x, q, r); }

This means that the lambda captures outer variable q by value, captures r by reference, takes an int and returns an int. Whether the lambda captures values or references is controllable! An interesting approach but one that doesn’t immediately solve our problem here; we cannot make lambdas capture by value by default without a huge breaking change. Capturing by value would have to require new syntax, and then we’re in the same boat again: the user has to know to use the new syntax when in a foreach loop.

A number of people also asked what the down sides of adding a warning are. The down side is that a warning which warns about correct behaviour is a very bad warning; it makes people change working code, and frequently they break working code in order to eliminate a warning that shouldn’t have been present in the first place. Consider:

foreach(var insect in insects)
{
  var query = frogs.Where(frog=>frog.Eats(insect));
  Console.WriteLine(“{0} is eaten by {1} frogs.”, insect, query.Count());
}

This makes a lambda closed over insect; the lambda never escapes the loop, so there’s no problem here. But the compiler doesn’t know that. The compiler sees that the lambda is being passed to a method called Where, and Where is allowed to do anything with that delegate, including storing it away to be called later. Which is exactly what Where does! Where stores away the lambda into a monad that represents the execution of the query. The fact that the query object doesn’t survive the loop is what keeps this safe. But how is the compiler supposed to suss out that tortuous chain of reasoning? We’d have to give a warning for this case, even though it is perfectly safe.

It gets worse. A lot of people are required by their organizations to compile with “warnings are errors” turned on. Therefore, any time we introduce a new warning for a pattern that is often actually safe and frequently used, we are effectively causing an enormous breaking change. A vaccine which kills more healthy people than the disease would have is probably not a good bet. (**)

This is not to say that a warning is a bad idea, but that it is not the obvious slam dunk good idea that it initially appears to be.

A number of people suggested that the problem was in the training of the developers, not in the design of the language. I disagree. Obviously modern languages are complex tools that require training to use, but we are working hard to make a language where people’s natural intuitions about how things work lead them to write correct code. I have myself made this error a number of times, usually in the form of writing code like the code above, and then refactoring it in such a manner that suddenly some part of it escapes the loop and the bug is introduced. It is very easy to make this mistake, even for experienced developers who thoroughly understand closure semantics. That’s a flaw in the design of the language.

And finally, a number of people made suggestions of the form “make it a warning in C# 4, and an error in C# 5”, or some such thing. FYI, C# 4 is DONE. We are only making a few last-minute “user is electrocuted”-grade bug fixes, mostly based on your excellent feedback from the betas. (If you have bug reports from the beta, please keep sending them, but odds are good they won’t get fixed for the initial release.) We are certainly not capable of introducing any sort of major design change or new feature at this point. And we try to not introduce semantic changes or new features in service packs. We’re going to have to live with this problem for at least another cycle, unfortunately.

********

(*) Mr. Smiley Face indicates that Eric is indulging in humourous japery.

(**) I wish to emphasize that I am 100% in favour of vaccinations for deadly infectious diseases, even vaccines that are potentially dangerous. The number of people made ill or killed by the smallpox vaccine was tiny compared to the number of people who did not contract this deadly, contagious (and now effectively extinct) disease as a result of mass vaccination. I am a strong supporter of vaccine research. I’m just making an analogy here people.

(This is part two of a two-part series on the loop-variable-closure problem. Part one is here.)

 

Comments (51)

  1. “The fact that the query object doesn’t survive the loop is what keeps this safe. But how is the compiler supposed to suss out that tortuous chain of reasoning?”

    Damn… I’ve been thinking about suggesting that very enhancement to the Resharper team!

    Using a mixture of Linq and foreach means that I constantly get that warning in situations where it is pure noise. Sometimes I needlessly copy the value to another variable. Sometimes I put in the special comments to make Resharper shut the hell up. But sometimes Resharper is usefully telling me that I messed up, so I can’t disable it entirely.

    And sometimes you can rewrite the loop as a query. For example, in my trivial frogs example, that could have been something like “var q = from insect in insects let count = (from frog in frogs where frog.Eats(insect)).Count() select new {insect, count}”, and then you iterate over q to get the results. — Eric

    Is it theoretically possible for the compiler (or a tool like Resharper) to figure this out? Obviously they would have to dig into the implementation of Where to find out if it stores the predicate-delegate for later use – but they’ve got the IL to examine, so I (naively) imagine it would be possible.

    It depends on what exactly you mean by “figure this out”. If all you want to know is if the closure survives the loop, then clearly it is possible for a computer program to figure that out because the garbage collector figures it out at runtime, and the GC is a computer program. Imagine you did an aggressive garbage collection at the end of each loop; if the closure gets collected then it must have been dead. If not, then it must be alive. The GC just does graph analysis to see if a particular node in the graph is reachable or not; the same analysis could be done at compile time. Now, whether it is feasible for a compile-time analysis tool to do that level of flow analysis is another question; though this is in theory possible, it is a lot of work.

    However, just because the closure survives garbage collection does not guarantee that it is ever USED in a dangerous manner. Maybe it survives by some accident, and is never called again. If figuring out that is what you mean, then no, that’s not solvable in general by computers; that’s equivalent to The Halting Problem. — Eric

    The other thing I’ve done is the usual thing of writing an extension method called ForEach that takes an Action to run for each item. That eliminates this problem. But it also doesn’t play well with yield return, and also you (Eric) advised against it because it mixes side-effect code with functional style in a misleading way. But it is the closest thing I think there is to an “easy” solution to this problem.

  2. Anthony P says:

    As a junior-ish ASP.NET developer working individually within a small company, I’m woefully behind on things that frankly do not come up so often in my world; either because they’re not necessary, or because I find another way to do whatever I need to do since I don’t know any better.

    I found this blog a few weeks ago, and find it equal parts fascinating, educational (I had no idea that in foreach(var x in y), x wasn’t a different variable for each iteration), and frustrating. The last part is because honestly quite a few of the topics and coding samples casually floated during the discussion often fly two or three feet over my head. I think this general topic is one of those examples. I would appreciate a recommendation or two for resources to help me wrap my brain around lambdas, Func(TResult) delegates (and probably delegates in general, as I’ve not often resorted to using them), and the scoping considerations involved.

               var ints = new List<int>() { 1, 2, 3 };
               var funcs = new List<Func<int>>();
               foreach (int i in ints)
                   funcs.Add(() => i);
               foreach (var func in funcs)
                   Console.WriteLine(func());

    func() always evaluating to 3 is bringing me to my knees.

    Welcome aboard.

    Two books I recommend are “C# In Depth” and “Essential C#”, by Jon Skeet and Mark Michaelis, respectively. I was technical editor of both; they’re both great. They are well-titled; Jon’s book goes a lot deeper than most mainstream books about programming languages, and Mark’s book does a good job of covering the essential material you need to know to understand the language. — Eric

  3. SuggestionBoxBob says:

    not really related to this post, but something I’d like to see would be a “what compiler warnings do you get for this *and why?*” WRT:

       class Program
       {
          static void Main(string[] args)
           {
               var x = 5; // not used
               var y = (int)5; // not used
               var z = (int?)5;  // no warning ?!
           }
       }

     

  4. SuggestionBoxBob: I’d expect all you’d get would be three “variable is assigned but never used”s. Why, what other warnings might you get? Everything else about it is completely fine, with the possible exception of the redundant cast of y to the type that it already is – but I’m fairly sure that an unnecessary cast isn’t a warning in C#.

    Your expectations would be wrong. You only get two warnings, on x and y. (I’ll add some comments to the code.) The reason why we suppress the warning on z is kinda interesting; I blogged about this a couple years ago.

    http://blogs.msdn.com/ericlippert/archive/2007/04/23/write-only-variables-considered-harmful-or-beneficial.aspx 

    We suppress a “not used” warning on a local if a non-constant value is written to it. The compiler interprets (int?)5 as “new Nullable<int>(5)”, which it sees as a non-constant computation which you might want to examine in the debugger. 

     — Eric

  5. Pavel Minaev [MSFT] says:

    > Though we could do any of those, none of them by themselves solve the problem at hand. Today, you have to know to use a particular pattern with foreach to get the semantics you want: declare a variable inside the loop. With one of these changes, you still have to know to use a particular keyword to get the semantics you want, and it is still easy to accidentally do the wrong thing.

    Eric, what about two suggestions combined (I believe I mentioned it in one of the comments)?

    1) Add a new foreach syntax for new semantics – doesn’t really matter which one.

    2) Give a warning (or maybe even error) for old semantics.

    That way, one cannot accidentally do the wrong thing, because the compiler will complain. And yet it is also not a _silent_ breaking change – at least if someone’s code is actually intentionally relying on existing behavior (which I find extremely unlikely), they’ll get a clear error message telling them what’s wrong, and, ideally, how to make it right.

  6. Pavel Minaev [MSFT] says:

    > Is it theoretically possible for the compiler (or a tool like Resharper) to figure this out? Obviously they would have to dig into the implementation of Where to find out if it stores the predicate-delegate for later use – but they’ve got the IL to examine, so I (naively) imagine it would be possible.

    Thing is, Where() actually does store the predicate for later use, inside the (deferred) enumerable object that it returns – remember that recurring statement in MSDN member documentation? "This method is implemented by using deferred execution. The immediate return value is an object that stores all the information that is required to perform the action. The query represented by this method is not executed until the object is enumerated either by calling its GetEnumerator method directly or by using foreach in Visual C# or For Each in Visual Basic."

    It can, of course, further figure out that object returned by Where() does not escape the loop body, and therefore neither does the lambda… but hopefully it’s clear now why this is actually that much more tricky.

  7. AC says:

    Great follow-up.

    I also like your stance on the issue that it’s not just a training issue. There’s always so much to do when properly writing code. It’s often difficult to always be able to look at other people’s code and pick up a nuanced edge case such as this.

    Since the race for ‘what to do’ was so close, I’d vote for the breaking change.

    I must admit I always perceive the current iteration value as being a new variable, and yes for the record to others, I do understand delegates and variable capture.

    Can you (or someone) elucidate me as to where you’d want the results to be 3,3,3? I’m no idiot and have been programming C# for a long time, but I also would have failed the interview test here. OK that’s not totally correct – I can figure out the how, but it’s the why that eludes me.

    Thanks

    AC

  8. Pavel Minaev [MSFT] says:

    By the way, how would you handle a case with 3 sides tied, by flipping a coin, in a fair way (such that every side gets equal chance)? It looks to me like this isn’t an option, either.

    (and yes, I’ve seen the *)

    We’ll flip two coins. Your side can have two heads, the other side can have two tails, and I’ll take one head, one tail. — Eric

  9. Martin Mueller (MillKa) says:

    OK, to allow fixing the issue in C#5, I change my vote and support Pavel:

    1) Add new syntax "foreach (new var x in c)" for new semantics (inside expansion).  The keyword new gives the best hint about the difference between the two fkavours.

    2) Give a warning for old syntax and semantics (outside expansion). This avoids the breaking change. Wrong code can be found. Correct code will have to use "#pragma warning" to get rid of the warning.

  10. Jesse says:

    OT: Geeks should really stop promoting Instant Runoff Voting when there are other voting systems with the same look & feel that are much more mathematically sound. The Schultze method, for instance, which is used by such groups as Wikimedia, Gentoo, Debian, Ubuntu, FSF in Europe and Latin America, BoardGameGeek, and the Pirate Party. IRV is easy to implement and its flaws aren’t obvious, but they are serious.

  11. M.E. says:

    It sounds like the ship has sailed on this one. Hopefully the lesson is learned and the next language in the Java/C++/C# tradition will not have this problem.

    Personally, I am far more annoyed by the fact that I can write

    var list = new List<int>();

    foreach(int i in list) { … }

    then change list to be, say, a List<double>() and not get a compiler error, but have it crash at runtime. I’ve had this bite me on numerous occasions.

  12. Craig Stuntz says:

    "…a warning which warns about correct behaviour is…"

    An FxCop rule. :)

    Seriously, this is the sort of case FxCop handles really well.

    So one option would be to catch it with FxCop instead of the compiler.

    Advantage is that you can do it today.

    Disadvantage is that most people who would be helped by this probably don’t run FxCop.

  13. Pavel Minaev [MSFT] says:

    @Craig:

    FXCop can’t catch that, because it analyzes compiled code, not source code. It is possible to detect such a pattern in compiled code, but given what lambdas extract to in IL, I would be surprised if FXCop is smart enough in its present state to handle that kind of thing. StyleCop could catch it, though.

  14. Pavel Minaev [MSFT] says:

    > We’ll flip two coins. Your side can have two heads, the other side can have two tails, and I’ll take one head, one tail.

    The probabilities would then be 1/4, 1/4 and 1/2 – hardly fair.

    OK, then we’ll Rock-Paper-Scissors for it. First you and the other side have a round, and then I’ll play the winner. — Eric

  15. André Ferreira says:

    I know Eric is joking, but the real solution is pretty easy.

    Flip 2 coins:

    Head Head side 1 wins

    Head Tail side 2 wins

    Tail Tail side 3 wins

    Tail Head goto Flip 2 coins;

  16. Rubix says:

    A new foreach syntax  for this very particular case would be overkill, as would be a warning.

    The interaction of foreach and closures probably should have been defined differently, but now it is too late to bother fixing it. After all, no language can evolve as much as C# did since 1.0 without having some awkward legacy issues. Just keep a note for the future designers of whatever language will come after C#.

    And it’s not only the languages that have this kind of problems, e.g. still occasionally having to use pre-generics System collections brings me on the verge of tears.

  17. orangy says:

    (Started to think about it, wild guesses)

    If I were to change language for two enumeration styles (I’m not yet sure if it should be done), I’d go with the syntax:

    for(var i in list) // i is same variable for all loop iterations

    foreach(var i in list) // i is new variable for each iteration

    If foreach body is not "pure" enough (subject to define), warning could be issued. This warning could easily be fixed by converting it to "for" loop, if needed.  Seems like there are some opportunities for "foreach" parallelization, also.  

  18. configurator says:

    Andre: I have a simpler solution. A "three sided coin". Since that is geometrically improbable, I’ll do you one better – a cube, with two sides for each possible outcome.

    Why do programmers always try to reinvent the wheel?

  19. orangy says:

    Thinking about purity of foreach block, that could be defined as "block is pure, if extracting it as separate method of the void ExtractedBlock(iterationVariable) form is pure". Then we get that problem of checking if call to Where, Select and any other method getting delegate is pure. It can be defined with some form of PureAttribute, when it means "I’m pure if this and that parameter’s body is pure". Delegate body is again subject to the transformation above, like if it can be replaced with extracted pure method, than it is pure. If by chance it is using method group, than method it resolves to should be [Pure] already.

    Again, those are just some ideas. Discussion is welcome!

  20. Nat says:

    I love that in todays world you have to qualify that vaccination statement.

  21. Mark Knell says:

    @Eric "we’ll Rock-Paper-Scissors for it. First you and the other side have a round, and then I’ll play the winner."

    Japery, indeed.  Guess it was a good vacation.

    I say we settle the choice by havting you think of a double between 0+ and 0-, and we’ll try to approximate it.

  22. Denis says:

    "A number of people suggested that the problem was in the training of the developers, not in the design of the language."

    …And a number of people say that that the music CDs are "dead", "electronic", "machine-like", whereas the vinyl records are "warm" and "live". And some time before, a number of people said the same thing about those very vinyl records, lamenting the muderous effects of technology on the spirit, or art, or whatever.

    Will we ever stop envying and resenting the next generation for having it easier than we had???

    God forgive me, but I think there is a simple psychological issue behind the "hard-line" statements like that: "if I had to have it the hard way, then so must you, little punk, and don’t you dare to have more fun than I did when I was your age!" — something like that.

    Of course the next generation of programming languages should make it easier to implement the ideas that pop up in your head — fast, not after so much nose-to-the-grindstone time that you forget what you had in mind in the beginning.

    But then again: it’s tempting for the people who have spent a lot of time mastering the "deep dark secrets" of a certain technology or language to protect the status of High Initiates, the Keepers of the Misteries of Programming, they may have thus obtained. I myself am a little bit guilty of that… :-)

  23. Eric,

    Correct me if I’m wrong, but if I understand you correctly, you give some very good arguments against making it a warning, and you basically reject the make-it-an-error option. You also repeat some argumentss against leaving the language as it is.

    Does that mean that you’re on the let’s-make-the-change side?

    After reading this post, I certainly repeat my "make the change" vote.

  24. Meta-Knight says:

    I said in the last post that you should at least make it a warning. I changed my mind. On many occasions I use the results of a LINQ query with the iteration variable right away, just like in the example you provided, and in those cases the warning message makes no sense.

    There’s only one option left: make the change you suggested.

  25. sweko says:

    Make using lambdas in foreach loops illegal.

    That would solve the problems for sure.

    No closures – no problem.

  26. "…a warning which warns about correct behaviour is a very bad warning;"

    Static code analysis tools, like Lint for C++, are hugely popular for the very reason that they help point out *legal* code that may not be what you intended. I’m jolly grateful that my C++ compiler always points out when I do this:-

       if (x = y)

    Very rarely I may want to do just this, but I’ll write the slightly longer form to placate the compiler:-

       if ((file = fopen(…) != NULL)

    C# may not have nearly as many dark corners as C++, but it has some. The Visual C++ team added a separate switch "/analyze" as less intrusive method which may be a preferable approach. Those who know what they’re doing and dislike being nannied can do so. The rest of us who know that we are fallible have the opportunity to get all the help we can lay our hands on.

  27. Dan says:

    Isn’t this known as lexical closures? Switching from this to closures that behave like they do in scheme or ruby seems more like adding a new language feature than fixing a bug.

    First off, scheme closures and ruby closures both use lexical closures. You seem to be implying that they use dynamic scoping.

    Second, I don’t see what the difference between lexical and dynamic scoping has to do with it. The question of whether a closure ought to close over a variable or a value seems germane, but I don’t see why whether the binding is lexical or dynamic is relevant. Can you clarify why you think this is relevant?

    Third, the proposal at hand is NOT to change how closures work; closures in C# are lexical closures that close over variables and that’s not changing. The proposals at hand are (1) to move the declaration of the foreach loop variable to be logically inside the loop, or (2) to produce a warning that the closure closes over a single variable, not one variable per loop iteration.

    I’m therefore very confused by your comment. Can you elaborate on what you’re getting at here, because I’m not following you. — Eric

  28. Pavel Minaev [MSFT] says:

    > Scheme closures use lexical scoping; ruby closures use dynamic scoping.

    Eric, can you elaborate on that?

    I’m not a Ruby expert, but from what I’ve seen all closures in Ruby are lexical. They do have three different kinds:

    – blocks, which aren’t first-class (you can pass a block as an argument, but you cannot store it in a variable);

    – lambdas, which are first-class, and are really most like C# lambdas in pretty much everything;

    – procs, which are like lambdas, but also “capture” flow control (so a proc can do an early return from the function in which it was created, or break from a loop if it was created in the loop body).

    All three kinds of closures are lexical, however, in that when you reference a variable “a” inside a closure, it will always be the variable visible in the scope in which the closure was created – not the nearest declared in the scope from which it is called.

    Or did you mean something else by “dynamic scoping”?

    I was misremembering. I was probably thinking of some other language that has dynamic scoping for closures, like Perl (optionally) or PostScript (by default). However, the point remains that I don’t understand the comment. I’ll fix up the text. Thanks. — Eric

  29. I’m for #1, ‘to move the declaration of the foreach loop variable to be logically inside the loop’.

    While it may be a breaking change, I find it unlikely anyone would **deliberately** depend on that behavior.

    Besides the whole idea of the foreach variable being readonly and yet changing on each iteration seems a bit like cheating to me.  :)

  30. AC says:

    Still no answers to my question of "What is a real-world scenario where you’d want the current behavior?" It’s all well and good to demonstrate the capture problem, but does someone have a real example?

    IMHO it’s only a breaking change if there exists code out there that uses it in this way and meant to.

    I have a feeling there’s a lot of code using the workaround to get the value they really want

    foreach( i in list ) {

      // copy i to v, foreach capture varialble issue see http://blogs.msdn.com/ericlippert/

      // will be fixed in c#5 we hope

      var v = i;

      funcs.Add( () => v );

    }

    In this case, fixing the #1 foreach fix (declaration inside loop) makes no semantic difference to the resulting code.

    Otherwise, the code is wrong and nobody tested it.

    I wouldn’t put it as a warning either (for the warnings-as-errors crowd), but variable capture of the foreach variable could be an ‘information’ message in C# 4.

    Also, you can still explicitly allow capturing by just declaring a variable y which is outside the foreach, which would make it clear what the value of the captured variable will be. So if there is real-world code that needs this bizarre behavior, it can still work with only minor modification.

  31. DRBlaise says:

    @Michael Greger and @AC – I can’t think of a good scenario where someone would deliberately rely on the current foreach closure behavior, but as a few people have pointed out: a real problem that would come into play if the breaking change was made in a future C# version is "multi targeting bugs."

    As Pop.Catalin put it:  "code written with the new foreach rules will not work correctly when re targeted to older compilers (C# 2.0, C# 3.0, C#4.0)"  The pains and problems would continue to live on indefinitely into the future, and now the problem would be that developers who understand and rely on the new foreach closure behavior and are required to have their libraries available in older versions would get bitten.

  32. I was thinking about the C++ approach you mentioned:

    [q, &r] (int x) -> int { return M(x, q, r); }

    and what would be a C#-esque way to express that concept. Obviously as you point out it doesn’t solve the problem for foreach at all, but it still seems like an interesting and useful feature if the syntax can be made less opaque. (Of course, that’s not a problem for C++ because "the syntax shouldn’t be opaque" seems to be an explict design-NON-goal… that fragment is positively transparent by C++ standards! 😉 )

    Here’s what I came up with that would be Csharpey…

    x => {

     var q;

     ref r;

     return M(x, q, r);

    };

    The "ref r" might be redundant since capture-by-reference is the default behavior, but maybe not, because you could require that if there are *any* explicit declarations of what is being captured, then *all* variables being captured must be explicitly so declared, to avoid accidental introduction of something that captures. Since neither "var q;" nor "ref r;" are legal statements in C# today*, I don’t think that’s a breaking change. It’d be nice to have a way to specifically declare that nothing should be captured, too, but I’m not sure what that’d be, there’s not an obvious extension of this syntax for it.

    * with the caveat that all the necessary hoops for the case where a real type called "var" is in scope have already been jumped through.

  33. DRBlaise says:

    @Stuart – I think the C++ syntax is fine for C#, just replace the & with ref:

    [q, ref r] x => M(x, q, r)

    It would also be nice to have a compiler setting that would force all captured variables to be explicitly declared in this manner.  This could help beginner developers and also avoid the foreach problem.

  34. I’ve run into the same issues Daniel Earwicker did. We use and like ReSharper, but finally changed "Access to modified closure" to a hint rather than a warning. The majority of the time the query object doesn’t survive the loop. And if it does, hopefully the developer wrote a unit test. :)

  35. AC says:

    @DrBlaise, I agree with what you’re saying in theory, but in practice, why can’t this simple change be back propagated to the old 2.0 compilers as well?

    No, that makes things even worse. Then you end up with two nigh-indistinguishable compilers that produce different results for the same program text. — Eric

    Isn’t it better to fix an issue and take some pain over propagating it forever?

    Sometimes it is, sometimes it isn’t. If it were an easy choice with a bright line then we wouldn’t be having this conversation. — Eric

    I take back my ‘no warnings’ stance and think it should at least be a warning. For the people who never read warnings (most of the people I work with) it’s no big deal. They write horrendous and buggy code regardless. For the errors-as-warnings crowd, they won’t mind a little explicitness or the odd #pragma ignore.

    I assure you from long and painful experience OH YES THEY WILL. Case in point: A couple years ago I noticed that there are some obscure cases where we do not report a “variable was never used”, or “variable was written but never read” warning when we possibly could. So I added it to an internal-only build of the compiler. OH THE PAIN. Suddenly thousands of people, all of whom had “warnings as errors” turned on by default, were getting their builds broken because the compiler was now correctly pointing out that a particular variable in their program was never used.

    But, they all loudly pointed out to me, the variables in question were being read via private reflection, or via some unsafe interop scenario, or via code spit into an expression tree or blah blah blah blah blah — some mechanism that was extraneous to the language and therefore not detectable by the compiler. Would I please turn off the damn warning, no we do not care if the warning is correct and consistent with every other “variable not used” warning, it is breaking us now, we’re trying to ship a product here, get rid of it.

    Imagine what would happen had we allowed to get that out into the wild. It wouldn’t just be a few thousand people at Microsoft vexed with me, I tell you what.

    Adding warnings is often a HUGE breaking change. We really don’t want to do it unless we are sure that doing so is not worse than simply living with the bad-smelling behaviour that the warning would be detecting.

    — Eric

  36. Pavel Minaev [MSFT] says:

    > For the errors-as-warnings crowd, they won’t mind a little explicitness

    I, for one, would definitely mind any warning over a _correct_ (i.e. non-escaping) use of an iteration variable in a lambda, or any construct which introduces lambdas under the hood (such as LINQ comprehensions).

  37. Blake says:

    It’s great that you’re taking this problem so seriously. I moved to C# from C++ partly because of the insane amount of this type of baggage in C++.

    In my mind, where the scoping of variables is not immediately obvious (e.g. foreach, query expression syntax), capturing variables by reference is dangerous and should not be the default.

    I know you hate syntactic baggage, but how about a new way of declaring a lambda that defaults to capturing by value. For example, x -> foo(x, y). Additional annotation, similar to the C++, could then allow by reference.

  38. Rayviewer says:

    I have been following this thread. This is great thread that touches many aspects in the software development cycle.

    I have asked this question when the “var” is introduced to the “foreach” loop: Computer is used to eliminate repeated tasks, why we need the "var" in the “foreach (var item in items). After following the thread, I see the good usage of the “var” in the “foreach”.

    I agree with Pavel. This is a bug. “Foreach var” is clear enough to demand a new object in items.

    Historically, Microsoft has resolved a similar issue in C++ a few years back. The issue is in our buddy “for” loop too.

    In the older c/c++ compilers (before VC6?), the following “i” can be used outside the “for” loop.

             for (int i= 0; i<10; i++)

            {

               Do something with i

            }

            fprint(i);  // ok

    However ANSI C++ does not allow the "i" can be referenced out side the "for" loop. In order to compile with ANSI C++ and to keep the backward compatible, the new VC compiler has a compiler flag /Ze for the people who want to keep the above code works.

    I think we can do the same for the foreach issue. Add a check box in the Build tab in the project setting for the people want the old behavior.

  39. David Nelson says:

    "Imagine what would happen had we allowed to get that out into the wild. It wouldn’t just be a few thousand people at Microsoft vexed with me, I tell you what."

    I don’t buy the "people are screaming so we can’t do it" argument. The people who disagree will always scream, but that doesn’t make them right. Yes, some people will have their builds broken; that’s the WHOLE POINT of building with warnings-as-errors, to aggressively find bugs before they reach users. Those people usually understand that the potential for false positives is the unavoidable result of not having an omniscient compiler; but they are willing to pay that price in order to produce a higher quality product. I am sure there were some people who unreasonably insisted that they should be able to have their cake and eat it too: compile with warnings-as-errors AND never have their build break. But imagine how many more people, both immediately and in the future, would be able to find bugs that would have otherwise gone undetected? Why should they suffer because some people who chose to build with warnings-as-errors don’t like the fact that they can’t lazily upgrade from one version to the next?

  40. Dan says:

    Hi, I made the confusing comment before. I had been under the impression that C# closures were dynamic, not lexical, because of this issue and other results from modifying a variable after closing over it. It appears that I was wrong – Pavel’s comment after mine straightened me out about how a call to a closure in a different scope refers to the old scope. I was under the impression that lexical closures acted by closing over the value in the current scope.

  41. Pavel Minaev [MSFT] says:

    > I think we can do the same for the foreach issue. Add a check box in the Build tab in the project setting for the people want the old behavior.

    That compiler option in VC++ has caused untold grief in and of itself, because for a certain time period (roughtly until VC++2005 became common), you never knew if a piece of C++ code you’ve got from somewhere expected to have it turned on or off. It effectively amounts to sneaking in a second, incompatible, dialect of the same language. As a result, many people resorted to a "#define for hack" to force reduced scope regardless of compiler settings:

       #define for

           if(false); else

    In the scenario we’re discussing, the effect of doing the same thing would be even worse, because here the incompatible dialect is _quietly_ incompatible – a program compiled with a wrong setting will compile successfully, it just won’t do the right thing – and there’s no guarantee that "wrong thing" would be an exception and not, say, incorrectly computed result leading to unrecoverable data corruption (when said result is used to update some existing data).

    There is a different take on this, though… I do wish sometimes for more languages, C# included, to provide some form of explicit versioning mechanism, guaranteed to be forward-compatible, that can then be used to select the correct behavior when a breaking change, or a quiet semantics change, is introduced in a new version of the language (which permits such changes to be introduced more often to begin with). For example, in XQuery, a module can start with a version declaration:

       xquery version "1.0";

    and the language spec requires the implementation to only accept a version declaration if it can support the precise semantics mandated by a spec for that version (it can be less restrictive by adding extensions, but it cannot change the meaning of code, or refuse to compile it, if it’s valid for the version specified).

    This allowed them to introduce breaking changes. For example, in XQuery 1.0, "function" isn’t a keyword in expression context, and so this:

       function($x)

    is interpreted as a call to a function named "function", passing the value of variable $x as an argument. In XQuery 1.1, "function" in expression context is a keyword that starts a lambda expression, so the above code would parse it as such, $x as lambda parameter declaration, and then expect { to follow the ) to mark the beginning of lambda body.

    An XQuery implementation can only support 1.1 and refuse 1.0, or support 1.0 and refuse 1.1, or support both; and if I write my code relying on presence or absence of such features (or, in general, when writing any code, assuming future breaking changes), I can mark it with a version declaration, and know that, no matter what, it will either work precisely the way I expected it to do according to the spec, or fail to run if the exact meaning I relied on is not supported by the implementation.

  42. Richard says:

    My £0.01206: (= $0.02)

    Introducing a new syntax is a bad idea. You would either have a new syntax for the new behaviour, which would not be particularly discoverable and would likely remain unused, or a new syntax for the old behaviour, in which case you might as well just make the change and let anyone who needs the old behaviour use a workaround.

    A warning would be nice if it was 100% accurate, but that sounds infeasible. An information message would also need to be 100% accurate to be of any use. If it’s not accurate, it would become the Visual Studio equivalent of the "Are you sure you want to run this virus and let gangsters pwn your data?" dialogs that so many users read as "Click Yes to close this annoying message-thingy and get on with what you were trying to do (probably viewing naughty pictures)".

    Changing the implementation will *probably* not break any code, and certainly looks like the best option, but there are plenty of other changes to the framework which have been rejected because they *might* break code.

    Eg: ColorTranslator.FromHtml throws a System.Exception

    https://connect.microsoft.com/VisualStudio/feedback/ViewFeedback.aspx?FeedbackID=94064

    "We cannot make assumptions about programming practices and therefore we cannot change an outer exception in case deployed application are catching that specific exception."

    Maybe for .NET 5 you should throw caution to the wind, ignore backwards-compatibility, fix any language issues like this, and completely refactor the BCL. You could get rid of any hang-overs from v1, such as non-generic collections; you could move the ExtensionAttribute to mscorlib to allow extension methods anywhere in the framework; you could split System.Web so that MVC apps don’t get a reference to WebForms controls; you could remove some of the duplication between WPF and WinForms.

    Perhaps v5 is optimistic, but if you don’t do something before v10, I fear the framework may become bloated with backwards-compatibility junk, and end up being the VB6 of 2025.

  43. Ronan thibaudau says:

    About refactoring the BCL in .net 5 i totally agree, .NET has evolved very fast and went from sub part to on par to more advanced than most platforms in very little time. I don’t think the people still using non generic collections are the ones who are likely to dig into lamda and dynamic so i think it would be nice to say , for .NET 5, ".NET 4 and before is legacy and maintained for the next 3 years , .NET 5 removes everything that has been made obsolete by newer frameworks". This would be an opportunity to remove a lot of bloat and , if it aint there , newcommers don’t risk using it.

  44. DWalker59 says:

    A better analogy than "A vaccine which kills more healthy people than the disease would have is probably not a good bet" might be "Medicine which kills more people than the disease it’s supposed to cure is not a good bet".  

    Of course, analogies are like feathers on a snake.

  45. A late vote for fixing the foreach statement. The design was a mistake; it’s by far the biggest wart in the language – and if it would do virtually no harm to fix, then please fix it if you can!

    When I first made this error, I was dismayed. Every time I write

    var temp = it; // don’t remove this line

    I’m still dismayed.

    Conversely, I can’t think of a reason anyone would be depending on the current semantics. A look though the comments from your clever readers was, unusually, completely unilluminating.

    Did I miss an example of a reasonable use of the current semantics that the change would break? (For some reasonable interpretation of ‘reasonable’?)

    Ta, Pete.

  46. Dale Force says:

    Actually, with the smallpox virus now confined to one or two labs, there is no general vaccination because the vaccine is now more dangerous than the very small chance of catching the disease (unless you work at one of the labs.)

  47. Kyralessa says:

    I’m quite late to this show (I tend to save and read blogs offline), but my vote is to change it, even if it’s a breaking change.

    The very fact that it’s a breaking change initially made me feel it should stay as-is.  But as Stuart Ballard said, it’s not just unintuitive, but "unuseful".  It’s hard to think of a situation in which the current behavior would be intentionally leveraged.  I think most of us, writing foreach the long way, would intuitively put the loop variable *inside*, not *outside*.

    While I understand the mindset of those who say it’s a matter of training, I can’t agree in this case.  The current behavior isn’t so much a matter of closure as a matter of a quirk in the way foreach works.  (And I’ve been bitten by the quirk at least once, when I assigned several event handlers in a loop using lambda expressions, and then tried to figure out why only one of the handlers ever seemed to get hit.)

  48. Shaftway says:

    I’m even later to the game, but there’s a solution that is elegant, non-breaking, and solves a wider range of problems than the foreach loop itself.  So right now the way you have to implement this is…

    int x = 0

    foreach( i in list ) {

     x += 1;

     int y = x;

     var z = i;

     funcs1.Add( () => z );

     funcs2.Add( () => y );

    }

    What if there was a way to declaratively "pin" the values of i and x, so that when used later they retain the expected value.  For example, this could be rewritten as…

    int x = 0

    foreach( i in list ) {

     x += 1;

     funcs1.Add( () => (!)i );

     funcs2.Add( () => (!)x );

    }

    That gives me a tighter syntax with (I think) a non breaking language change (if it is breaking then pick something that wouldn’t break), I’ve covered the enumerator’s case, I’ve covered the outer variable case, it can be easily turned on and off to support whichever model you want.  As far as I can tell, it’s pure awesomeness.

  49. penartur says:

    By the way, there _already_ is a clear syntax for that "new" behavior.

    If we will consider replacing

    var values = new List<int>() { 100, 110, 120 };

    var funcs = new List<Func<int>>();

    foreach(var v in values)

     funcs.Add( ()=>v );

    foreach(var f in funcs)

     Console.WriteLine(f());

    with

    var values = new List<int>() { 100, 110, 120 };

    var funcs = new List<Func<int>>();

    values.ForEach(v => funcs.Add(() => v));

    funcs.ForEach(f => Console.WriteLine(f()));

    then the resulting program will clearly output 100, then 110, then 120 :)

    So there is another reason to not invent that new syntax :)

  50. Brian says:

    @Penartur: That is a function on lists.  It isn't sufficient if dealing with collections other than lists.

  51. Ken says:

    How can an insect be eaten by more than one frog? 😉