The implementation of anonymous methods in C# and its consequences (part 3)


Last time we saw how the implementation details of anonymous methods can make themselves visible when you start taking a delegate apart by looking at its Target and Method. This time, we'll see how an innocuous code change can result in disaster due to anonymous methods.

Occasionally, I see people arguing over where local variables should be declared. The "decentralists" believe that variables should be declared as close to their point of first use as possible:

void MyFunc1()
{
 ...
 for (int i = 0; i < 10; i++) {
  string s = i.ToString();
  ...
 }
 ...
}

On the other hand, the "consolidators" believe that local variables should be declared outside of loops.

void MyFunc2()
{
 ...
 string s;
 for (int i = 0; i < 10; i++) {
  s = i.ToString();
  ...
 }
 ...
}

The "consolidators" argue that hoisting the variable s means that the compiler only has to create the variable once, at function entry, rather than each time through the loop.

As a result, you can find yourself caught in a struggle between the "decentralists" and the "consolidators" as members of each school touch a piece of code and "fix" the local variable declarations to suit their style.

And then there are the "peacemakers" who step in and say, "Look, it doesn't matter. Can't we all just get along?"

While I admire the desire to have everyone get along, the claim that it doesn't matter is unfortunately not always true. Let's stick some nasty code in where the dots are:

delegate void MyDelegate();
void MyFunc1()
{
 MyDelegate d = null;
 for (int i = 0; i < 10; i++) {
  string s = i.ToString();
  d += delegate() {
   System.Console.WriteLine(s);
  };
 }
 d();
}

Since the s variable is declared inside the loop, each iteration of the loop gets its own copy of s, which means that each delegate gets its own copy of s. The first time through the loop, an s is created with the value "0" and that s is used by the first delegate. The second time through the loop, a new s is created with the value "1", and that new s is used by the second delegate. The result of this code fragment is ten delegates, each of which prints a different number from 0 to 9.

Now, a "consolidator" looks at this code and says, "How inefficient, creating a new s each time through the loop. I shall hoist it and bask in the accolades of my countrymen."

delegate void MyDelegate();
void MyFunc2()
{
 MyDelegate d = null;
 string s;
 for (int i = 0; i < 10; i++) {
  s = i.ToString();
  d += delegate() {
   System.Console.WriteLine(s);
  };
 }
 d();
}

If you run this fragment, you get different behavior. A single s variable is created for all the loop iterations to share. The first time through the loop, the value of s is "0", and then the first delegate is created. The second loop iteration changes the value of s to "1" before creating the second delegate. Repeat for the remaining eight delegates, and at the end of the loop, the value of s is "9", and ten delegates have been added to d. When d is invoked, all the delegates print the value of the s variable, which they are sharing and which has the value "9". The result: 9 is printed ten times.

Now, I happen to have constructed this scenario to make the "consolidators" look bad, but I could also have written it to make the "decentralists" look bad for pushing a variable declaration into a loop scope when it should have remained outside. (All you have to do is read the above scenario in reverse.)

The point of this little exercise is that when a "consolidator" or a "decentralist" goes through an entire program "tuning up" the declarations of local variables, the result can be a broken program, even though the person making the change was convinced that their change "had no effect; I was just making the code prettier / more efficient".

What's the conclusion here?

Write what you mean and mean what you write. If the precise scope of a variable is important, make sure to comment it as such so that somebody won't mess it up in a "clean-up" pass over your program. If there are two ways of writing the same thing, then write the one that is more maintainable. And if you feel that one method is superior from a performance point of view, then (1) make sure you're right, and (2) make sure it matters.

Update: In C# 5, the rules for the foreach statement changed in a way that affects lambda capture: The control variable of the foreach is now scoped to the loop body, which means that capturing it in a lambda captures the current iteration, because each iteration gets a separate copy of th3e variable. That doesn't affect our for loop above, but it is worth calling out.

Comments (38)
  1. And (3) document in the code why you made this choice so that a future maintainer doesn’t come along and undo your brilliant work.

  2. Adam says:

    Taking delegates out of the picture for now – why would there be a performance difference between the two. Shouldn’t a halfway decent compiler (i.e. the C# compiler) generate identical code (with the possible exception of the order in which “i” and “s” appear on the stack) for both of your first two loops?

    If not, shouldn’t the magic of JIT compilers that allows dynamically profiled managed code to run faster than native code eliminate any differences that remain?

    I always thought that performance was a complete non-argument for either side when choosing where to declare variables.

    [You’d be surprised how many arguments there are over non-arguments. -Raymond]
  3. Cody says:

    Good question, Adam.  To extend this, why wouldn’t the compiler, if it does optimize by moving the declaration outside of the loop, break the code in the delegate case?

    Are most compilers naive in this performance aspect and would not optimize, are they naive and would optimize such that it breaks the delegate, or are they intelligent enough to realize the difference?

    [Obviously a compiler’s optimizer can’t make this change because it would violate the language’s specification. I can’t believe I had to write that. -Raymond]
  4. Gabe says:

    When s is a local variable, it really doesn’t matter where it is declared because the amount of stack space a function needs is allocated at function entry. In the case where it is used in an anonymous delegate, though, it is not a local variable anymore; it is a variable that is shared between the method in which it was declared and the anonymous delegate. This means it is actually declared in a heap-allocated data structure (probably a class in C#), and that structure creation is not something the compiler will optimize.

    Of course there is still a local variable, but in this case it’s a reference to the structure containing all of the variables shared by the method and the delegate.

  5. Phill says:

    Can I start by saying that as a .NET developer I’ve really enjoyed the posts this week. Raymond has a great way of explaining things and while a lot of his Win32 posts are outside my understanding I enjoy reading them for the insight into Windows internals.

    I didn’t realise the danger of anonymous methods in C#, I’ve only just moved to version 2.0 of the framework and haven’t used them yet. I must say coming from a Java background the restriction of only allowing final local variables in anonymous inner classes certainly seems more predictable and safer.

  6. lf says:

    … and this is way Java requires such magically-long-lived local variables to be explicitly final when used from an inner class, anonymous or otherwise:

    for(final int i : someStuff)

       bunchOfDelegates.new Delegate() { public void foo() { print(i); } };

  7. Fox Cutter says:

    I think the moral of the story is not to use local variabels in you anonymous methods if the method will live longer then the scope of the function it is in.

  8. you should really consider doing this C# things more often

    thanks Raymond

  9. Peter Ritchie says:

    Geez, I thought they were anonymous because they didn’t have a name.  It turns out they’re anonymous because they don’t want to take credit for potential bugs… :-)

    Michael Grier: Wholeheartedly agree.  From the "if it’s not obvious, document it" addendum to the "write self-documenting code" manual.

    Eber Irigoyen: Agreed!

  10. Barry Kelly says:

    Of course, the fact that the variable is hoisted by a delegate drastically changes its nature – it’s going to become an instance field on a new compiler-generate class.

    For normal variables, which are left as normal variables by the C# compiler (i.e. aren’t hoisted), the scope information is completely lost (save for debug info in the PDB) by the time the translation to IL has been made. That is, by looking at the IL, there’s no way to tell if the variable has been "consolidated" or "decentralized".

    It’s the JIT compiler’s job to apply use-def and def-use analysis to determine the actual lifetime of the variable, because there’s nothing in the IL that scopes it, other than the uses and definitions themselves.

  11. Dave says:

    "In the case where it is used in an anonymous delegate, though, it is not a local variable anymore … it is actually declared in a heap-allocated data structure (probably a class in C#), and that structure creation is not something the compiler will optimize."

    True, yet it could potentially take a detailed inspection to determine that. You certainly can’t tell from the declaration of s that it’s special in any way, yet it clearly has a special scope and semantics. To me, the lack of some "this is special" tag on the declaration of s is an accident waiting to happen.

    I love closures and anonymous functions, and use them all the time in Javascript. Sometimes I create them by accident and it takes a while to figure why I am "leaking memory". With a static-typed language like C# I’m really surprised that there are no more safeguards than offered by Javascript.

    Those of you who say this is a documentation issue are right–the C# type system should be providing that documentation. Look at the heroics it already performs in this case; would it be too much to ask the programmer to do their part by adorning "string s" a bit more? If the doc is just human readable comments then there’s no way for the compiler to have extended warnings about closures, for example. How will it know when you intended to do it versus when you accidentally did it?

  12. …and anyone moving from a language with variable-name scoping rules similar to those in C# will surely get caught in one of the biggest gotchas of using closures in JavaScript. Namely, regardless of where a "var foo;" statement is in JavaScript, the variable is scoped to the nearest-enclosing function. (i.e., scoped to the function, not to the block).

    This leads to fun broken code like:

    for (var i = 0; i < a.length; i++)

     doSomething(function() { alert(i); });

    at which point we scratch our heads when every invocation of the callback alerts the same value (a.length, assuming the callbacks are invokved after the for loop has finished).

    The fix is to use an extra function to create a new scope in which to capture the value of the loop variable at the time the callback is created:

    for (var i = 0; i < a.length; i++)

     doSomething(

       (function(x) { return

         function() { alert(x); };

        })(i)

     );

    (This has been written about many, many times in the past, but still regularly trips up JavaScript programmers. So it always deserves  repeating, even in not-quite-the-right forum. :-)

    Lee

  13. Nicole DesRosiers says:

    This is a good reason why, if you are working in someone else’s code, you should write to their style.  Even if you think their style is the worst possible style to write in, trying to rewrite the function to your style just to fix a bug, or writing your new code in your style and leaving the rest in the other style, is just a recipe for disaster.

  14. BryanK says:

    Right, scope != lifetime (or not necessarily, anyway).

    In VB.net 1, for instance, code like this:

    For i as Integer = 0 to 9

     Dim x As Integer

     x += i

     MsgBox(x.ToString())

    Next i

    gives "45" in its final message box.  Code like this, however:

    For i as Integer = 0 to 9

     Dim x As Integer = 0

     x += i

     MsgBox(x.ToString())

    Next i

    gives "9" in its final message box, because the variable x is reinitialized (by the Dim statement) every time through the loop.  All loop iterations share the same variable instance, though (it has function lifetime).

    Even though VB.net 1 specifies that new variables will have a default value (for Integers, that value is zero), that guarantee only applies at the start of the variable’s lifetime, not its scope.  If you need it to be reinitialized to zero each time through the loop, you need to write code to do that yourself.

    If C# 2 has the same variable-lifetime spec, then both sets of code are equivalent; both will print the character "9" ten times when the delegate is invoked.  I do not know if this is the case though; I would suspect that Raymond has tried the code he posted, and verified that it was indeed different, so I would suspect that the lifetime of C# 2 variables is the same as their scope.  It may also be true that in *all* .Net Framework 2.0 languages, the lifetime and scope of all variables is the same (i.e. they may have changed the spec), but I don’t know that for sure either.

  15. Cooney says:

    Dear god, that’s awful. Your delegate takes a reference to a variable instead of a copy, as it would in any proper implementation. You should be able to change s at your pleasure after setting up the delegate, or else it’s just asking for problems. Yeah, copying is a pain when s is really big, but that’s what explicit references are for.

  16. Daev says:

    Cody asks, "To extend this, why wouldn’t the compiler, if it does optimize by moving the declaration outside of the loop, break the code in the delegate case?"

    You’re making a mistake that’s not too uncommon.  Compiler optimizers don’t really optimize *your code*; they don’t rewrite and transform the source code you wrote into something more efficient.

    They optimize the assembly language — or rather, stuff that’s most of the way towards being the final binary compiler output, whatever low-level language that might be.

    So any complicated maneuvers like this "delegation" have already happened by the time the optimizer gets its hands on the primitive sequential instructions (register-transfer language, intermediate three-address code, control-flow graph, whatever you call it).

    There was a reply last month, in the discussion of how a compiler calls an imported DLL function, that made the same mistake:  thinking the compiler "changes your code," like a macro preprocessor or refactoring tool, to produce the transformed source code that it then really compiles.

  17. As the guy that wrote the code and worked through a lot of these issues with teh language designers, I’d like to add a few thoughts.

    First, thanks Raymond for clarifying what I thought I covered in excruciating detail previously (http://blogs.msdn.com/grantri/archive/category/3378.aspx).

    Second, all those who want the JIT to produce more optimized code that solves all of their in-effeciencies, please consult with the people that want the JIT to be faster.  We (Microsoft) thought we picked a good balance, but if you can come up with a better one, please let us know, but be prepared to support your reasoning.

    Third, the JIT/runtime is constrained by the MSIL and metadata emitted by the C# compiler.  It can only optimize things so much.  Namely it cannot change a field into a local, or vise versa.  Therefore once the C# compiler is done, regardless of what the C# language spec allows or disallows, the runtime/JIT must follow it’s rules.  This is commonly implemented by the C# compiler emitting MSIL and metadata that forces the runtime to be at least as restrictive as the original source code and C# language spec.  "At least as restrictive" means that sometimes in order to maintain correct code according to the C# language the compiler emites MSIL or metadata that causes the runtime to be more restrictive.

    Fourth, contrary to popular belief the C# compiler doesn’t really optimize your code.  It does a little cleanup.  Some future version might do more, but for now, it mostly jsut translates source code into MSIL.  That’s generally OK, because if performance matters you can generally tweak up the source to get better performance.  With anonymous methods there are some really subtle issues with lifetime and scoping that can have performance implications.  Since we don’t have an optimizer that could fix any of these issues, I took the second best route, document, document, and document all of the subtlties so that programmers could understand them and use them to tweak up the source to get better performance.

    Fifth, anonymous methods do introduce some new issues with scopign and lifetime, but you should as a programmer know the language you are using. BryanK pointed ouut some interesting facts that would fly in the face of most C++/C# programmers.  Anonymous methods have the potential to change the lifetime of a local, but *NEVER* the scope.  Because the lifetime can now be longer you can write programs that can detect the fact that since v1 of C# each entry into a scope begins *NEW* lifetimes for variables declared in that scope.  This is obvious in languages that allow nested functions, but is a new concept to existing C# (and C++) programmers who have unknowingly been following this rule, but could never observe the differences before anonymous methods.

    This comment was provided "as is" and is just my opinion and memories from when I worked on the C# compiler and now the CLR JIT.

    –Grant

  18. John Walker says:

    I agree with Eber…Any chance of extending .NET week to 2 weeks? I feel cheated since the first couple of days of the week didn’t have many .NET posts ;) Seriously, this is really interesting stuff that I’ve often wondered about. Thanks.

  19. Neil says:

    So, either a) the code change really isn’t innocuous or b) the code change really is innocuous and the language shouldn’t have been specified to make innocuous changes so damaging ;-),

    [I’m pretty sure you can find similar examples of “dangerous but innocuous-looking” code changes in any language. C++ object creation and destruction is full of subtleties. C++ temporary object lifetime is definitely pretty high on my list. -Raymond]
  20. Factory says:

    Hmm whilst I agree that innocuous changes can cause serious bugs, I would disagree that it is not a good idea to change code merely because it might cause a bug.

     OTOH methinks that the "consolidator" in this example is in the wrong. Since the consolidator’s justification is basically optimization, and in the context given it is premature optimization, which is the Root of All Evil ™. The Decentralizer’s justification is aimed at reducing all those other side effects that can (and often do) happen which you use a variable for more than one purpose. Which is a good thing, IMHO.

     But really, the one thing that needs to be said about this is example is that one should verify that any code changes one makes actually have their intended effect, regardless of whether you are a consolidator or a decentralizer.

  21. Judah says:

    Interesting stuff, Raymond. I agree with the others, it’s nice to see a .NET-related post once in awhile.

    On the topic of C# 2 anonymous methods, I’ll say that I haven’t yet run into any problems with them, and I’ve been playing with them since beta 1 of .NET framework v2. That’s not to say there no caveats and gotchas, but rather, typical usage–at least in my experience–tends to be safe and problem free.

  22. Norman Diamond says:

    40 years of backwards compatibility ^_^

    IBM changed a PL/I compiler for the values of variables that would be seen by a function that was called via a function pointer.  The original object code did something like the 9 9 9 … 9 version and the modified object code did thunking to provide something like the 1 2 3 … 9 version.  I’m not sure if they changed the spec at the same time, or if someone decided that the spec actually required the thunking.  To me the spec didn’t seem to prohibit either interpretation.

    > [Obviously a compiler’s optimizer can’t make
    > this change because it would violate the
    > language’s specification. […] -Raymond]

    Two exceptions:
    (1) If the compiler’s user sets an option for unsafe optimizations.
    (2) If the compiler’s bug is caught an insufficient number of months before release so the bug gets resolved as “won’t fix” or as “won’t release a hotfix”.

    By the way the famous quote about regular expressions can be applied to the way some language designers mungle the combination of objects and pointers.  When seeing a problem where unskilled programmers have trouble understanding pointers, some language designers say “I know what, I’ll hide the pointer as a reference inside an object and I won’t let give the programmer any control over the reference”.  Then they have two problems.

    [Ah the pedantic police are out again, I see. Allow me to clarify: “Obviously a compiler’s optimizer can’t make this change on purpose without the user’s permission…”. -Raymond]
  23. steveg says:

    Michale Grier: And (3) document in the code why you made this choice so that a future maintainer doesn’t come along and undo your brilliant work.

    Gosh, that’s pretty controversial, doncha think? How does one classify a feature as obscure and therefore worthy of commentary for the unwary?

    I’ve worked on a few projects where my comments explaining things have been removed because “it’s obvious”. Very frustrating sometimes (an example that comes to mind was a very complex bit of Perl — fun to write, terrible to maintain).

    If a feature is known from the design stage to be a likely trouble spot, is it worth adding messages to the compiler, or perhaps adding an extra keyword or two? Syntactic sugar can be useful.

    [Most people do not have the luxury of being able to modify the language they are programming in. A comment is the best you can do. Either that or set a house policy of not using whatever feature it is that troubles you. -Raymond]
  24. Michael says:

    "Ah the pedantic police are out again, I see. Allow me to clarify: ‘Obviously a compiler’s optimizer can’t make this change on purpose without the user’s permission…’. -Raymond"

    Interesting that the most pedantic blogger I know would make such a statement ;)

  25. steveg says:

    [Most people do not have the luxury of being able to modify the language they are programming in. A comment is the best you can do. Either that or set a house policy of not using whatever feature it is that troubles you. -Raymond]

    Yep, but how does one know what deserves commentary and what doesn’t? What you consider obscure in C++ probably isn’t what a newbie would think.

    I probably should’ve added <rhetorical> tags :-)

  26. Raymond wrote a really nice series of posts on this:

    Part 1

    Part 2

    Part 3

    He also points out that…

  27. You’ve been kicked (a good thing) – Trackback from DotNetKicks.com

  28. This post assumes that you understand how closures are implemented in C#. They’re implemented in essentially

  29. One of my favorite new features for Code Analysis in Visual Studio 2008 is our support for analyzing

  30. On the advice of Jay Wren , I decided to try our ReSharper 4.1 .&#160; I had previously installed DevExpress

  31. On the advice of Jay Wren , I decided to try our ReSharper 4.1 .&#160; I had previously installed DevExpress&#39;

Comments are closed.