Fun with Iterators and state machines


I recently was given a piece of C# code with statements like:


 


yield return value;


 


and


 


yield break;


 


This was the beginning of my descent into the loopy world of C# 2.0 iterators.  It took me awhile to wrap my head around them, and when I tried to explain them to other team members I got looks of total confusion.  (I admit it, some of us don’t keep the C# 2.0 specification on the night stand. 🙂  If you’re totally familiar with iterators, you can skip this blog entry.  If not, plunge in with me. Also, I’m not trying to give a comprehensive overview of C# iterators.  (For that, see the MSDN article) Rather, I’m focused on the cool compiler and run time magic that makes them possible.


 


<<Disclaimer:  This code is based on the Whidbey Beta 1 version of C#, and various details may change before release.>>


 


Consider the following program:


 


//
// Iterator1.cs – Requires C# 2.0 or later
//
using System;
using System.Collections.Generic;

class SomeContainer
{
     
public IEnumerable<string> MyIterator()
     
{
            yield
return “Foo”;

            yield
return “Bar”;

            yield
return “Baz”;
     
}
}

class foo
{
     
static void  Main()
     
{
            SomeContainer container
= new SomeContainer();
           
           
foreach( string s in container.MyIterator() )
                  Console.WriteLine( s );
     
}
}


 


Compiling the program ( “csc iterator1.cs” ) and running it gives these results:


 


Foo


Bar


Baz


 


If you try to mentally trace the flow of control, you might notice something odd. What the heck does a “yield return” statement do? Looking at the MyIterator method, it seems like there’s two possibilities:


 


1) The CPU executes only part of MyIterator() at any given time. At points, control comes back to the method, and it executes another chunk.


or


2) The entire MyIterator method runs to completion, and all the possible return values are put into a temporary collection.


 


It’s easy enough to test the first hypothesis.  Let’s add some WriteLine calls to MyIterator and see when they appear:


 


//
// Iterator2.cs – Requires C# 2.0 or later
//
using System;
using System.Collections.Generic;

class SomeContainer
{
     
public IEnumerable<string> MyIterator()
     
{
            Console.WriteLine(
“Entering MyIterator” );

            yield
return “Foo”;
            Console.WriteLine(
“After Foo” );

            yield
return “Bar”;
            Console.WriteLine(
“After Bar” );

            yield
return “Baz”;
            Console.WriteLine(
“After Baz” );
     
}
}

class foo
{
     
static void  Main()
     
{
            SomeContainer container
= new SomeContainer();
           
           
foreach( string s in container.MyIterator() )
                  Console.WriteLine( s );
     
}
}


 


Compiling the new program ( “csc iterator2.cs” ) and running it gives these results:


 


Entering MyIterator


Foo


After Foo


Bar


After Bar


Baz


After Baz


 


Hmm…   Based on these results, it seems that the execution path definitely bounces back and forth between the Main() method and the MyIterator() method.  What the heck is going on here?


 


Since I work on the Visual Studio team, I could have asked a few questions, but some times it’s more fun to figure things out on your own.  Luckily, with a tool like ILDASM, it’s easy enough to discern what’s happening.


 


First, let’s recompile with Iterator2.cs with optimizations ( “CSC /O+ iterator2.cs” )  and see what ILDasm shows us:



___[MOD] iterator2.exe
| M A N I F E S T
|___[CLS] SomeContainer
| | .class private auto ansi beforefieldinit
| |___[CLS] <MyIterator>d__0
| | | .class nested private auto ansi sealed beforefieldinit
| | | implements class [mscorlib]System.Collections.Generic.’IEnumerable`1′<string>
| | | implements [mscorlib]System.Collections.IEnumerable
| | | implements class [mscorlib]System.Collections.Generic.’IEnumerator`1′<string>
| | | implements [mscorlib]System.Collections.IEnumerator
| | | implements [mscorlib]System.IDisposable
| | | .custom instance void [mscorlib]System.Runtime.CompilerServices.CompilerGeneratedAttribute::.ctor() = ( 01 00 00 00 ) …
| | |___[FLD] <>1__state : private int32
| | |___[FLD] <>2__current : private string
| | |___[FLD] <>4__this : public class SomeContainer
| | |___[MET] .ctor : void(int32)
| | |___[MET] MoveNext : bool()
| | | System.Collections.Generic.IEnumerable<System.String>.GetEnumerator : class [mscorlib]System.Collections.Generic.’IEnumerator`1′<string>()
| | | System.Collections.Generic.IEnumerator<System.String>.get_Current : string()
| | |___[MET] System.Collections.IEnumerable.GetEnumerator : class [mscorlib]System.Collections.IEnumerator()
| | |___[MET] System.Collections.IEnumerator.Reset : void()
| | |___[MET] System.Collections.IEnumerator.get_Current : object()
| | |___[MET] System.IDisposable.Dispose : void()
| | |___[PTY] ‘System.Collections.Generic.IEnumerator<System.String>.Current’ : instance string()
| | |___[PTY] System.Collections.IEnumerator.Current : instance object()
| |
| |___[MET] .ctor : void()
| | MyIterator : class [mscorlib]System.Collections.Generic.’IEnumerable`1′<string>()
|
|___[CLS] foo
| | .class private auto ansi beforefieldinit
| |___[MET] .ctor : void()
| |___[STM] Main : void()
|


 For our very simple SomeContainer class, the C# compiler sure generated a lot of methods and fields.  Particularly are the variables <>1__state and <>2__current.  They sound suspiciously like parts of a state machine.  A state machine could explain how just portions of a method execute.


 


Let’s dig a little deeper. Expand the MoveNext() method and examine the disassembly (which I’ve added a few extra lines to in order to increase readability):


 


.method private hidebysig newslot virtual final
        instance bool  MoveNext() cil managed
{
.override [mscorlib]System.Collections.IEnumerator::MoveNext
.override  method instance bool class [mscorlib]System.Collections.Generic.’IEnumerator`1′<string>::MoveNext()
// Code size       164 (0xa4)
.maxstack  2
.locals init (int32 V_0)
IL_0000:  ldarg.0
IL_0001:  ldfld      int32 SomeContainer/'<MyIterator>d__0′::'<>1__state’
IL_0006:  stloc.0
IL_0007:  ldloc.0
IL_0008:  switch     (
                      IL_0022,
                      IL_0047,
                      IL_006c,
                      IL_0091)
IL_001d:  br          IL_00a2

IL_0022:  ldarg.0
IL_0023:  ldc.i4.m1
IL_0024:  stfld      int32 SomeContainer/'<MyIterator>d__0′::'<>1__state’
IL_0029:  ldstr      “Entering MyIterator”
IL_002e:  call       void [mscorlib]System.Console::WriteLine(string)
IL_0033:  ldarg.0
IL_0034:  ldstr      “Foo”
IL_0039:  stfld      string SomeContainer/'<MyIterator>d__0′::'<>2__current’
IL_003e:  ldarg.0
IL_003f:  ldc.i4.1
IL_0040:  stfld      int32 SomeContainer/'<MyIterator>d__0′::'<>1__state’
IL_0045:  ldc.i4.1
IL_0046:  ret

IL_0047:  ldarg.0
IL_0048:  ldc.i4.m1
IL_0049:  stfld      int32 SomeContainer/'<MyIterator>d__0′::'<>1__state’
IL_004e:  ldstr      “After Foo”
IL_0053:  call       void [mscorlib]System.Console::WriteLine(string)
IL_0058:  ldarg.0
IL_0059:  ldstr      “Bar”
IL_005e:  stfld      string SomeContainer/'<MyIterator>d__0′::'<>2__current’
IL_0063:  ldarg.0
IL_0064:  ldc.i4.2
IL_0065:  stfld      int32 SomeContainer/'<MyIterator>d__0′::'<>1__state’
IL_006a:  ldc.i4.1
IL_006b:  ret

IL_006c:  ldarg.0
IL_006d:  ldc.i4.m1
IL_006e:  stfld      int32 SomeContainer/'<MyIterator>d__0′::'<>1__state’
IL_0073:  ldstr      “After Bar”
IL_0078:  call       void [mscorlib]System.Console::WriteLine(string)
IL_007d:  ldarg.0
IL_007e:  ldstr      “Baz”
IL_0083:  stfld      string SomeContainer/'<MyIterator>d__0′::'<>2__current’
IL_0088:  ldarg.0
IL_0089:  ldc.i4.3
IL_008a:  stfld      int32 SomeContainer/'<MyIterator>d__0′::'<>1__state’
IL_008f:  ldc.i4.1
IL_0090:  ret

IL_0091:  ldarg.0
IL_0092:  ldc.i4.m1
IL_0093:  stfld      int32 SomeContainer/'<MyIterator>d__0′::'<>1__state’
IL_0098:  ldstr      “After Baz”
IL_009d:  call       void [mscorlib]System.Console::WriteLine(string)
IL_00a2:  ldc.i4.0
IL_00a3:  ret
} // end of method ‘<MyIterator>d__0’::MoveNext


The listing breaks up into five distinct blocks.  Some of the key highlights include:


 


·                     IL_0000 – IL_001D: Load the value of the <>1__state field, and branch to one of  five different blocks in the code.  This is consistent with what a state machine will do.


·                     IL_0022 – IL_0024: Store the value “-1” into <>1__state.


·                     IL_0029 – IL_002E: Print “Entering MyIterator”


·                     IL_0033 – IL_0039: Store the string value “Foo” into the <>2__current field.


·                     IL_003E – IL_0040: Store the value “1” into <>1__state.


·                     IL_0045 – IL_0046: Make the method exit with a return value of 1.


 


Looking at another block (starting at IL_0047) we see a similar sequence of instructions.  The most key difference is at IL_0063 – IL_0065, where the <>1__state is set to ‘2″, rather than “1”.   In yet another block it’s set to “3”.  In short, the <>1__state field is a numeric value that keeps track of which state the MoveNext() method is in.  Each time MoveNext() is called, control begins at the top of the code, and branches to the appropriate handler block for that state. At the end of that block, <>1__state is updated to the next state.


 


What I’ve shown here is a rather simple example of C# iterators and how they’re supported by rather sophisticated run time support which takes your code and creates a state machine around it.


 


If you think you’ve got iterators down, try wrapping your head around this slightly more complicated example:


 


//
// Iterator3.cs – Requires C# 2.0 or later
//
using System;
using System.Collections.Generic;

class SomeContainer
{
     
public IEnumerable<string> MyIterator()
     
{
            Console.WriteLine(
“Entering MyIterator” );

           
for ( int i = 0; i < 10; i++ )
           
{
                  yield
return i.ToString();
           
}
     
}
}

class foo
{
     
static void  Main()
     
{
            SomeContainer container
= new SomeContainer();
           
           
foreach( string s in container.MyIterator() )
                  Console.WriteLine( s );
     
}
}


 


Compile it with “csc iterator3.cs” and run the executable.  Pretty nifty, eh?


 


Next time, I’ll show a very cool way of using C#’s built in state machine support to do idle time processing.  For instance, the Visual Studio IDE has a variety of code that needs to execute while the IDE is idle.  Using C# iterators, it’s fairly easy to accomplish this.


Comments (24)

  1. This seems like a higher-level construct of a "standard" NT Fiber. Am I missing something?

  2. R. Andrew Lamonica says:

    Does the method have to return "IEnumerable" or can the "yield return" be used to return other values? I can see where the lack of an Iterator would make it hard for the runtime to know when to restart at the begining of the function but it might be a fun trick to play with.

    B.T.W. I could just check this but I don’t seem to have a version of .NET that supports this construct. Is 2.0 Beta available to the public?

  3. Matt Pietrek says:

    Sanin: When I first looked at Iterators, I also noticed the similarity to Win32 Fibers. However, Fibers aren’t used in the C# implementation.

    Andrew: The Whidbey beta can be downloaded: http://lab.msdn.microsoft.com/vs2005/get/default.aspx#express

  4. AT says:

    An question – Why exceptions try/catch/finaly not allowed with yeild ?

    See also:

    Fibers, and other mad stuff that you shouldn’t use

    http://blogs.msdn.com/greggm/archive/2004/06/07/150298.aspx

  5. Just thought I’d say thanks for the article. I really enjoyed it!

  6. drebin says:

    Interesting….. but I don’t get it. How or why would you ever use something like this???

  7. Amit says:

    It seems that from the ILDASM code, the current state is modified _after_ the code in the yield block executes. This means that any construct that will cause the flow of execution to shift before state modification is either not allowed or will cause problems.

    Isn’t it better to modify the state first and then let the yield block execute?

  8. Timothy says:

    Why not use collections?

  9. It’s interesting to compare these new C# iterators with the very similar ones in Python and Ruby. There’s less code involved in those languages, making the iterators easier to understand.

    To see how they compare, I translated some of the sample C# iterator code to Python and Ruby:

    http://mg.to/2004/08/03/iterator-cpr

  10. Matt Pietrek says:

    Timothy and drebin: The article I linked to in the main text (http://msdn.microsoft.com/msdnmag/issues/04/05/c20/) provides a much better "big picture" view of iterators than I did in this entry.

    My short answer about the usefulness of iterators is that they make it much easier to implement your data as .NET collections that work with "foreach". In the past, implementing the code to support enumerating your data could be quite complicated. Iterators also make it very easy to support multiple ways of iterating over your data (e.g., first to last, last to first, every other item, etc…)

  11. Roshan James says:

    Here are some potentially relevant writeups –

    Implementation of Iterators in C#

    http://pensieve.thinkingms.com/CommentView,guid,fd10bfa8-1aeb-4353-84c8-cd80e418424f.aspx

    Implementation of Closures (Anonymous Methods) in C# 2.0

    http://pensieve.thinkingms.com/CommentView,guid,9fe42970-09e3-44e2-a4d0-32d63139351a.aspx

    These might be relevant too –

    Iterators in Ruby

    http://pensieve.thinkingms.com/CommentView,guid,dd0e61ef-6fb1-4156-9d16-81c20a6aa871.aspx

    Warming up to using Iterators

    http://pensieve.thinkingms.com/CommentView,guid,02da61dd-1e35-4a36-b46b-aa3a605b2ad6.aspx

    The ‘Big Deal’ about Iterators

    http://pensieve.thinkingms.com/CommentView,guid,d355277d-a942-4a50-aa03-d68413fd459c.aspx

    CLR 2.0 Closures and Environment classes that capture arity

    http://pensieve.thinkingms.com/CommentView,guid,92e51a8a-4d8d-4401-beb2-8b68a019d4bb.aspx

    Matt, its surprising that a hardcore systems programming person like you is writing about implementation of managed compilers, very nice 🙂

    cheers

    Roshan James

  12. Paul Ballard says:

    Congratulations, this blog has been featured on TheServerSide.NET

  13. Flier Lu says:

    Ping Back来自:blog.csdn.net

Skip to main content