Another tidbit regarding optimizations

Since today's theme seems to be optimization (see Sam's blow-by-blow), here is a bit about index-based loops vs. for-each loops. Everyone knows that index-based loops are faster, but why and by how much?  

A friend (Hi Tamar!) asked me why the following 2 snippets behaved differently while debugging:

#1:
foreach(string s in myStringArray)
{
      if (s=="b") break;
}

#2:
foreach(string s in myArrayList)
{
      if (s=="b") break;
}

When debugging line-by-line, Snippet #1 executes the break and then jumps to the code after the loop, while snippet #2 executes the break, then returns to the foreach line one more time before leaving the loop. 

It seemed obvious that the compiler was doing some sort of optimization on the "native" array vs the arraylist object, but to know exactly what was going on I had to peek into the generated IL. Here's how it looks under the covers:

Snippet #1:

  IL_0003:  stloc.2
  IL_0004:  br.s       IL_001d
  IL_0006:  ldloc.1
  IL_0007:  ldloc.2
  IL_0008:  ldelem.ref
  IL_0009:  stloc.0
  IL_000a:  ldloc.0
  IL_000b:  ldstr      "b"
  IL_0010:  call       bool [mscorlib]System.String::op_Equality(string,
                                                                 string)
  IL_0015:  brfalse.s  IL_0019
  IL_0017:  br.s       IL_0023
  IL_0019:  ldloc.2
  IL_001a:  ldc.i4.1
  IL_001b:  add
  IL_001c:  stloc.2
  IL_001d:  ldloc.2
  IL_001e:  ldloc.1
  IL_001f:  ldlen
  IL_0020:  conv.i4
  IL_0021:  blt.s      IL_0006
  IL_0023:  ret

Notice lines IL_0010 - IL_0017: compare the string, if false skip to line 19 (which continues the loop), else branch to line 23, which is outside the loop.

Now check out Snippet #2:

.try
  {
    IL_0007:  br.s       IL_0024
    IL_0009:  ldloc.1
    IL_000a:  callvirt   instance object [mscorlib]System.Collections.IEnumerator::get_Current()
    IL_000f:  castclass  [mscorlib]System.String
    IL_0014:  stloc.0
    IL_0015:  ldloc.0
    IL_0016:  ldstr      "b"
    IL_001b:  call       bool [mscorlib]System.String::op_Equality(string,
                                                                   string)
    IL_0020:  brfalse.s  IL_0024
    IL_0022:  br.s       IL_002c
    IL_0024:  ldloc.1
    IL_0025:  callvirt   instance bool [mscorlib]System.Collections.IEnumerator::MoveNext()
    IL_002a:  brtrue.s   IL_0009
    IL_002c:  leave.s    IL_003f
  }  // end .try
  finally
  {
    IL_002e:  ldloc.1
    IL_002f:  isinst     [mscorlib]System.IDisposable
    IL_0034:  stloc.2
    IL_0035:  ldloc.2
    IL_0036:  brfalse.s  IL_003e
    IL_0038:  ldloc.2
    IL_0039:  callvirt   instance void [mscorlib]System.IDisposable::Dispose()
    IL_003e:  endfinally
  }  // end handler
  IL_003f:  ...

Now the code is a bit more complex, as the loop is instantiating an IEnumerator and also has a try/finally clause. The inner loop (lines IL_001B-IL_0022) behaves similarly to Snippet #1 (except for using the IEnumerator ), but now when encountering the break it still has to go through the "finally" block which includes one last call to dispose. This is what causes the debugger to go back to the "foreach" line before exiting the loop.

So, from the above code you can see that although both snippets use the for-each syntax, for native arrays the C# compiler optimizes the loop to be just as efficient as an index based loop. However with anything more complex the compiler follows the IEnumerable interface to the letter which results in the overhead of the instantiation, try/finally and Dispose call.

And just to prove the point, here is the IL generated when looping through an ArrayList by index:

 IL_0000:  ldc.i4.0
  IL_0001:  stloc.0
  IL_0002:  br.s       IL_0022
  IL_0004:  ldarg.0
  IL_0005:  ldloc.0
  IL_0006:  callvirt   instance object [mscorlib]System.Collections.ArrayList::get_Item(int32)
  IL_000b:  callvirt   instance string [mscorlib]System.Object::ToString()
  IL_0010:  ldstr      "b"
  IL_0015:  call       bool [mscorlib]System.String::op_Equality(string,
                                                                 string)
  IL_001a:  brfalse.s  IL_001e
  IL_001c:  br.s       IL_002b
  IL_001e:  ldloc.0
  IL_001f:  ldc.i4.1
  IL_0020:  add
  IL_0021:  stloc.0
  IL_0022:  ldloc.0
  IL_0023:  ldarg.0
  IL_0024:  callvirt   instance int32 [mscorlib]System.Collections.ArrayList::get_Count()
  IL_0029:  blt.s      IL_0004
  IL_002b:  ...

This should be compared to Snippet #2's IL. You can see how much simpler the index-based loop is. So, now we know why using "for-each" is slower then using indexed arrays. But (and this was news to me) when using native arrays the difference is neglible.