You need to be careful about how you use count in for-loops


Negotiating

Lets consider the following code

 MyCollection myCol = new MyCollection();
 myCol.AddRange(new int[] { 1, 2, 3, 4, 5, });
 for (int i = 0; i < myCol.Count; ++i)
 {
     Console.WriteLine("{0}", i);
 }

What most people forgets to consider is the condition check that happens for the for loop (in Red). In this case it is actually a call to a method get_Count. The compiler doesn’t optimize this away (even when inlined)!! Since the condition is evaluated after each iteration, there’s actually a method call happening each time. For most .NET code it is low-cost because ultimately it’s just a small method with count being returned from an instance variable (if inlined the method call overhead also goes away). Something like this is typically written for count.

public int Count
{
    get
    {  
        return this.count;
    }
}

Someone wrote something very similar in C, where in the for-loop he used strlen. Which means that the following code actually is O(n2) because each time you are re-calculating the length…

for(int i = 0; i < strlen(str); ++i) {...}

Why the compiler cannot optimize it is another question. For one it’s a method call and it is difficult to predict whether there’s going to be any side-effect. So optimizing the call away can have other disastrous effect.

So store the count and re-use it if you know that the count is not going to change…


Comments (3)

  1. Peter Ritchie says:

    Compilers can’t optimize this, the way you’re suggesting, in the general case.  It’s not that they don’t.

    Compiler just can’t be sure the checked result won’t change.  Sure, get_Count could be inlined (depending on whether is virtual, how complex it is, takes struct arguments, etc).  But, the compiler still cannot know if it "optimize" use of get_Count.  Let’s take this example:

    public class MyClass {

     private List<int> list = new List<int>(10000);

     public void FirstMethod() {

       for(int i = 0; i < list.Count; ++i) {

         DoSomething(list[i]);

       }

     }

     private void DoSomething() {

       if(i % 100) list.Add(new Random().Next());

     }

    }

    A pedantic example, but there exists the possibility that something within the loop executes something that changes the length of the array being iterated over, or even changes the reference to the collection.  Hoisting the invariant would no cause different (wrong) results.

    The use of the invariant within the loop is a common pattern that is used, and the compiler optimizes for this by reducing really expensive bounds checks.  When you hoist the invariant (Count) out of the loop you’re actually causing the compiler to be unable to apply this optimization of this case and your code will then be *slower*.  In test of running code that doesn’t hoist and code that does, over 5 execution the average speed difference (.NET 2.0) showed that hoisting the invariant, as you’re suggesting, makes the code 33% *slower*.

    Foreach is that much better because, in the case of the collection possibly changing, it’s working with a copy of the reference. If the reference changes, it doesn’t matter because you’re still iterating the original reference.  The for-loop syntax inadvertently supports iterating over multiple collections.  Plus using the enumerator pattern is that much more optimizable.  Foreach is faster; but not comparable to difference between hoisting and not hoisting the invariant, it’s orders of magnitude faster.  In the same tests, for example, it’s 135 times faster than a for loop (without hoisting)–that’s 13500% faster.

    So, in this case I would disagree with your final line (the advice).  Do not hoist the variable out of the for-loop because of this side-effect, read the various age-old documentation on this (like Bill Wagner’s Effective C#, Item 11–pay particular attention to hoisting the invariant) and prefer foreach.

    But, prefer foreach in the general case.  Don’t prematurely optimize.  If you already have a for-loop, don’t change it to a foreach loop under the guise of increase in execution speed without first A> knowing it will be faster and that amount faster is demanded by the requirements, and B> it will still work.

  2. grauenwolf says:

    Ultimately this comes down to C# using C’s syntax instead of having a true upper bound for loops. Consider the VB version:

    For i = 0 To myCol.Count – 1

    Console.WriteLine(i)

    Next

    VB will never call myCol.Count more than once, so this kind of performance bug isn’t possible.

  3. abhinaba,

    thanks again for sharing your insights…sometimes you forget about the simplest things! take care – jp