foreach and performance rules

I was looking at Brad’s blog this morning and I was astounded to find that some people had chosen not to use foreach “Because Rico said so”, though they probably didn’t use those exact words.  I found that very troubling for several reasons, so I thought I’d offer some comments here.

Let’s start with my two rules of performance:

Rule #1: Measure
Rule #2: Do Your Homework

And I’m pretty adamant that after that there are no more rules.  It’s just impossible to predict what is or is not a good idea from a performance perspective without actually measuring, because performance work is plagued with secondary effects and, importantly, there are other tradeoffs besides raw space/speed in any case.

Lately I’ve been trying to explain this by taking the approach that engineering decisions must be made quantitatively for them to be truly engineering decisions. I think sometimes people are tempted to use an expert’s intuition (e.g. mine) in place of actual measurements, but that’s cheating… I don’t even stand behind my own intuition so certainly nobody else should :)

For those reasons alone “Rico said so” is just a lousy reason to do anything. 

OK, so now we know what isn’t a good way to make a decision like “Do I use foreach?” but that isn’t too helpful.  How should such a decision be made?

Well, actually I think Performance Quiz #3 (Warnings and Good Practice section) speaks pretty clearly on this point.  Start by understanding the characteristics your solution needs to have, then make a plan that is substantially likely to have those characteristics, and verify as you go along.  In this case the “RAD” plan is to use foreach and the “classic” plan is to use a more complex “for” construct of some kind.  If iterating over some key data structure is an important part of your process, then you’ll want to measure the kind of throughput you can expect from each of the solutions (a quick prototype is often enough for that).  Use the measurements to guide your plans so that you add complexity to your solution only when it is giving you excellent value.
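To make "a quick prototype" concrete, here is roughly the kind of throwaway harness I have in mind.  Treat it as a sketch under assumptions, not a proper benchmark: the class name, the helper names, the collection size and the repeat count are all made up for illustration, and a loop that just sums integers stands in for whatever your real per-element work is.

```csharp
using System;
using System.Collections;
using System.Diagnostics;

// Hypothetical quick prototype; every name and constant here is illustrative.
static class ForeachPrototype
{
    const int Count = 1_000_000;  // assumed collection size, pick one that matches your scenario
    const int Repeats = 20;       // assumed number of timed passes

    static long SumArrayFor(int[] data)
    {
        long sum = 0;
        for (int i = 0; i < data.Length; i++) sum += data[i];
        return sum;
    }

    static long SumArrayForeach(int[] data)
    {
        long sum = 0;
        foreach (int x in data) sum += x;
        return sum;
    }

    static long SumListFor(ArrayList list)
    {
        long sum = 0;
        for (int i = 0; i < list.Count; i++) sum += (int)list[i];  // unboxes each element
        return sum;
    }

    static long SumListForeach(ArrayList list)
    {
        long sum = 0;
        foreach (int x in list) sum += x;  // goes through the list's enumerator
        return sum;
    }

    // Runs the body once untimed (so JIT compilation is not measured),
    // then reports the average wall-clock time over Repeats passes.
    static void Time(string label, Func<long> body)
    {
        body();
        var sw = Stopwatch.StartNew();
        long result = 0;
        for (int r = 0; r < Repeats; r++) result = body();
        sw.Stop();
        Console.WriteLine($"{label}: {sw.Elapsed.TotalMilliseconds / Repeats:F3} ms per pass (sum={result})");
    }

    static void Main()
    {
        var data = new int[Count];
        var list = new ArrayList(Count);
        for (int i = 0; i < Count; i++) { data[i] = i; list.Add(i); }

        Time("array,     for    ", () => SumArrayFor(data));
        Time("array,     foreach", () => SumArrayForeach(data));
        Time("ArrayList, for    ", () => SumListFor(list));
        Time("ArrayList, foreach", () => SumListForeach(list));
    }
}
```

Run it in a release build on the runtime you actually ship on, and swap in your own data structure and access pattern.  The numbers from this toy loop are only interesting relative to each other and only for your scenario, which is exactly the point.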

If you did this you would find that there is no penalty at all for using foreach on arrays, for instance, and you might find the penalty for using foreach on ArrayList to be so small in your cases that the decreased chance of bugs makes foreach the way to go.  On the other hand, you might find that you are creating far too many enumerators because of your usage pattern, and something more complicated, but cheaper, is called for.  In any case you'll be making an objective decision based on real data.

The thing you must remember is that following a variety of anecdotal performance rules like “don’t use foreach” (which I don’t even believe in, much less advise) is not a substitute for good performance engineering.  You’re much more likely to make a bunch of premature optimizations on that path – when what you needed was performance planning.