More on PerfProductivity: The devil is in the details...

Article
06/13/2004

I enjoyed reading (and forwarding) the comments on my recent post on the productivity performance tradeoff… Many of you (rightly so) pointed out that the details really mater in such a discussion. So I picked one that we are struggling with right now. And I think it is generally interesting to see how enumerator works in List<T>…

Productivity:

List<T> maintains the one of the same invariants that ArrayList offers: You are not allowed to modify the list while it is being enumerated. We do this in order to cheaply provide a level of error checking. It is too expensive for List<T> to be resilient to such changes so we do just enough runtime checks to ensure the invariant is enforced. The common scenario this helps with is:

List<string> l = new List<string>();

l.Add("Joel");

l.Add("KCwalina");

l.Add("Kit");

l.Add("Peter");

l.Add("Joel");

l.Add("Ahmed");

l.Add("Joe");

foreach (string name in l)

{

if (name.Contains("K"))

{

l.Remove(name); //This is not allowed

}

With the checks you get an InvalidOperationException in the loop iteration right after Remove is called, without the checks you would get unpredictable results based on if you have enumerated passed the element or not. These are very subtle bugs to find. If you have ever gotten this exception you can appreciate that it is easy to get into this case without meaning to. As a test I took out the check that enforces this invariant and I was surprised that I got a NullReferenceException on the 7^th (last) iteration through the loop on the if-branch. Any idea why? I for one would find this a subtle bug to go find-and-fix.

Performance:

Of course, this check comes at a cost. To implement this invariant we use a “version stamp” on each instance of List<T>… Every time you modify the list we increment the version stamp. When you get the enumerator (or the foreach statement does it on our behalf) the enumerator stores that version stamp away. Then each time you (or the foreach statement on your behalf) calls MoveNext() the snapped version stamp is checked against the current version stored in the list being enumerated. If they are different we throw. Here is the code for MoveNext() from the List<T> enumerator:

public bool MoveNext()

{

if (version != list._version) throw new InvalidOperationException();

if (index == endIndex)

return false;

index++;

return true;

}

That is an extra if-branch each time through the loop. For a small number of items the overhead of setting up the loop dwarfs this cost, but for large list this check does show up, although I have never seen it show up top on the list in a real-world scenario.

Feedback

So, what do you think? Does the benefit of checking the invariant at runtime outweigh its cost? If we were able to do this check only if you are running with the debugger enabled in VS would that be good-enough or is there value in having this check always on?

More on PerfProductivity: The devil is in the details...

Additional resources