Follow up on enabling a faster foreach


style="FONT-SIZE: 10pt; FONT-FAMILY: Arial">Wow – several hours of performance
training for the CLR team today… One thing that came up was how to enabling
writing fast foreach code by doing a better job of implementing IEnumerable…
 Apparently a while back there was
some code that need to be super fast (think default winproc here) that did
nested foreach’s on a couple of ArrayLists… "urn:schemas-microsoft-com:office:office" />


style="FONT-SIZE: 10pt; FONT-FAMILY: Arial">Some thing
like:


style="FONT-SIZE: 10pt; FONT-FAMILY: 'Courier New'; mso-bidi-font-family: Arial">ArrayList
l1 = … //put n items in style="mso-spacerun: yes"> 


style="FONT-SIZE: 10pt; FONT-FAMILY: 'Courier New'; mso-bidi-font-family: Arial">ArrayList
l2 =…


style="FONT-SIZE: 10pt; FONT-FAMILY: 'Courier New'; mso-bidi-font-family: Arial">foreach
(object o in l1){


style="FONT-SIZE: 10pt; FONT-FAMILY: 'Courier New'; mso-bidi-font-family: Arial"> style="mso-spacerun: yes">   foreach (object o in l2)
{


style="FONT-SIZE: 10pt; FONT-FAMILY: 'Courier New'; mso-bidi-font-family: Arial"> style="mso-spacerun: yes">   }


style="FONT-SIZE: 10pt; FONT-FAMILY: 'Courier New'; mso-bidi-font-family: Arial">}


style="FONT-SIZE: 10pt; FONT-FAMILY: Arial"> 


style="FONT-SIZE: 10pt; FONT-FAMILY: Arial">As you know, the way foreach works
is that it allocates an Enumerator… so that means the inner loop is causes n
enumerators to be allocated.  Not a
big problem for a normal app, but for this often accessed part of the system it
meant over a 1/3 of the memory usage for a sample app was Enumerators. style="mso-spacerun: yes"> Wow.


style="FONT-SIZE: 10pt; FONT-FAMILY: Arial"> 


style="FONT-SIZE: 10pt; FONT-FAMILY: Arial">We certainly learned our lesson in
the ArrayList class.  If we had it
to do over we’d follow the guidelines href="http://blogs.gotdotnet.com/BradA/PermaLink.aspx/fa3784dd-5566-4254-ba77-cb2445ac11b7">I
outlined earlier and in addition make sure the Current property is
inlineable.  Oh wait, we DO have it
to do over again.. sort of… In Whidbey will be adding Generic versions of the
collections, and you can bet we addressed this problem
there!


 

Comments (3)

  1. Frank Hileman says:

    I think anything very low level, such as a generic collection, can not allocate objects on every call as part of normal operation. I have noticed two other low-level places that persuade the programmer to allocate objects unnecessarily: delegate remove, and the recommended event design pattern.

    In the case of delegates, by convention one allocates a delagate to remove a delegate from a list of delegates. A static delegate can be used for this purpose, but this is not the convention. If the delegate remove function were overloaded, and could take as a parameter a different type of object, say a MethodInfo, there would be no need for the delegate allocation, and the syntax for hooking/unhooking events in C# could be simplified: button.Click -= button_Click;

    It would seem that removing a delegate is not done frequently, but this is not always the case. For example, the Application.Idle event must be unhooked as soon as it is fired (to prevent endless firing). So if you use this frequently, it must be hooked/unhooked all the time.

    The recommended event design pattern persuades the programmer to allocate an event args object before the event is even fired. The recommended pattern is to provide a virtual OnEventName function which takes an EvenNameArgs parameter as the second argument. Consequently, even if nothing is hooked to the event, the programmer must allocate the arg so that the OnEventName function can be called. If the OnEventName function is not overridden, and the event is null, the allocation/initialization is wasted. A static EventNameArgs object can be reused for this purpose, but this limits multithreading unless synchronization is also accounted for, and prevents recursion.

    Generally speaking, if events are potentiallly fired frequently, the recommended design pattern can cause a lot of wasted objects and CPU time, even when no events are acually fired. One solution to the allocation problem is to eliminate the convention of using an EventNameArgs object, and instead put the data right in the parameters of the delegate. In most cases there is little data passed. I think the event design pattern should be changed, so that events can be fired without allocating a single object, without making the code complicated.

  2. Brad Abrams says: