JIT optimizations on for loop with ar.Length constraint


A reader wrote in recently and asked me why the non-caching version of the following code was faster than the caching version.  This goes against the intuition of any longtime C\C++ developer.  The reader found Mark’s blog that confirmed his findings, but wanted to know *why* it was true…   


 


//caching example


void m(Object[] ar) {


    int len = ar.Length; // cache length


    for (int i = 0; i < len; i++) {


        // do something with ar[i]


    }


}


 


 


//non-caching example


void m(Object[] ar) {


    for (int i = 0; i < ar.Length; i++) {


        // do something with ar[i]


    }


}


 


The time for a loop like this is often noticeably affected by the time it takes to do an array bounds check each time you index into the array.  This is an important feature of managed code as it greatly decreases the risk of common security vulnerabilities known as buffer overflows. 


 


But if you observe the code carefully, you will see that the checks are redundant.  The loop itself starts at 0 (the low bound of the array) and stops before the end of the array.  So there is no need to check those conditions again.  If we recognize the pattern, the engine can check just once at the top of the loop to be sure 0 and ar.Length are within bounds of the array.  This is a process known as hoisting.  That is we hoist or pull up the range checks out of the loop and check them once rather than many times. 


 


So the reason the “non-caching” sample is faster is that the engine recognizes this pattern and not the pattern for the caching example.  Could we teach the JIT about the other example?  Sure, and maybe we will someday, but every new special case we do increases the JIT’ing time, so we have to find a balance.


 


The moral here is not “learn all the JIT-tricks” as those change from release to release.  The lesson here is more along the lines of measure, measure, measure when you are doing perf work — and don’t try to outsmart the system 😉


 


Please check out the managed code section of Improving .NET Application Performance and Scalability book which our local perf expert contributed heavily to. 


 


Hope that was helpful!


 

Comments (16)

  1. Drew Marsh says:

    Any reason you don’t just suggest foreach() in this kind of scenario? Since, after all, foreach() over an array ends up expanding to the pattern that is recognized by the JIT upon compilation. Obviously if you need the index ya gotta stick with the for(), but that’s usually the rarer case.

    Just curious,

    Drew

  2. Henry Boehlert says:

    Thanks, it was helpful indeed. But then again, I’m a little unhappy with that.

    It’s a area where developers can only guess what the right thing is. In your example, what I would have considered a safe bet was wrong and the opposite was true.

    In Ian Hoff’s example (on channel 9), a similar call (Array.GetUpperBound) was the cause of a 25% performance hit.

    Profiling each and every loop is just too much work. There has to be more consistent guidance to let us make safe 80/20 bets.

    Performance Considerations on MSDN is a very good summary but it is probably outdated and way to short on JIT.

    Improving Managed Code Performance says:

    Avoid repetitive field or property access.

    I’d like to call for enhanced FxCop rules on performance, so we could get an automated grip on the problem without having to second guess every little loop.

    I understand that the JIT compiler can only perform so much optimization, so wouldn’t it be actually better to leave out obscure features if they are impossible to be implemented consistently?

    Why can’t the C# compiler come to the rescue here? You could put const/readonly propagation and all the special knowledge into it.

    During (release) compilation to IL you have (almost) all the time in the world.

  3. Lee says:

    Has this been changed in Whidbey? I’ve made this flawed optimization before specifically after a "measure, measure, measure" step. It wasn’t flawed when I did it, as that loop construct became 39% faster. I just tried it again, and it continues to be faster.

  4. Alex Suzuki says:

    Brad: Thanks for the answer. That reader you mentioned was me. 😉

    Lee: I can confirm that. The for loop that caches the length is no longer slower in Whidbey. It seems to perform pretty much identically.

  5. ahnan says:

    I don’t think that’s convenient. Not only for developers but also other languages/compliers. It just confused me.

  6. Mike Swaim says:

    I know about that optimization, but what about ArrayLists? I use them a whole lot more than arrays.

  7. Brad Abrams explains a JIT optimisation which could catch you out.

    This kinda stuff is interesting,…

  8. This article discusses looping performance at the lowest level (native). It dispels a few myths that have been circulating the blogosphere for quite some time and makes some general performance recommendations.

  9. This article discusses looping performance at the lowest level (native). It dispels a few myths that have been circulating the blogosphere and makes some general performance recommendations.

  10. jokiz's blog says:

    [code language=&quot;C#&quot;]for (int i = 0; i &amp;lt; collection.Count; i++){&amp;nbsp;&amp;nbsp;&amp;nbsp; //do something with…

  11. jokiz's blog says:

    I have posted an entry on

    my blog on how to have an efficient for loops by storing the length

    before…

  12. jokiz's blog says:

    [code language=&quot;C#&quot;]for (int i = 0; i collection.Count; i++){ //do something with collection}[/code]I’ve

  13. jokiz's blog says:

    I have posted an entry on my blog on how to have an efficient for loops by storing the length before