Well, once again my elite readers have made a solution posting by me nearly redundant. There were many good viewpoints in response to my article including some especially excellent comments that were spot on.
Now there are several things worth saying, but I think the most important one was hit on directly by Ian Griffiths and James Curran.
Ian: “Presumably throwing an exception causes the CPU to fetch code and data it would otherwise not have executed…”
James: “I think what Jon’s article is overlooking is that his timing is done in isolation…”
And these are crucial observations that I talked about in a previous article (see the section called “Fear the Interference” at the end). In this particular case we’re doing nothing but throwing exceptions. So naturally all the code associated with the throw path is in the cache, all the data for resolving the locations of the catch blocks is in the cache, etc. etc. Basically what we have here is much closer to a measurement of the minimum an exception could possibly cost than a typical cost. Fair enough, minimum cost is useful too, but it is sort of an understatement by definition.
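To make the "measured in isolation" point concrete, here is a minimal sketch of the kind of tight-loop timing being discussed. It is written in Python rather than against the CLR, so the mechanics and the absolute numbers differ; only the shape of the measurement carries over. The names `BenchmarkError`, `throw_and_catch` and `time_tight_loop` are hypothetical and exist only for this illustration.

```python
import time


class BenchmarkError(Exception):
    """Hypothetical exception type used only for this sketch."""


def throw_and_catch():
    # Throw and immediately catch in the same frame: the shallowest case.
    try:
        raise BenchmarkError("boom")
    except BenchmarkError:
        return True


def time_tight_loop(iterations=100_000):
    # Warm up first so the throw path is already "hot" when timing starts.
    # This is exactly what makes the result a best case, not a typical case.
    for _ in range(1_000):
        throw_and_catch()

    start = time.perf_counter()
    for _ in range(iterations):
        throw_and_catch()
    elapsed = time.perf_counter() - start
    return elapsed / iterations  # average seconds per throw/catch pair


if __name__ == "__main__":
    per_throw = time_tight_loop()
    print(f"best-case cost per throw: {per_throw * 1e9:.0f} ns")
```

Nothing else runs between throws here, so whatever the loop reports is closer to a floor than to what a real program would see when an exception shows up in the middle of other work.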
I’ve mentioned some of the additional costs but let me make a list that’s somewhat more complete.
- In a normal situation you can expect additional cache misses due to the thrown exception accessing resident data that is not normally in the cache
- In a normal situation you can expect additional page faults due to the thrown exception accessing non-resident code and data, not normally in your working set
- Notably, throwing the exception requires the CLR to find the location of the finally and catch blocks, plus the filter block (if any; VB can have one), based on the current IP and the return IP of every frame until the exception is handled
- additional construction cost and name resolution in order to create the frames for diagnostic purposes, including reading of metadata etc.
- both of the above items typically access “cold” code and data hence hard page faults are probable if you have memory pressure at all
- we try to put code and data that is used infrequently far from code and data that is used frequently to improve locality; this works against you if you force the cold to be hot
- the cost of the hard page faults, if any, will dwarf everything else
- Typical catch situations are significantly deeper than the test case therefore the above effects would tend to be magnified (increasing the likelihood of page faults)
- Finally blocks have to be run anyway, so their cost isn’t really “exceptional”; however, whatever code is in the filter blocks (if present) and the catch block is subject to the same issues as above
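The depth effect in the list above can be sketched by throwing from progressively deeper call stacks and comparing against a normal return from the same depth. This is again a hedged Python illustration, not a CLR measurement: the helper names (`DeepError`, `throw_at_depth`, `return_at_depth`, `measure`) are hypothetical, and a warm loop like this one will not reproduce the cache misses or page faults described above. It only shows how one might vary the depth.

```python
import time


class DeepError(Exception):
    """Hypothetical exception type used only for this sketch."""


def throw_at_depth(depth):
    # Recurse `depth` frames down, then throw from the bottom.
    if depth == 0:
        raise DeepError()
    try:
        throw_at_depth(depth - 1)
    finally:
        pass  # stands in for cleanup that must run while unwinding


def return_at_depth(depth):
    # Same call shape, but returns normally instead of throwing.
    if depth == 0:
        return 0
    try:
        return return_at_depth(depth - 1)
    finally:
        pass


def catch_from_depth(depth):
    try:
        throw_at_depth(depth)
    except DeepError:
        return depth  # handled `depth` frames above the throw site


def normal_return(depth):
    return_at_depth(depth)
    return depth


def measure(fn, depth, iterations=2_000):
    start = time.perf_counter()
    for _ in range(iterations):
        fn(depth)
    return (time.perf_counter() - start) / iterations  # seconds per call


if __name__ == "__main__":
    for depth in (1, 10, 100):
        thrown = measure(catch_from_depth, depth)
        returned = measure(normal_return, depth)
        print(f"depth {depth:>3}: throw {thrown * 1e6:.2f} us, "
              f"return {returned * 1e6:.2f} us")
```

Comparing the two columns at each depth separates the cost of simply making the calls from the extra cost of unwinding through them.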
Some people thought the ring transition cost was important. In my opinion it isn’t especially, and it is at least somewhat accounted for in the test as written (though understated to the same extent that everything else is cheaper, because the whole test fits comfortably in L2). In fact, I’m not sure there even is a ring transition for a thrown exception: although some kernel32 code runs, I think everything that needs to be done can be done in user mode. But in any case that cost would be included.
Generally, microbenchmarks are designed to hide certain costs and magnify others. This is not a bad thing; the trick is to make sure you know what you’re getting.
- Is the metric the one of interest to you?
- Is the workload representative?
- Is the environment representative?
Just those three quick checks will get you far.