How not to benchmark different languages

I've recently been trolling the web for any sort of language-comparison benchmarks, to see how the CLR's JIT stacks up to the competition.  Dr. Dobb's has what seemed to be a pretty reasonable micro-benchmark article.  It's not particularly insightful, but hey, it is hard to come up with stuff that makes sense to compare across multiple disparate languages.  I started looking for specific C# to Java comparisons and came upon this little gem.  I'll give you a few minutes to read it.

Now that you've read it, I would like to challenge you to understand how anyone would ever believe the question of "How long does it take to do nothing 1 billion times" in any language is worth comparing.  Seriously, ignore the thoughts regarding whether the comparison itself was fair or not.  The best part of that article is that the Java JIT they seem to be pushing doesn't get rid of the empty loops in a newer version.  Then look at their 'testing methodology' (or the lack thereof).  Not only are they not actually benchmarking anything that anyone with a brain in their head would care about, but they didn't even bother to figure out how to use the tools that they claim to be comparing against.  Does anyone who actually uses C++* think that if they care about performance they're not going to enable the optimizer?  Every reasonable C++ compiler on the planet will completely eliminate the loops if only told to optimize.  And then the comparison against Assembly stuck in my craw.  Assembly is there because you KNOW WHAT YOU'RE DOING!!!!  If an assembly programmer writes a loop that iterates a billion times, it's probably because they're trying to measure some particular pattern of code on the CPU they're running.
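To make the complaint concrete, here is a minimal sketch of the kind of "benchmark" I'm talking about, written in Java.  This is not the article's code; the class and method names (EmptyLoopDemo, timeEmptyLoop, sumLoop) are made up for illustration.  The first method times a loop whose body does nothing, so an optimizer is free to throw the whole thing away.  The second at least returns a value that the caller uses, though a clever enough compiler may still simplify it, so treat the numbers from either one with suspicion.

```java
public class EmptyLoopDemo {
    // The kind of "work" being timed: a loop whose result nobody uses.
    // An optimizing compiler or JIT is allowed to remove it entirely.
    static long timeEmptyLoop(int iterations) {
        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            // intentionally empty
        }
        return System.nanoTime() - start;
    }

    // A loop whose result is returned to the caller, so it cannot simply
    // be discarded as dead code (it may still be simplified, though).
    static long sumLoop(int iterations) {
        long sum = 0;
        for (int i = 0; i < iterations; i++) {
            sum += i;
        }
        return sum;
    }

    public static void main(String[] args) {
        int n = 1_000_000_000;

        System.out.println("empty loop ns: " + timeEmptyLoop(n));

        long start = System.nanoTime();
        long sum = sumLoop(n);
        long elapsed = System.nanoTime() - start;
        // Printing the sum keeps the result observable.
        System.out.println("sum loop ns:   " + elapsed + " (sum=" + sum + ")");
    }
}
```

Whatever number the first line prints, it mostly tells you whether the optimizer noticed the loop was dead, not how fast the language is.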

So (still snickering at the previous stuff), I guess I'm asking for input:  Does anyone have any non-micro-benchmark, relatively objective managed code comparison links they'd recommend?  I'd really like to see how the CLR's JIT stacks up against some of the JVMs out there, for more than just little tiny toy code samples.

I'm also interested in 'end-to-end' timing, which includes JIT time.  Microbenchmarks with internal clocks don't tell you how long something actually ran for.  If a JIT takes 5 minutes to figure out it can remove 2 dead loops, your runtime isn't less than 1 second - it's less than 5 minutes and 1 second.
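Here's a rough sketch of the difference, again in Java and again with made-up names (EndToEndTiming, work, internalMillis).  The internal clock only sees the time between two System.nanoTime() calls.  As a stand-in for end-to-end time, I'm assuming the JVM's own uptime from RuntimeMXBean.getUptime() is close enough, since it covers JVM startup, class loading and JIT work up to that point.  It still misses process launch and shutdown, so timing the whole process from the outside is the more honest number.

```java
import java.lang.management.ManagementFactory;

public class EndToEndTiming {
    // Some stand-in work whose result is used by the caller.
    static long work(int iterations) {
        long sum = 0;
        for (int i = 0; i < iterations; i++) {
            sum += i;
        }
        return sum;
    }

    // What a microbenchmark with an internal clock reports:
    // only the time spent inside the measured region.
    static long internalMillis(int iterations) {
        long start = System.nanoTime();
        long result = work(iterations);
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        System.out.println("result: " + result);
        return elapsedMs;
    }

    public static void main(String[] args) {
        long internalMs = internalMillis(100_000_000);

        // Milliseconds since the JVM started, which includes startup,
        // class loading, and any JIT time spent so far.
        long uptimeMs = ManagementFactory.getRuntimeMXBean().getUptime();

        System.out.println("internal clock:   " + internalMs + " ms");
        System.out.println("since JVM start:  " + uptimeMs + " ms");
    }
}
```

The gap between those two numbers is exactly the part a microbenchmark with an internal clock never shows you.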

-Kev

* I found 4 different syntax errors in the C++ source code they provided...