Well, it’s time for me to surrender. Sort of.
Raymond pulls out all the stops in his sixth version by painting a big bullseye on his biggest remaining source of slowness, which is operator new. He turns in an excellent result here. On my benchmark machine I see the number drop from 124ms to 62ms, a full 2x faster from start to finish. And observing the footnote on my previous message, the runtime for his application is now comparable to the CLR’s startup overhead… I can’t beat this time.
Let’s look at the results table now to see how we ended up:
|Version||Execution Time (seconds)|
|Unmanaged v5 With Bug||0.296|
|Unmanaged v5 Corrected||0.124|
|Unoptimized Managed port of v1||0.124|
|Optimized Managed port of v1||0.093|
Six versions and quite a bit of work later, we’ve been soundly trumped. But before I discuss that, let me put up the internal profile of Raymond’s version 6.
I’ve applied my usual filters to the call tree (nothing lower than 5% inclusive) and I also pruned out a couple of functions below HeapAlloc because they have long names and are boring.
|Function Name (Sanitized)||Exclusive %||Inclusive %|
|operator new(unsigned int)||1.087||16.304|
You can see that the memory allocation time is way down as a percentage, and of course that’s a smaller percentage of a smaller total time. I think he gets a lot of raw speed from his improved locality thanks to that new allocator as well. Interestingly, SEH overhead is up to a significant level in this run (now over 5% for the first time). Still, it’s nothing to be worried about.
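To make the locality point concrete, here is a minimal sketch of a bump ("arena") allocator. It is not Raymond's actual allocator; the names `Arena` and `copyString` and the 64 KB default block size are made up for illustration. The idea is only that many small allocations are carved out of a few large blocks, so each one costs a pointer bump instead of a trip through the general-purpose heap, and consecutive strings end up next to each other in memory.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <memory>
#include <vector>

// Hypothetical bump allocator, for illustration only.
// Memory is released all at once when the Arena is destroyed.
class Arena {
public:
    explicit Arena(std::size_t blockSize = 64 * 1024) : blockSize_(blockSize) {}

    // Returns `bytes` of storage whose offset within the block is a multiple
    // of `align`. `align` must be a power of two no larger than
    // alignof(std::max_align_t).
    void* allocate(std::size_t bytes,
                   std::size_t align = alignof(std::max_align_t)) {
        assert(align != 0 && (align & (align - 1)) == 0);
        std::size_t offset = (used_ + align - 1) & ~(align - 1);
        if (blocks_.empty() || offset + bytes > capacity_) {
            // Current block is full (or there is none yet): start a new one,
            // large enough for this request.
            capacity_ = bytes > blockSize_ ? bytes : blockSize_;
            blocks_.push_back(std::make_unique<unsigned char[]>(capacity_));
            offset = 0;
        }
        used_ = offset + bytes;
        return blocks_.back().get() + offset;
    }

    std::size_t blockCount() const { return blocks_.size(); }

private:
    std::size_t blockSize_;
    std::size_t capacity_ = 0;  // size of the current block
    std::size_t used_ = 0;      // bytes handed out from the current block
    std::vector<std::unique_ptr<unsigned char[]>> blocks_;
};

// Copies `len` characters into the arena and null-terminates the copy.
const char* copyString(Arena& arena, const char* s, std::size_t len) {
    char* p = static_cast<char*>(arena.allocate(len + 1, 1));
    std::memcpy(p, s, len);
    p[len] = '\0';
    return p;
}
```

Two strings copied back to back with `copyString` land in adjacent bytes of the same block, which is the kind of locality a general-purpose heap does not promise. The trade-off is that individual allocations cannot be freed.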
So am I ashamed by my crushing defeat? Hardly. The managed code got a very good result for hardly any effort. To defeat the managed version, Raymond had to:
- Write his own file I/O stuff
- Write his own string class
- Write his own allocator
- Write his own international mapping
Of course he used available lower level libraries to do this, but that’s still a lot of work. Can you call what’s left an STL program? I don’t think so. I think he kept the std::vector class, which ultimately was never a problem, and he kept the find function. Pretty much everything else is gone.
So, yup, you can definitely beat the CLR. Raymond can make his program go even faster, I think.
Interestingly, the time to parse the file as reported by both programs’ internal timers is about the same: 30ms for each. The difference is in the overhead.
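As a rough worked example of that last point, the sketch below subtracts the internally reported parse time from the start-to-finish time. The helper names `overheadMs` and `timeMs` are made up for illustration, and I'm assuming the comparison is between the 62ms unmanaged version 6 and the 93ms optimized managed port from the table. Under that assumption, roughly 32ms of the unmanaged run and roughly 63ms of the managed run is spent outside the parse.

```cpp
#include <chrono>

// Overhead is whatever the program's own timer does not see:
// total start-to-finish time minus the internally reported time.
constexpr double overheadMs(double totalMs, double internalMs) {
    return totalMs - internalMs;
}

// Hypothetical helper showing how an internal timer can be taken:
// it measures only the work passed in, not process startup or teardown.
template <typename F>
double timeMs(F&& work) {
    auto start = std::chrono::steady_clock::now();
    work();
    auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}
```

For instance, `overheadMs(62.0, 30.0)` gives 32 and `overheadMs(93.0, 30.0)` gives 63, so on these numbers the gap between the two programs is almost entirely outside the parse.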
Tomorrow I’m going to talk about the space used by these programs and that will wrap it up. Though I think Raymond is going to go on and do some actual UI and so forth with this series. That should be fun to watch.