Performance Quiz #6 -- Looking at the second cut

Stefang jumped into the fray with his analysis in the comments from my last posting.  Thank you Stefang.  And it seems, as usual, things aren't always what they appear :)

Let's continue the story without me giving away the ending just yet though.  How is Raymond is doing? My last posting studied his first effort, that's the one that I ported for Quiz #6, and today we'll look at his second effort and see how he did.  Remember we were targetting the time in the getline function.

To get a rough idea if he's doing well I measured the raw execution time on my own machine, from process start to process exit.

Unmanaged Version 1:    Time of execution 1.328
Unmanaged Version 2:    Time of execution 0.828

So the new version is 1.6 times faster.  Nice!

And, in some detail, this what I see for his version #2.

Function Name   Exclusive Percent   Inclusive Percent
_main 0 97.695
_mainCRTStartup 0 97.695
Dictionary::Dictionary(void) 0.231 94.744
std::getline<char,struct std::char_traits... etc. 0 61.918
std::getline<char,struct std::char_traits... another etc 0.922 59.382
std::basic_filebuf<char,...>::underflow(void) 0.692 37.575
std::basic_filebuf<char,...> >::uflow(void) 2.259 34.947
_fgetc 3.043 32.826
std::_Fgetc(char &,struct _iobuf *) 0.461 32.688
__lock_file 1.614 21.254

And sure enough the time in getline has dropped to about 62%.   Of course that's 62% of a smaller number too so it's even better than it looks.

Interestingly, we're still paying what looks a hefty toll for using a multi-threaded version of the runtime... that's what we get by default.  But it would be interesting to see what you guys get if you statically link with a different runtime.

Function Name(sanitized)  ExclusivePercent  InclusivePercent
RtlLeaveCriticalSection 18.811 18.811
RtlEnterCriticalSection 18.119 18.119
GetCPHashNode 4.61 4.61
InterlockedIncrement 4.38 4.38
InterlockedDecrement 4.149 4.149
MultiByteToWideChar 3.965 8.852
__lock 3.827 19.64
_fgetc 3.043 32.826
__SEH_prolog 2.582 2.582
std::basic_filebuf<char,struct std::char_traits<char> >::uflow(void) 2.259 34.947

And I'm going to tip my hand now without doing the analysis, here's how the numbers stack up so far:

Version Execution Time (seconds)
Unmanaged v1 1.328
Unmanaged v2

0.828

Unoptimized Managed Port       0.124

Color me happy :)  I get numbers comparable to what Stefang reported.  But the story isn't over yet!  Raymond's still got plenty of ammunition.  Let's see what happens when he brings out the heavy artillery.  And we've yet to see what the profile looks like for the managed code.