Performance Quiz #6 -- Looking at the second cut
Stefang jumped into the fray with his analysis in the comments from my last posting. Thank you Stefang. And it seems, as usual, things aren't always what they appear :)
Let's continue the story without me giving away the ending just yet though. How is Raymond is doing? My last posting studied his first effort, that's the one that I ported for Quiz #6, and today we'll look at his second effort and see how he did. Remember we were targetting the time in the getline function.
To get a rough idea if he's doing well I measured the raw execution time on my own machine, from process start to process exit.
Unmanaged Version 1: Time of execution 1.328
Unmanaged Version 2: Time of execution 0.828
So the new version is 1.6 times faster. Nice!
And, in some detail, this what I see for his version #2.
Function Name | Exclusive Percent | Inclusive Percent |
_main | 0 | 97.695 |
_mainCRTStartup | 0 | 97.695 |
Dictionary::Dictionary(void) | 0.231 | 94.744 |
std::getline<char,struct std::char_traits... etc. | 0 | 61.918 |
std::getline<char,struct std::char_traits... another etc | 0.922 | 59.382 |
std::basic_filebuf<char,...>::underflow(void) | 0.692 | 37.575 |
std::basic_filebuf<char,...> >::uflow(void) | 2.259 | 34.947 |
_fgetc | 3.043 | 32.826 |
std::_Fgetc(char &,struct _iobuf *) | 0.461 | 32.688 |
__lock_file | 1.614 | 21.254 |
And sure enough the time in getline has dropped to about 62%. Of course that's 62% of a smaller number too so it's even better than it looks.
Interestingly, we're still paying what looks a hefty toll for using a multi-threaded version of the runtime... that's what we get by default. But it would be interesting to see what you guys get if you statically link with a different runtime.
Function Name(sanitized) | ExclusivePercent | InclusivePercent |
RtlLeaveCriticalSection | 18.811 | 18.811 |
RtlEnterCriticalSection | 18.119 | 18.119 |
GetCPHashNode | 4.61 | 4.61 |
InterlockedIncrement | 4.38 | 4.38 |
InterlockedDecrement | 4.149 | 4.149 |
MultiByteToWideChar | 3.965 | 8.852 |
__lock | 3.827 | 19.64 |
_fgetc | 3.043 | 32.826 |
__SEH_prolog | 2.582 | 2.582 |
std::basic_filebuf<char,struct std::char_traits<char> >::uflow(void) | 2.259 | 34.947 |
And I'm going to tip my hand now without doing the analysis, here's how the numbers stack up so far:
Version | Execution Time (seconds) |
Unmanaged v1 | 1.328 |
Unmanaged v2 | 0.828 |
Unoptimized Managed Port | 0.124 |
Color me happy :) I get numbers comparable to what Stefang reported. But the story isn't over yet! Raymond's still got plenty of ammunition. Let's see what happens when he brings out the heavy artillery. And we've yet to see what the profile looks like for the managed code.