Performance Quiz #6 — Looking at the fourth cut


Raymond is definately making some great headway.  Having targetted the single-character at a time conversion problem in version 4 of his program he’s 1.84 times faster than the previous version.




















Version Execution Time
(seconds)
Unmanaged v1 1.328
Unmanaged v2 0.828
Unmanaged v3 0.343
Unmanaged v4 0.187

Unoptimized Managed port of v1      

0.124


As before I’m including the function costs in tree view, but only showing functions with an inclusive cost of at least 5% to keep the size of the report manageable.



















































































































Function Name (Sanitized)

 Exclusive
Percent
 Inclusive
Percent
  _mainCRTStartup 0.00 97.97
    _main 0.00 97.97
      Dictionary::Dictionary(void) 1.45 77.97
        MultiByteToWideChar 6.67 7.25
        std::vector<struct DictionaryEntry, …>::insert(…) 0.29 14.20
          std::vector<struct DictionaryEntry, …>::insert(…) 0.58 13.62
            std::vector<struct DictionaryEntry, …>::_Ucopy(…) 0.00 6.67
              std::_Construct<struct DictionaryEntry,struct DictionaryEntry>(…) 0.29 6.67
                DictionaryEntry::DictionaryEntry(struct DictionaryEntry const &) 0.29 6.38
                  std::basic_string<…>::basic_string<…>(class std::basic_string<…> const &) 2.03 6.09
        std::basic_string<…>::basic_string<…>(unsigned short const *,…) 0.29 7.83
          std::basic_string<…>::assign(unsigned short const *,unsigned int) 0.00 7.54
        DictionaryEntry::Parse(class std::basic_string<….> const &) 1.16 33.33
          std::basic_string<…>::assign(…) 0.29 28.70
            std::basic_string<…>::_Grow(unsigned int,bool) 0.58 26.09
              std::basic_string<…>::_Copy(unsigned int) 1.16 25.22
                operator new(unsigned int) 0.58 24.06
                  0.29 23.19
                    2.61 22.90
                      AllocateHeap 4.06 21.45
      std::vector<struct DictionaryEntry,…>::~vector<…>(void) 0.00 18.84
        std::vector<struct DictionaryEntry,class std::allocator<….>::_Destroy(…) 0.00 18.84
          DictionaryEntry::~DictionaryEntry(void) 0.29 18.26
            std::basic_string<…>::~basic_string<…>(void) 0.58 17.68
              std::basic_string<…>::_Tidy(bool) 1.16 17.10
                _free 0.29 15.65
                  FreeHeap 4.35 15.36


So looking at this new program it seems like its time to start targetting string management and the allocations.  We’re spending 24% of our time just allocating strings.  Now what’s interesting is that at this point things are fast enough that the tear-down at the end is getting to be signficant — at 18.84% it’s a good chunk of the cost.

Comments (10)

  1. michkap says:

    Hey Rico!

    I’m still aching to know how the Everett vs. Whidbey numbers look. 🙂

  2. ricom says:

    I’m saving the managed analysis for a bit… partly waiting to see if any of my readers have more to say based on their own measurements and partly waiting for Raymond’s unmanaged to finally beat the managed time.

    I think it will be interesting to see what it takes and what finally did the trick. Hang in there a few more days.

    I think the whole exercise is just full of surprises so I’m kinda hoping more people will say what’s on their mind.

  3. Dean Harding says:

    Hehe, you going to hold your trump card in reserve until the unmanaged one is as fast as it’ll get? 🙂

  4. I’ve been trying to figure if you and Raymond are doing this in concert – especially since you seem to be giving the hints to where Raymond’s going to be going next…

    And I’m still waiting for him to totally rip the CRT out of the picture.

    He’s pretty close right now. All he has to do is get rid of those darned string thingies (which are fundimentally pointless since he’s dealing with what are effectively fixed length items).

    And no, I didn’t peek at his next posts.

  5. Asd says:

    Just out of curiosity I did a direct port of the managed version to Java. I’m getting around 203ms.

  6. ricom says:

    I did see Raymond’s program quite some time ago, but I didn’t do any analysis then. Still it’s not hard to predict where he’s likely to go because he keeps picking off the biggest performance problem and it’s easy enough to spot that by looking at what he’s done so far. It’s not like I’m doing some exotic analysis here 🙂

  7. Will Dean says:

    The managed version takes around 50ms to run here, built with VS2005B2. (My PC is faster than I thought…)

    About 53% of the time (including children) is spent in Parse, about 23% in Substring in Parse.

    I don’t know if the VS profiler can be made to tell me more about what’s happening INSIDE something like Substring.

    My approach (as a C programmer, really) my be to load the whole string and just use parse to calculate the offsets and lengths of everything, thereby avoiding all the string copying.

    But of course, that would just be a trade-off of load time vs. access time, and I don’t know enough about the rest of the application to know if that would be a good thing.

  8. James Clarke says:

    Rico.. Do you have any working set numbers to compare between the two implementations?

  9. stefang says:

    Rico, do you have any comments to my comment at:

    http://blogs.msdn.com/ricom/archive/2005/05/12/416977.aspx#418471

  10. So I was reading through one of my favorite MSDN blogs (http://blogs.msdn.com/oldnewthing/)

    And he…