Performance Quiz #6 — Analyzing the managed code


Raymond didn’t post an update today so I think this is a good time for me to post the first analysis of the managed code. There’s some things to notice right away and I’ve highlighted them in the table below. This table was generated in the same way as the others and again I’ve pruned the tree so that no functions with inclusive time less than 5% are shown, so that means some things won’t add up because some small things are gone — still it helps focus attention on the relevant bits and keeps this table reasonably sized. I’ve also removed functions which were basically just zero or near zero cost wrappers, so there are some internal “…” entries. That’s what those are.













































































































Name

 Exclusive
 Percent

 Inclusive
Percent
  __CorExeMain (from mscoree.dll) 0.00 92.31
    __CorExeMain (from mscorwks.dll) 0.00 88.46
      ExecuteEXE(struct HINSTANCE__ *) 0.00 69.23
        SystemDomain::ExecuteMainMethod(…) 0.00 69.23
          Assembly::ExecuteMainMethod(class PtrArray * *) 0.00 59.62
            … 0.00 59.62
            NS.Test.Main(string[]) 0.00 59.62
              NS.Test.Dictionary..ctor() 1.92 58.65
                System.IO.StreamReader.ReadLine() 3.85 28.85
                  System.IO.StreamReader.ReadBuffer() 0.00 16.35
                    System.Text.DecoderNLS.GetChars(…) 0.00 15.39
                      … 0.00 15.39
                        … 0.00 15.39
                          System.Text.DBCSCodePageEncoding.GetChars(…) 15.39 15.39
                  System.String.CtorCharArrayStartLength(char[],int32,int32) 0.00 5.77
                NS.Test.DictionaryEntry.Parse(string) 2.89 26.92
                  System.String.Substring(int32,int32) 0.96 14.42
                    System.String.InternalSubString(int32,int32) 0.00 13.46
                      … 0.00 7.69
                        … 0.00 6.73
                          Alloc(unsigned long,int,int) 6.73 6.73
                  String::IndexOfChar(…) 1.92 5.77
      EnsureEEStarted(…) 0.00 18.27
        … 0.00 18.27
          … 0.00 18.27
            InitializeProfiling 0.00 6.73

First thing to notice is that the time attributed to the overall “main” of the program (that would be __CorExeMain) is only 92%. Why isn’t that 100%? Well there’s a few reasons. The first is that the CLR has more than one thread so some of the time went to spinning up the finalizer thread for instance. A second reason is that some percentage of the run will be kernal time and we won’t be able to attribute those stacks to anything so those are gone. And the third, and most subtle reason, is that some places during the run of a managed program have a stack frame that is so crazy (usually because we’re in the middle of an exotic transition) that the profiler can’t walk it. That causes it to drop some samples. The result of these factors is that main is usually close to but not 100% of the time. In this case 8% of the samples were elsewhere or not traceable.


But now lets get to some meatier conclusions: NS.Test.Main() which is the managed main function is only getting 59% of the time. Even allowing for the the fact that main was only 92.31% of the time, the managed main is only 64.6% of that. Why is that?


Well there were some assorted small things that were in the set of things pruned from this view (at less than 5%) but they added up… some of it was checking the security configuration only to discover that we are full trust, some of it was looking for possible administrative configurations, some of it was setting up the performance counters for this instance of the application. 


Another chunk of the time still visible even in this pruned view — EnsureEEStarted() for instance at 18.27% of the startup including 6.73% to get the profiling services working. That part of the cost would naturally not even be present if the program wasn’t run under the profiler.  All these startup tasks tend to require other DLLs, typically already loaded but we still have to get access to them in this process. It all adds up.


Am I happy with a number like 59.62%? Well no, it’s my job to make that number bigger (i.e. reduce the overhead) but this is a fairly small application… the whole run will be just over 100ms so basically the (fairly) fixed cost of starting up the runtime is significant in this case. It would be less daunting if the program did more.


Note to reviewers of managed systems: If you want to make a framework — any framework — look bad, simply ask it to do nothing. It turns out pretty much all known frameworks are lousy at doing nothing. So “hello world” for instance is a great way to make a framework look like a pig.


Still there’s room for improvement there which is why I have a job 🙂


Moving on, System.IO.StreamReader.ReadLine() is at 28% which may not look too high, after all this program is primarily a parser. However, looking closer we can see that the hefty work, decoding code page 950, is in DBCSCodePageEncoding.GetChars() and that’s only at 15.39%. It’s not unhealthy to see a number like 15% spent doing one of the nice meaty jobs at hand but how did we go from 15% to decode the text all the way to 28% to decode the text and break it into lines? That seems excessive… maybe we could do something about that.


Looking further down NS.Test.DictionaryEntry.Parse() is up at 26% compared to the 28% for System.IO.StreamReader.ReadLine(). Now what’s up there… it’s just some simple scanning and string allocation. The string allocation should be the main cost yet it’s only 14.42% of the time in Substring of which a mere 6.73%. Plus the IndexOfChar worker, which should be doing most of the searching, is only at 5.77%. Well there is a good bit of control flow and function call setup in the Parse function. That’s probably a good chunkof it right there.


Still, the overall picture is pretty good. No exotic critical sections, no weird interlocked operations, very little time in the allocator. You can see why we posted a great out-of-the-gate number like 124ms. Even with the CLR’s fixed overhead we got great value out of the allocator and the input mapping services.


But we can make it better still 🙂

Comments (14)

  1. Will Dean says:

    Is is possible to get the VS2005B2 profiler to give that much detail?

  2. ricom says:

    That’s what I used. Actually I used a pre-beta2 build. To paste them here I did

    vsperfreport /summary:all dictread.vcp

    Then I loaded the calltree into excel and made a little formula that changes the level into indenting =CONCATENATE(REPT(" ",A2),B2) then I used auto-filter to trim out anything below 5% and after that I edited the function names to make them fit.

    The build I have is actually a pre-beta2 build so it’s buggier but I’m to lazy to upgrade. Err. What I mean is urgent customer issues have prevented me from upgrading. Ya, that’s what I meant 🙂

  3. Will Dean says:

    Thanks for that – it inspired me to try sampling mode rather than instrumentation, which did indeed discover all the framework calls.

    Unfortunately for those of us outside in the cold real world, MS doesn’t make EVEN THE PUBLIC SYMBOLS for things like MSCORWKS and MSCOREE available on symserv, so the output is largely a useless load of junk – NS.Test.Main is *11* levels deep in the tree, with EVERY level above listed as either just "[mscoree.dll]" or "[mscorwks.dll]".

    It’s a merciful relief when we actually get to the Win32 calls at the bottom of the framework and some meaningful names appear again.

    I’m sure that you have no route to influence SymServ policy but as a perf person, do you have any guidance as to how mere mortals should cope without any symbols?

  4. ricom says:

    I’m not without influence. Maybe there’s something I can do there.

    Even if you had the symbols, I’m not sure it would help diagnosis very often. I guess there are times when the CLR has lost its mind and seeing some clue as to how the mind was lost might be useful. If there are specific scenarios you can think of where those symbols would have saved your bacon that might help.

    Still… given that we give away Rotor you’d think publics would be a no-brainer.

    I’ll see what I can do. No promises tho.

    Thanks for reading 🙂

  5. Will,

    If you have installed .Net Framework SDK, symbols for mscoree/mscorwks are in SDK directory.

    But I agree, it should be in symbol server.

  6. Will Dean says:

    Thanks for enquiring – I’ve just filed it as a bug against B2 as well, to up the ante a little.

    Maybe you’re right about me overestimating the usefulness of the missing symbols, but without knowing what lurks behind the sea of [mscorwks.dll]’s, it’s hard to be sure that it’s not something useful.

  7. scarroll says:

    Ask and you shall receive. I just talked to the appropriate people about this (as it was screwing up a lot of our profiler customers).

    The release is imminent. Some more here:

    http://blogs.msdn.com/scarroll/archive/2005/05/17/418926.aspx

    Steve Carroll (profiler dev)

  8. ricom says:

    There you have it 🙂

    Thank you Steve and Junfeng!

  9. ricom says:

    >>without knowing what lurks behind the sea of [mscorwks.dll]’s, it’s hard to be sure that it’s not something useful

    You’re right about that. At the very least you could look and say "ah, nothing to worry about there, lets move along to the meaty portions"

  10. Regarding start-up costs of the CLR: In my own profiling I have noticed there is a considerable amount of xml parsing and string parsing, "doing nothing". Perhaps it is not always completely nothing — any call to Enum.IsDefined (cleverly hidden in many System.Drawing members) to racks up a big tree of calls. But why is the security system based on xml? Even without a config file, lots of parsing is going on. Are the start-up costs being reduced in the future?

  11. Will Dean says:

    Thanks guys for looking into this – I look forward to seeing them up later. (They’re not there yet…)

    junfeng: With B2, the symbols in the SDK symbol directory do not match the binaries. (Not even in file date). I’ve bugged this in Ladybug, and noticed that someone else complained at B1 of the same thing. Is there a build problem?

  12. Well, I’ve been slacking off too long… sitting on my nice 124ms time and doing nothing.  It’s…

  13. So I was reading through one of my favorite MSDN blogs (http://blogs.msdn.com/oldnewthing/)

    And he…

  14. Well, I’ve been slacking off too long… sitting on my nice 124ms time and doing nothing. It’s time I