Performance Quiz #8 — The problems with parsing — Part 2

There were some great comments from the initial posting which I encourage you to read.  Many people latched on to the fact that GetToken(...) is doing a whole lot of allocations which is not something you generally want to see in a low level parsing function.  That was very high on my own list of things to think about when I wrote the code that motivated this problem.  Another excellent idea was to have GetToken() do the fact evaluation which simplifies its return code -- although this still leaves you with the problem of having to look up whether or not the fact is true and without some additional work you'd still have to allocate a string to do that.

But how much is all of this temporary string creation costing us anyway?  Is that the place to go for the big wins?

Let's keep the spirit of adventure going and look at some data; maybe you'll want to change your mind about what you recommend when you see these numbers.  I generated these views by using our very own profiler with only the public symbols, then I post-processed the .csv files that were created by vsperfreport for easy pasting.  This is all stuff you can do at home -- and unlike most of my previous postings I wasn't even using some weird internal build I happen to have on my desktop at the time, this was all on the v2.0 released bits.

OK here's that data.  First my favorite view -- the call tree starting from the program main and clipped to 2% inclusive time.

Exclusive% Inclusive% Function Name
---------- ---------- --------------------------------------------------------------------------------
0.88       98.47      ParserOriginal.Program.Main(string[])
1.41       97.15        ParserOriginal.Program.EvaluateOr()
2.87       93.96          ParserOriginal.Program.EvaluateAnd()
3.08       81.27            ParserOriginal.Program.EvaluateOptionalNot()
1.85       40.07            | ParserOriginal.Program.EvaluateFact()
12.12      18.89            | | System.Collections.Generic.Dictionary`2.FindEntry(!0)
0.70       3.31             | | | System.Collections.Generic.GenericEqualityComparer`1.Equals(!0,!0)
0.24       2.33             | | | | System.String.Equals(string)
2.08       2.08             | | | |   System.String.EqualsHelper(string,string)
0.55       2.87             | | | System.Collections.Generic.GenericEqualityComparer`1.GetHashCode(!0)
2.32       2.32             | | |   System.String.GetHashCode()
0.26       16.09            | | ParserOriginal.Program.EvaluateOr()
0.82       15.54            | | | ParserOriginal.Program.EvaluateAnd()
0.63       10.73            | | |   ParserOriginal.Program.EvaluateOptionalNot()
1.95       5.26             | | |   | ParserOriginal.Program.GetToken()
0.45       4.29             | | |   | ParserOriginal.Program.EvaluateFact()
2.36       3.51             | | |   |   System.Collections.Generic.Dictionary`2.FindEntry(!0)
0.78       3.11             | | |   ParserOriginal.Program.GetToken()
1.04       2.41             | | System.String.Equals(string,string)
9.93       32.16            | ParserOriginal.Program.GetToken()
2.41       14.78            | | System.String.InternalSubStringWithChecks(int32,int32,bool)
4.22       12.37            | | | System.String.InternalSubString(int32,int32,bool)
0.02       4.71             | | |   ExecuteCodeWithGuaranteedCleanupBackout(...)
0.00       4.67             | | |     SystemNative::ArrayCopy(...)
0.01       4.65             | | |       MethodDesc::GetStableHash(void)
0.01       4.62             | | |         WKS::GCHeap::Alloc(unsigned int,unsigned long)
0.00       4.61             | | |           WKS::gc_heap::allocate_more_space(...)
0.00       2.91             | | |             EEVirtualProtect(...)
2.90       2.90             | | |               WKS::gc_heap::try_allocate_more_space(...)
7.45       7.45             | | ParserOriginal.Program.getc()
1.21       4.97             | System.String.Equals(string,string)
3.76       3.76             |   System.String.EqualsHelper(string,string)
2.60       7.71             ParserOriginal.Program.GetToken()
0.64       3.93               System.String.InternalSubStringWithChecks(int32,int32,bool)
1.29       3.29                 System.String.InternalSubString(int32,int32,bool)

And then the top 25 functions sorted by exclusive time. 

Exclusive% Inclusive% Function Name
---------- ---------- ---------------------------------------------------------------------------------------------
16.04      49.01      ParserOriginal.Program.GetToken()
14.48      22.40      System.Collections.Generic.Dictionary`2.FindEntry(!0)
10.49      10.49      ParserOriginal.Program.getc()
10.36      10.36      System.String.EqualsHelper(string,string)
6.68       18.67      System.String.InternalSubString(int32,int32,bool)
4.69       4.83       CLRStubOrUnknownAddress
4.42       4.42       WKS::gc_heap::try_allocate_more_space(struct alloc_context *,unsigned int,int)
3.81       22.48      System.String.InternalSubStringWithChecks(int32,int32,bool)
3.71       81.27      ParserOriginal.Program.EvaluateOptionalNot()
3.69       93.96      ParserOriginal.Program.EvaluateAnd()
3.22       11.29      System.String.Equals(string,string)
2.91       2.91       System.String.GetHashCode()
2.83       2.83       struct JIT_Writeable_Thunks JIT_Writeable_Thunks_Buf
2.42       2.42       System.String.wstrcpy(char*,char*,int32)
2.29       40.07      ParserOriginal.Program.EvaluateFact()
1.67       97.15      ParserOriginal.Program.EvaluateOr()
1.32       1.32       ParserOriginal.Program.ReadToken()
0.88       98.47      ParserOriginal.Program.Main(string[])
0.83       3.66       System.Collections.Generic.GenericEqualityComparer`1.Equals(!0,!0)
0.61       3.52       System.Collections.Generic.GenericEqualityComparer`1.GetHashCode(!0)
0.24       2.53       System.String.Equals(string)
0.12       0.12       PromoteObject(class Object * *,long *,long,long)
0.09       0.09       WKS::gc_heap::make_free_list_in_brick(unsigned char *,struct WKS::gc_heap::make_free_args *)
0.07       0.07       WKS::logcount(unsigned int)
0.07       0.07       __EH_prolog3

So with this profile in mind what kinds of gains might we expect to see from getting rid of those substrings?  Can we take a stab at an upper bound on the cost of garbage collection in this application?  I don't know about you but when I saw this data it made me rethink things quite a bit.  What do you think?

See the continuation in Performance Quiz #8 -- The problems with parsing -- Part 3

Comments (13)

  1. don’t have 2.0 here to test, but is there something other than Dictionary that you could use?

    this is killing it

    return dictFacts.ContainsKey(fact);

    and on the GetToken function, would it help if the "ch" variable was global? so you wouldn’t allocate it everytime

  2. ricom says:

    Note: I change from using <PRE> </PRE> to just a fixed pitch font. The <PRE> formatting was bizarre in some window sizes and it looked like half the text was missing. You will want to have a smaller font setting and/or a wider window to see tables better.

    Now isn’t it surprising just how much is time spent evaluating if facts are true?

    I didn’t measure this but I suspect changing "char ch" to a global in GetToken() would make things worse. Globals cost more to access than locals.

    If you don’t have the 2.0 product you can still play along by installing Visual C# Express! It’s FREE!

  3. Jon Young says:

    I tried a Dictionary of SubString structs.

    I got about a 15% increase, and totally eliminated the string allocations (Assuming you use "char GetToken()" instead of "string GetToken()". )

    Of course this gain might evaporate if a industrial strength GetHashCode is needed in a real world example. Still, all the stress in the garbage collector is eliminated.

    public struct SubString {

    string str;

    int start;

    int length;

    public SubString( string str ) {

    this.str = str;

    this.start = 0;

    this.length = str.Length;


    public SubString( string str, int start, int length ) {

    this.str = str;

    this.start = start;

    this.length = length;


    public class EqualityComparer :IEqualityComparer<SubString> {

    public bool Equals( SubString x, SubString y ) {

    if( x.length != y.length )

    return false;

    for( int i = x.start; i < x.start + x.length; ++i ) {

    if( x.str[ i ] != y.str[ i ] )

    return false;


    return true;


    public int GetHashCode( SubString str ) {

    int hashCode = 0;

    for( int i = str.start; i < str.start + str.length; ++i )

    hashCode = hashCode ^ str.str[ i ];

    return hashCode;




    //the new Dictionary

    private static Dictionary<SubString, object> dictFacts = new Dictionary<SubString, object>( new SubString.EqualityComparer() );

  4. ricom says:

    hmmm I don’t think this code is quite right:

    for( int i = x.start; i < x.start + x.length; ++i ) {

    if( x.str[ i ] != y.str[ i ] )

    return false;


    I think you need something closer to this:

    int i = x.start;

    int j = y.start;

    for( int l = 0; l < x.length; l++ ) {

    if( x.str[ i+l ] != y.str[ j+l ] )

    return false;


    return true;

  5. Jon Young says:

    "I don’t think this code is quite right"

    Go ahead and say it. It was wrong.

    So with the above "improvements", this allocation free program is about a 38% faster.

  6. ricom says:

    It was mostly right 🙂

  7. Jon’s "38% faster" is just short of the 22.40% from Dictionary.FindEntry(), plus 18.67% from InternalSubString().

    I’m having a hard time visualizing how you’ll use your parser in your real program, in particular how the hash of facts->booleans will interact with the "meat" of the program.

    Does the overall program look like this:

    hash = empty

    while (!done) {

    change_the_facts_in (hash)



    I wonder what Rico means by "time spent evaluating if facts are true" — by that, do you mean parsing/evaluating a whole predicate, or just doing the hash lookup?

    If the predicates don’t change, *but* the facts change between each run, I’d look into precomputing a parse tree for the predicates. That way you can do short-circuiting easily. If you get really fancy, you could see which facts *did* change, and mark those subtrees as dirty – just re-evaluate those, instead of walking the whole tree again.

  8. I’m intrigued as to the purpose of the overall program.

    A wild guess is that you have these:

    – various versions of the same source code

    – "pass/fail" unit tests. Each fact is one such test.

    – A bunch of predicates that combine the results of the various tests to decide whether a larger subsystem works or not.

    … and if the facts change before each run through all the predicates, it would mean that you are trying to look back through the various versions of the code, trying to automatically find the place where "mostly everything worked".

    Or something like that. Actually, having such a tool would be pretty cool 🙂

  9. ricom says:

    The main loop that Fredrico outlined above is roughly correct. The facts change over time and the predicates need to then be re-evaluated. This kind of thing actually comes up a lot. Suppose you have a server like Windows update and you need to know what updates to deliver to a particular user. They come along and tell you stuff about their current configuration and you have to decide what updates to enable…

    He also writes:

    >>I wonder what Rico means by "time spent evaluating if facts are true" — by that, do you mean parsing/evaluating a whole predicate, or just doing the hash lookup?

    I meant just the latter part. The hash lookups cost a surprising amount.

  10. Hmmm. So if the "fact" values come from yes/no answers about the user’s configuration (or boolean values derived from the questions you ask)…

    How many "facts" do you have? You seem to be using the dictionary to represent a set. Could you use a bit array instead?

    Do the real predicates look like

    (fact1 | fact2) & !fact3

    or rather like

    (has_patch_foo | is_older_than_bar) & !some_other_thing

    I.e. are the names of the facts kind of numeric, or are they generic symbols? Can you use a perfect hash if all the possible symbols are known in advance?

    If the predicates don’t change, and you preprocess them into a parse tree or whatever, you can then store the information about how to evaluate each fact quickly. For example, if your facts can indeed be stored as a bit array, then the operand nodes in the parse tree can simply hold an index into the bit array, rather than the corresponding fact name.

  11. ricom says:

    Federico Mena-Quintero Writes:

    >>Do the real predicates look like […]

    (has_patch_foo | is_older_than_bar) & !some_other_thing

    More like that choice, so a special purpose hash isn’t so good.

    >>Could you use a bit array instead?

    I like that direction 🙂

    More details on my next work day 🙂

  12. I ported to parser to F#, and now it runs faster :), take look at the results:

  13. ricom says:

    Hey that looked pretty fun! Nice effort!

Skip to main content