Performance Quiz #8 -- The problems with parsing -- Part 2

Article
11/23/2005

There were some great comments from the initial posting which I encourage you to read. Many people latched on to the fact that GetToken(...) is doing a whole lot of allocations which is not something you generally want to see in a low level parsing function. That was very high on my own list of things to think about when I wrote the code that motivated this problem. Another excellent idea was to have GetToken() do the fact evaluation which simplifies its return code -- although this still leaves you with the problem of having to look up whether or not the fact is true and without some additional work you'd still have to allocate a string to do that.

But how much is all of this temporary string creation costing us anyway? Is that the place to go for the big wins?

Let's keep the spirit of adventure going and look at some data; maybe you'll want to change your mind about what you recommend when you see these numbers. I generated these views by using our very own profiler with only the public symbols, then I post-processed the .csv files that were created by vsperfreport for easy pasting. This is all stuff you can do at home -- and unlike most of my previous postings I wasn't even using some weird internal build I happen to have on my desktop at the time, this was all on the v2.0 released bits.

OK here's that data. First my favorite view -- the call tree starting from the program main and clipped to 2% inclusive time.

Exclusive% Inclusive% Function Name
---------- ---------- --------------------------------------------------------------------------------
0.88 98.47 ParserOriginal.Program.Main(string[])
1.41 97.15 ParserOriginal.Program.EvaluateOr()
2.87 93.96 ParserOriginal.Program.EvaluateAnd()
3.08 81.27 ParserOriginal.Program.EvaluateOptionalNot()
1.85 40.07 | ParserOriginal.Program.EvaluateFact()
12.12 18.89 | | System.Collections.Generic.Dictionary`2.FindEntry(!0)
0.70 3.31 | | | System.Collections.Generic.GenericEqualityComparer`1.Equals(!0,!0)
0.24 2.33 | | | | System.String.Equals(string)
2.08 2.08 | | | | System.String.EqualsHelper(string,string)
0.55 2.87 | | | System.Collections.Generic.GenericEqualityComparer`1.GetHashCode(!0)
2.32 2.32 | | | System.String.GetHashCode()
0.26 16.09 | | ParserOriginal.Program.EvaluateOr()
0.82 15.54 | | | ParserOriginal.Program.EvaluateAnd()
0.63 10.73 | | | ParserOriginal.Program.EvaluateOptionalNot()
1.95 5.26 | | | | ParserOriginal.Program.GetToken()
0.45 4.29 | | | | ParserOriginal.Program.EvaluateFact()
2.36 3.51 | | | | System.Collections.Generic.Dictionary`2.FindEntry(!0)
0.78 3.11 | | | ParserOriginal.Program.GetToken()
1.04 2.41 | | System.String.Equals(string,string)
9.93 32.16 | ParserOriginal.Program.GetToken()
2.41 14.78 | | System.String.InternalSubStringWithChecks(int32,int32,bool)
4.22 12.37 | | | System.String.InternalSubString(int32,int32,bool)
0.02 4.71 | | | ExecuteCodeWithGuaranteedCleanupBackout(...)
0.00 4.67 | | | SystemNative::ArrayCopy(...)
0.01 4.65 | | | MethodDesc::GetStableHash(void)
0.01 4.62 | | | WKS::GCHeap::Alloc(unsigned int,unsigned long)
0.00 4.61 | | | WKS::gc_heap::allocate_more_space(...)
0.00 2.91 | | | EEVirtualProtect(...)
2.90 2.90 | | | WKS::gc_heap::try_allocate_more_space(...)
7.45 7.45 | | ParserOriginal.Program.getc()
1.21 4.97 | System.String.Equals(string,string)
3.76 3.76 | System.String.EqualsHelper(string,string)
2.60 7.71 ParserOriginal.Program.GetToken()
0.64 3.93 System.String.InternalSubStringWithChecks(int32,int32,bool)
1.29 3.29 System.String.InternalSubString(int32,int32,bool)

And then the top 25 functions sorted by exclusive time.

Exclusive% Inclusive% Function Name
---------- ---------- ---------------------------------------------------------------------------------------------
16.04 49.01 ParserOriginal.Program.GetToken()
14.48 22.40 System.Collections.Generic.Dictionary`2.FindEntry(!0)
10.49 10.49 ParserOriginal.Program.getc()
10.36 10.36 System.String.EqualsHelper(string,string)
6.68 18.67 System.String.InternalSubString(int32,int32,bool)
4.69 4.83 CLRStubOrUnknownAddress
4.42 4.42 WKS::gc_heap::try_allocate_more_space(struct alloc_context *,unsigned int,int)
3.81 22.48 System.String.InternalSubStringWithChecks(int32,int32,bool)
3.71 81.27 ParserOriginal.Program.EvaluateOptionalNot()
3.69 93.96 ParserOriginal.Program.EvaluateAnd()
3.22 11.29 System.String.Equals(string,string)
2.91 2.91 System.String.GetHashCode()
2.83 2.83 struct JIT_Writeable_Thunks JIT_Writeable_Thunks_Buf
2.42 2.42 System.String.wstrcpy(char*,char*,int32)
2.29 40.07 ParserOriginal.Program.EvaluateFact()
1.67 97.15 ParserOriginal.Program.EvaluateOr()
1.32 1.32 ParserOriginal.Program.ReadToken()
0.88 98.47 ParserOriginal.Program.Main(string[])
0.83 3.66 System.Collections.Generic.GenericEqualityComparer`1.Equals(!0,!0)
0.61 3.52 System.Collections.Generic.GenericEqualityComparer`1.GetHashCode(!0)
0.24 2.53 System.String.Equals(string)
0.12 0.12 PromoteObject(class Object * *,long *,long,long)
0.09 0.09 WKS::gc_heap::make_free_list_in_brick(unsigned char *,struct WKS::gc_heap::make_free_args *)
0.07 0.07 WKS::logcount(unsigned int)
0.07 0.07 __EH_prolog3

So with this profile in mind what kinds of gains might we expect to see from getting rid of those substrings? Can we take a stab at an upper bound on the cost of garbage collection in this application? I don't know about you but when I saw this data it made me rethink things quite a bit. What do you think?

See the continuation in Performance Quiz #8 -- The problems with parsing -- Part 3

Performance Quiz #8 -- The problems with parsing -- Part 2

Additional resources