The Performance War: Using counts to help navigate flat performance reports

A while ago I wrote about how you often win the performance war 5% at a time.  The theme of that article was essentially that if you are finding huge performance wins, especially from just the most basic performance reports, that's usually a sign that something went terribly wrong in the design phase.  When you're lucky you can recover from that kind of mistake by sending in the men with capes, but other times you're in for a painful rewrite.

Well ok so I already covered that theme.  Why am I writing now?

Well, what about when the problem really is a no-kidding-around hard performance situation?  Maybe the design was done carefully.  Maybe some other smart people have already come along and done all of the obvious tuning steps.  So you run the profiler and there are no hot spots.  What now?

I'm glad you asked!

I have a general process I use to attack performance problems.  It's really based on one of my favorite quotes, from Raymond Chen, who says (and this may not be exactly right but it's close) "To make your code faster you take out the slow parts." *

It may seem like just a trite Raymondish thing to say, but actually it's sound advice, because often the secret to getting good performance is tracking down the parts of your code that don't really need to happen at all, and finding ways to remove them.

So here are a few steps that help me think about this kind of thing.  It's a very high level model, but I thought it was worth writing out:

#1 Size the input

Not every program can be cleanly thought about as an input/output transducer of some kind but many, if not most, can with the right metaphor.  So try to think of your program in that way.  What is it processing?  Characters? Lines? Nodes? Messages? Events?  How many of these things?  Quantify them in suitable units.

#2 Size the output

No surprise here: in many cases the processing cost is a function of both the input and the output, so quantify the output as well.  What does it look like?  Characters? Lines? Etc., just as above.
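As a sketch of steps #1 and #2, here's one way you might quantify a text-processing job's input and output in suitable units.  The function name and the choice of units here are my own, purely for illustration:

```python
def size_text_job(input_text: str, output_text: str) -> dict:
    """Quantify a job's input and output in characters and lines."""
    return {
        "input_chars": len(input_text),
        "input_lines": input_text.count("\n") + 1,
        "output_chars": len(output_text),
        "output_lines": output_text.count("\n") + 1,
    }

sizes = size_text_job("first line\nsecond line", "OUT")
# Now every later measurement can be stated "per input line" or "per output char".
```

The point isn't the code, it's having a concrete denominator to divide by later.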

#3 Measure counts in your program

Use the best profiler you can to get every kind of count you can reasonably instrument.  Function call counts.  File reads.  Registry reads.  Network roundtrips.  Any sort of resource that is relevant to the job at hand.
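A real profiler is the right tool for this, but to make the idea concrete, here's a rough hand-rolled sketch of call counting.  Everything here (the decorator, the function names) is invented for illustration:

```python
from collections import Counter
from functools import wraps

CALL_COUNTS = Counter()

def counted(fn):
    """Wrap a function so that every call is tallied -- a stand-in for profiler call counts."""
    @wraps(fn)
    def wrapper(*args, **kwargs):
        CALL_COUNTS[fn.__name__] += 1
        return fn(*args, **kwargs)
    return wrapper

@counted
def process_char(c):
    return c.upper()

for ch in "hello":
    process_char(ch)
# CALL_COUNTS["process_char"] is now 5 -- one call per input character
```

The same trick works for any countable resource: wrap the file read, the registry read, the network call, and tally.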

#4 Restate the various counts in terms of input and output

Once you've done the counting, trends tend to appear.  "This function is called once per byte", "This function is called 3 times per byte", "There are two registry reads per line of input" etc.  Some of these will probably leap right off the page:  e.g. "Why do we call the process character function twice for every byte of input, that can't be right."  These are the kinds of things you're looking for.  But let's move on to...
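The restating step is just division: raw counts over input size.  A minimal sketch, with made-up numbers, might look like this:

```python
def per_unit(counts: dict, input_size: int) -> dict:
    """Restate raw counts as per-unit-of-input ratios so trends stand out."""
    return {name: count / input_size for name, count in counts.items()}

# Hypothetical measurements from step #3, over a 1024-byte input.
ratios = per_unit({"process_char": 2048, "registry_read": 1024}, input_size=1024)
# process_char comes out at 2.0 per byte -- "why twice per byte? that can't be right"
```

Once the numbers are ratios instead of raw totals, the suspicious ones are much easier to spot.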

#5 Characterize each of the costs as either "real forward progress" or "bookkeeping"

A lot of what we do in any given program is bookkeeping, some of which is ultimately necessary but is nonetheless bookkeeping.  We can't get rid of work that is fundamentally necessary to solve the problem but we can target the bookkeeping for simplification or eradication.
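This classification step can be as simple as a labeled list.  The cost names and labels below are entirely hypothetical; the point is just to make the bookkeeping explicit so it becomes a target:

```python
# Hypothetical classification of the measured costs from steps #3 and #4.
COST_KIND = {
    "process_char": "progress",      # real forward progress on the input
    "emit_line": "progress",         # real forward progress on the output
    "registry_read": "bookkeeping",  # candidate for caching or removal
    "lock_acquire": "bookkeeping",   # candidate for simplification
}

targets = sorted(name for name, kind in COST_KIND.items() if kind == "bookkeeping")
# targets -> ['lock_acquire', 'registry_read']
```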

#6 Make a plan to reduce the per unit costs

The previous two steps will guide us to one of two places to look for savings:

  • We're processing the input or output more than we need to (from #4)
  • We have bookkeeping costs we can eliminate (from #5)

Both of these are "slow parts" that we can "remove".

#7 Consider reducing the input or the output

Less is better in both cases. 

Can we solve the problem at hand without looking at all the input?  What if the input was reorganized in some way?   Perhaps many but not all cases can be addressed by looking at some initial part of the input.  Can we find common or repeated input patterns that we can predict or summarize?

What about the output?  Is that where the bulk of the cost (see #3 and #4) lies?  Could we output something equally useful but more economical?  Is the format of the output well suited to describe the information it contains (e.g. did we use a few hundred bytes of XML to emit a couple of integers?).
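To put a number on that XML example, here's a quick comparison (the element names are invented; the packed format is just one possible compact alternative):

```python
import struct

a, b = 12345, 67890
xml = f"<result><first>{a}</first><second>{b}</second></result>"
packed = struct.pack("<ii", a, b)  # two 32-bit little-endian integers

# The two integers carry 8 bytes of information; the XML wrapping
# spends several times that many bytes just on markup.
```

Neither format is "right" in general, but if the output cost dominates, this kind of back-of-the-envelope comparison tells you how much headroom a leaner format buys.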

#8 Test your plans and repeat your analysis

Don't forget that if you've found a chunk of processing that doesn't have to happen, once you've removed it you may now have a profile that has obvious hotspots again.  Go after those first and then repeat the waste analysis steps 1-7 above.

I hope you'll find that none of these steps are especially astounding, but really that is all there is to it.  I've followed these steps time and again to get some of my biggest victories.  It's good to have a little recipe when you're looking at profile results that don't appear to have a lot of low hanging fruit.

* Raymond saw this entry a little bit ago and he tells me he has no memory of ever even saying that although for my part I'm quite sure it was Raymond that first quoted it to me many (10ish) years ago.  A web search reveals that many others say the same thing. Who knows where Raymond first heard it.  Or I could just be wrong entirely about where I heard it, but I'll continue to give the truth as I best remember it.

Comments (3)

  1. ctodx says:

    A really fascinating piece by Rico Mariani on what to do when your program is still "too slow" but you’ve…

  2. Frank Hileman says:

    Excellent post Rico! Function call counts are the way to go, and if you can get a tree visualization that makes branches "thicker" according to counts, that makes it easier to focus on weird ones.

  3. Alois Kraus says:

    Hi Rico,

    interested in another performance challenge?

    I have opened a performance quiz about formatting the current time:


     Alois Kraus
