How many samples are enough when using a sample-based profiler in a performance investigation?

Performance analysis is my job, so I answer a lot of questions about perf. This blog entry is about one of the questions I get most frequently.

When doing CPU analysis, many profilers, including PerfView, are sampling profilers.   In PerfView's case, by default every millisecond it stops each processor and takes a stack trace.   Thus you don't see every method that executes, only those that happen to be on the stack when these 1 msec samples are taken.    PerfView lets you control this rate (there is a CPU Sample Interval MSec text box in the 'advanced' area of the collection dialog box); using this textbox you can set it as fast as once every 1/8 of a msec and as slow as once every 100 msec or more.   This leads to the question: how many samples are enough?    That is the subject of this blog entry.
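
To make the mechanics concrete, here is a toy sketch of the idea in Python (all the names here are my own illustration; PerfView itself is built on ETW kernel events, not anything like this code): wake up on a timer, snapshot every thread's stack, and credit the sample to whatever function is on top.

```python
# Toy sampling profiler (illustration only; PerfView uses ETW, not this).
import collections
import sys
import threading
import time
import traceback

def sample_stacks(interval_msec=1.0, duration_sec=2.0):
    """Every interval, snapshot the stack of every thread and credit the
    sample to the function at the top of the stack (the 'leaf')."""
    hits = collections.Counter()
    deadline = time.monotonic() + duration_sec
    while time.monotonic() < deadline:
        for thread_id, frame in sys._current_frames().items():
            if thread_id == threading.get_ident():
                continue  # skip the profiler's own thread
            leaf = traceback.extract_stack(frame)[-1]
            hits[leaf.name] += 1
        time.sleep(interval_msec / 1000.0)
    return hits

def busy_work():  # a stand-in workload to sample
    while True:
        sum(i * i for i in range(100_000))

threading.Thread(target=busy_work, daemon=True).start()
print(sample_stacks().most_common(3))
```

The key property is the one the rest of this post relies on: the fraction of samples that land in a function is an estimate of the fraction of CPU time that function uses.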

First, the obvious: from an overhead perspective, slower is better. Sampling at .125 msec is 8X more expensive than the 1 msec default, both in runtime overhead and in file size (and EVERY manipulation of the trace, e.g. file transfers, PerfView view updates, is now roughly 8X slower too).    Thus, assuming you get the quality of information you need, slower is better.

So the question is: what rate will get you the quality you need?    It is important to realize that the answer is NOT about the RATE, but about whether you have enough TOTAL samples for your SCENARIO OF INTEREST.    There is information in the PerfView documentation on this very point (see 'How many samples do you need?'), which you can quickly find by following the 'Understanding Performance Data' link at the top of the CPU view.   What that section says is that the potential error of a measurement varies as the square root of the number of samples.   Thus if you have 100 samples, you have reasonable confidence that the 'true' number is between 90 and 110 (a 10% error).   If you had 10K samples (100 times more), you would have reasonable confidence that the number is between 9,900 and 10,100 (a 1% error).

Now while you might think you should drive the error as small as possible, you really don't need that.   Typically you don't care about 1% regressions, you care about 10% regressions.   If your scenario had 10K samples, a 1% regression is 100 samples, and the error of that 100-sample measurement is 10 (sqrt(100)), or 10% (that is, you are confident that the true regression is between 90 and 110 samples).   As you can see, that is typically 'good enough'.   This leads to the rule of thumb that you need between 1K and 10K samples OVER YOUR SCENARIO to get an error that is good enough for most investigations.
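
If you want to convince yourself of the square-root rule, a quick simulation does it (a minimal Python sketch with made-up numbers, not real profile data): repeatedly 'profile' a function that truly uses 10% of the CPU and watch how the spread of the measured hit counts compares to the counts themselves.

```python
# Simulating sampling error (made-up numbers, not real profile data).
import random
import statistics

def simulated_hit_counts(total_samples, true_fraction=0.10, runs=300):
    """For each run, count how many of 'total_samples' land in a function
    that truly consumes 'true_fraction' of the CPU."""
    return [sum(random.random() < true_fraction for _ in range(total_samples))
            for _ in range(runs)]

for n in (100, 1_000, 10_000):
    counts = simulated_hit_counts(n)
    mean, sd = statistics.mean(counts), statistics.stdev(counts)
    # The absolute error (sd) grows like sqrt(mean), so the RELATIVE
    # error shrinks as the total sample count grows.
    print(f"{n:6} total samples: ~{mean:.0f} hits +/- {sd:.1f} ({100 * sd / mean:.0f}% error)")
```

With 1K total samples the function gets ~100 hits and the run-to-run spread is ~10, which is exactly the 10% error on a 100-sample measurement quoted above.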

It is important to realize that this has NOTHING to do with how small any particular function is.    The basic intuition is that if a function is small, then unless it is called a lot you don't care about it, and if it IS called a lot, it WILL show up in the samples, provided you have enough samples in your SCENARIO OF INTEREST.   Thus the driving factors are:

  1. The period of time for the scenario of interest.    If this is a 'one shot' scenario (it only happens once), then you need a rate that will capture between 1K and 10K samples in that time period.   Thus the MOST COMMON reason for needing a high sample rate is that you have a short scenario (say 100 msec) that only happens once (you would LIKE .125 msec sampling to get close to 1K total samples).   But typically you are measuring AVERAGES over a LONG time (e.g. you measure the average over 10 seconds of requests), and then you can easily get your 10K samples at the default 1 msec rate.   In fact, whenever you have a RECURRING scenario it is EASIER AND BETTER to simply measure for a longer period of time rather than increase the sampling rate.    (And when you have long traces, you can make PerfView more responsive without losing the accuracy you need by limiting the time interval you look at to, say, 10 or 20 seconds of trace.)
  2. The error rate you can tolerate.   If you are looking for 'big' things (e.g. 10% of the total trace), then 1K samples is enough.   10K samples lets you see 1% regressions (100 samples) with 10% error (which is fine).  If you were looking for .1% regressions, you would need 100K samples (so that the .1% of interest is again 100 samples, with 10% error).   The arithmetic for both factors is sketched in code after this list.
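
To put numbers on both factors, here is the back-of-the-envelope arithmetic as a tiny Python sketch (the helper name is mine, and it assumes your scenario keeps one processor busy the whole time, since the profiler takes one sample per processor per interval):

```python
# Back-of-the-envelope sample budgeting (hypothetical helper; assumes the
# scenario keeps one CPU busy for its whole duration).
def expected_samples(scenario_msec, interval_msec=1.0):
    return scenario_msec / interval_msec

# Factor 1: a one-shot 100 msec scenario.
print(expected_samples(100))          # 100 samples at the 1 msec default: too few
print(expected_samples(100, 0.125))   # 800 samples at the fastest rate: close to 1K

# Factor 1, recurring case: just collect longer at the default rate.
print(expected_samples(10_000))       # 10 seconds of collection -> 10K samples

# Factor 2: with ~10% error you can resolve whatever owns ~100 samples,
# i.e. a fraction of about 100 / total_samples of the trace.
for total in (1_000, 10_000, 100_000):
    print(f"{total:6} samples -> smallest measurable regression ~{100 / total:.1%}")
```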

The result of all of this is:

  1. What matters is the total number of samples in your scenario of interest.
  2. If you have a RECURRING scenario, you should simply collect for a longer period of time. 
  3. If you have a ONE SHOT scenario and that scenario is short (e.g. 100 msec), you need to increase the sample rate AND/OR run the scenario multiple times (making it a RECURRING scenario).