Basic Profiler Scenarios

This post was going to cover some basic scenarios showing the differences between sampling and instrumentation and when you would choose to switch methods, but then I found there is already something like that on MSDN. If you haven't already, go and take a look. See if you can improve the performance of the PeopleTrax app.

Instead I'll discuss sampling and instrumentation from a user's perspective. There are already many definitions of sampling vs instrumentation so I won't repeat them.

For some background reading on the sampling aspect, take a look at David Gray's post. There are a few things that he hasn't covered in that post. The main question I had was: should I use sampling or instrumentation?

A generic answer to that would be:

  • If you know your performance problem is CPU-related (e.g. you see the CPU running at or near 100% in Task Manager) then you should probably start with sampling.
  • If you suspect your problem may be related to resource contention (e.g. locks, network, disk, etc.), instrumentation would be a better starting point.

Sometimes you may not be sure what type of performance issue you are facing or you may be trying to resolve several types of issues. Read on for more details.

Sampling

Why use sampling instead of instrumentation?

Sampling is lighter weight than instrumentation (see below for reasons why instrumentation is more resource intensive) and you don't need to change your executable/binaries to use sampling.

What events do you sample with?

By default the profiler samples on clock cycles. These should be familiar to most users because they relate to the commonly quoted frequency of the machine. For example, a 1 GHz processor runs 1 billion clock cycles per second. With the default profiler setting of one sample every 10,000,000 clock cycles, that means 100 samples every second on a 1 GHz machine.
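The arithmetic behind that figure can be sketched in a few lines. This is only an illustration: the function name is made up for this example, and it assumes the default interval of 10,000,000 clock cycles per sample and a fixed clock frequency.

```python
def samples_per_second(clock_hz: float, cycles_per_sample: int) -> float:
    """Return how many samples are taken per second at a given clock rate."""
    return clock_hz / cycles_per_sample


# Default sampling interval quoted in this post (clock cycles per sample).
DEFAULT_CYCLES_PER_SAMPLE = 10_000_000

for ghz in (1, 2, 3):
    rate = samples_per_second(ghz * 1e9, DEFAULT_CYCLES_PER_SAMPLE)
    print(f"{ghz} GHz -> {rate:.0f} samples/second")
```

A faster machine takes more samples per second at the same interval, so the interval is what you would tune if you wanted more or fewer samples.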

Alternatively, you could choose to sample on page faults, which might occur frequently if you are allocating and deallocating a lot of memory. You could also choose to sample on system calls or some lower-level counter.

How many samples is enough to accurately represent my program profile?

This is not a simple question to answer. By default we only sample every 10,000,000 clock cycles, which might seem like a long time between samples. In that time, your problematic code might block waiting on a lock or some other construct, and the thread it is running in might be pre-empted, allowing another thread to run. When the next sample is taken, the other thread could still be running, which means the problematic code is not included in the sample.

The risk of missing the key data is inherent in any sample-based data collection. In statistics the approach is to minimize that risk by making the number of samples large enough relative to the general population. For example, if you have a demographic that includes 10,000 people, sampling only 1 person is unlikely to be representative. Sampling 1,000 people might be considered representative. There are more links about this on Wikipedia.
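To get a feel for how the number of samples affects accuracy, here is a small simulation. It is a sketch under simplifying assumptions, not a model of the real profiler: each sample independently lands in a hypothetical "hot" function with a fixed probability (30% here), and the function name and numbers are invented for illustration.

```python
import random


def estimate_hot_fraction(true_fraction: float, num_samples: int,
                          rng: random.Random) -> float:
    """Estimate the share of time spent in a function from random samples."""
    # Each sample "hits" the function with probability true_fraction.
    hits = sum(1 for _ in range(num_samples) if rng.random() < true_fraction)
    return hits / num_samples


rng = random.Random(42)  # fixed seed so repeated runs print the same numbers
TRUE_FRACTION = 0.30     # hypothetical: the function really uses 30% of CPU time

for n in (10, 100, 1_000, 10_000):
    estimate = estimate_hot_fraction(TRUE_FRACTION, n, rng)
    print(f"{n:>6} samples: estimated {estimate:.1%} (true {TRUE_FRACTION:.0%})")
```

With only a handful of samples the estimate can be far off; as the count grows it settles toward the true value, with the error shrinking roughly in proportion to the square root of the number of samples.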

Won't this slow down my app?

No, not really. When a sample is taken the current thread is suspended (other application threads continue to run) so that the current call stack can be collected. When the stack walk is finished, execution returns to the application thread. Sampling should have a limited effect on most applications.

Sounds good, why use instrumentation?

See below.

Instrumentation

Why use instrumentation?

As discussed above, sampling doesn't always give you the whole picture. If you really want to know what is going on in a program, the most complete way is to keep track of every single call to every function.

How does instrumentation work (briefly)?

Unlike sampling, with instrumentation the profiler changes the binary by inserting special pieces of code called probes at the start and end of each function. This process is called 'instrumenting the binary' and it works by taking a binary (dll or exe) along with its PDB and producing a new 'instrumented binary'. By comparing the value of a counter at the end of the function with its value at the start, it is easy to determine how long the function took to execute.
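The entry/exit probe idea can be imitated in a few lines of Python. This is only an analogy: the real profiler rewrites the binary, whereas this sketch wraps functions with a decorator. The names probe, busy_work, elapsed_by_function and call_counts are invented for the example.

```python
import time
from collections import defaultdict
from functools import wraps

elapsed_by_function = defaultdict(float)  # total seconds per function name
call_counts = defaultdict(int)            # number of calls per function name


def probe(func):
    """Wrap func with an entry probe and an exit probe."""
    @wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()  # entry probe: read the counter
        try:
            return func(*args, **kwargs)
        finally:
            # Exit probe: read the counter again and record the difference.
            elapsed_by_function[func.__name__] += time.perf_counter() - start
            call_counts[func.__name__] += 1
    return wrapper


@probe
def busy_work(n):
    """Hypothetical workload: sum of squares below n."""
    return sum(i * i for i in range(n))


busy_work(100_000)
busy_work(200_000)
print(call_counts["busy_work"], "calls,",
      f"{elapsed_by_function['busy_work']:.6f} seconds in total")
```

Every call is counted and timed, which is what makes the data complete, and also why it costs more than sampling.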

What if I call other people's code?

Usually you don't have access to the PDB files for other people's code, which means you can't instrument it. Fortunately, as part of the instrumentation process the profiler inserts special probes around each call to an external function so that you can track these calls (although not any functions that they might call).
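A rough Python analogy for probes around external calls: the external function itself is left untouched, and the timing happens at the call site in our own code. Here zlib.compress stands in for "other people's code"; call_external and compress_report are hypothetical names made up for this sketch.

```python
import time
import zlib  # stands in for external code that we cannot instrument

external_calls = []  # one (function name, seconds) entry per tracked call


def call_external(func, *args, **kwargs):
    """Call-site probe: time a single call to an uninstrumented function."""
    start = time.perf_counter()
    try:
        return func(*args, **kwargs)
    finally:
        external_calls.append((func.__name__, time.perf_counter() - start))


def compress_report(text: str) -> bytes:
    # Our own code: the external call is wrapped where it is made.
    return call_external(zlib.compress, text.encode("utf-8"))


packed = compress_report("profile me " * 1000)
print(external_calls)
```

We learn how long the call to the external function took as a whole, but nothing about the functions it calls internally, which matches the limitation described above.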

Why not just use instrumentation all the time?

Computers execute a lot of instructions in 10,000,000 clock cycles. Sampling records a single sample in that time, while instrumentation records every function entry and exit, so instrumentation can generate a LOT of data compared with sampling. The process of calling the probe functions in an application thread can also degrade performance more than sampling would.
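A back-of-the-envelope comparison shows the scale of the difference. The numbers are assumptions chosen for illustration: a 1 GHz machine, the default interval of 10,000,000 cycles per sample, and a hypothetical average of 500 clock cycles per function call, with one entry and one exit probe event per call.

```python
CLOCK_HZ = 1_000_000_000        # 1 GHz machine
CYCLES_PER_SAMPLE = 10_000_000  # default sampling interval from this post
AVG_CYCLES_PER_CALL = 500       # hypothetical average cost of one function call

# Sampling: one record per sampling interval.
sample_records_per_second = CLOCK_HZ // CYCLES_PER_SAMPLE

# Instrumentation: two probe events (entry and exit) for every call.
probe_events_per_second = 2 * (CLOCK_HZ // AVG_CYCLES_PER_CALL)

ratio = probe_events_per_second // sample_records_per_second

print(f"sampling:        {sample_records_per_second:,} records/second")
print(f"instrumentation: {probe_events_per_second:,} probe events/second")
print(f"ratio:           {ratio:,}x")
```

Under these assumptions instrumentation produces tens of thousands of times more records than sampling. The real ratio depends entirely on how short and how frequent the calls in your application are.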