Profiling FAQ #1: Why doesn't my Sampling Call Tree look like my Instrumentation Call Tree?

This post is adapted from an internal mail. The customers were confused about why their call stacks looked so different in Sampling mode and Instrumentation mode. Let's say your program consists of only two DLLs, foo.dll and bar.dll.

Foo.dll has two functions, Foo1 and Foo2.
Bar.dll has two functions, Bar1 and Bar2.

Imagine this call chain: Foo1 calls Bar1, which calls Bar2, which calls Foo2.

In Sampling mode you would see, as expected:

Foo1
Bar1
Bar2
Foo2

Now let's say you look at the sampling stats and determine that your actual problem is in foo.dll, since all the exclusive samples are in Foo2.
So you instrument foo.dll (and NOT bar.dll) to drill down on it. The call stack would be:

Foo1
Bar1
Foo2

This is a little counter-intuitive. The reason is that we add probe points at the entry point of each function AND around any call site that calls a function external to the DLL. Since bar.dll is not instrumented, nothing inside Bar1 or Bar2 is probed.

So the instrumented bodies of Foo1 and Foo2 look like this:

void Foo1()
{
    FUNC_ENTER(foo1);

    // do some stuff

    // Bar1 lives in bar.dll, so the call site itself is wrapped in probes
    EXTERNAL_CALL_ENTER(foo1,bar1);
    Bar1();
    EXTERNAL_CALL_EXIT(foo1,bar1);

    // do some more stuff

    FUNC_EXIT(foo1);
}

void Foo2()
{
    FUNC_ENTER(foo2);
    // stuff
    FUNC_EXIT(foo2);
}

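The sequence of probes we see is FUNC_ENTER(foo1), EXTERNAL_CALL_ENTER(foo1,bar1), FUNC_ENTER(foo2), FUNC_EXIT(foo2), EXTERNAL_CALL_EXIT(foo1,bar1), FUNC_EXIT(foo1). Bar1 shows up only because the call site in Foo1 is probed. The call from Bar1 to Bar2 happens entirely inside bar.dll, which has no probes, so it is never recorded. That leads to the truncated call stack above, which lacks knowledge of Bar2.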
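
To make this concrete, here is a rough sketch that replays the probe sequence on a shadow stack and reports the deepest stack it can reconstruct. It is only a model for illustration, not how the profiler is actually implemented: the Probe and Event types and the instrumented_run and deepest_stack functions are hypothetical names made up for this example.

```cpp
#include <string>
#include <vector>

// Hypothetical probe kinds, mirroring the macros shown in the post.
enum class Probe { FuncEnter, FuncExit, ExternalCallEnter, ExternalCallExit };

struct Event {
    Probe kind;
    std::string caller;  // function that contains the probe
    std::string callee;  // target of the call; only used by external-call probes
};

// Probe stream for Foo1 -> Bar1 -> Bar2 -> Foo2 when only foo.dll is
// instrumented. bar.dll has no probes, so Bar1 -> Bar2 leaves no event.
std::vector<Event> instrumented_run() {
    return {
        {Probe::FuncEnter, "Foo1", ""},
        {Probe::ExternalCallEnter, "Foo1", "Bar1"},
        {Probe::FuncEnter, "Foo2", ""},
        {Probe::FuncExit, "Foo2", ""},
        {Probe::ExternalCallExit, "Foo1", "Bar1"},
        {Probe::FuncExit, "Foo1", ""},
    };
}

// Replays the events on a shadow stack and returns the deepest stack seen.
std::vector<std::string> deepest_stack(const std::vector<Event>& events) {
    std::vector<std::string> stack;
    std::vector<std::string> deepest;
    for (const Event& e : events) {
        switch (e.kind) {
            case Probe::FuncEnter:
                stack.push_back(e.caller);
                break;
            case Probe::ExternalCallEnter:
                // The external callee is known only from the probed call site.
                stack.push_back(e.callee);
                break;
            case Probe::FuncExit:
            case Probe::ExternalCallExit:
                if (!stack.empty()) stack.pop_back();
                break;
        }
        if (stack.size() > deepest.size()) deepest = stack;
    }
    return deepest;
}
```

Calling deepest_stack(instrumented_run()) gives back Foo1, Bar1, Foo2. Bar2 cannot appear because no event in the stream ever mentions it, whereas a sampler walks the real stack and sees all four frames.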