DoStackSnapshot Tidbit #2: Looney HRESULTs

Generally, corerror.h tells you all you need to know about what kinds of HRESULTs to expect back from DoStackSnapshot.  However, there are some fringe cases where you can get back an HRESULT that’s not as descriptive as you might like.

E_FAIL

I don’t much like E_FAIL.  If DoStackSnapshot fails, you will typically see a more descriptive, custom HRESULT.  However, there are regrettably a few ways DoStackSnapshot can fail where you’ll see the dreaded E_FAIL instead.  From your code’s point of view, you shouldn’t assume E_FAIL will always imply one of the cases below (or conversely that each of these cases will always result in E_FAIL).  But this is just good stuff to know as you develop and debug your profiler, so you don’t get blindsided.

1) No managed frames on stack

If you call DoStackSnapshot when there are no managed functions on your target thread’s stack, you can get E_FAIL.  For example, if you try to walk the stack of a target thread very early on in its execution, there simply might not be any managed frames there yet.  Or, if you try to walk the stack of the finalizer thread while it’s waiting to do work, there will certainly be no managed frames on its stack.  It’s also possible that walking a stack with no managed frames on it will yield S_OK instead of E_FAIL (e.g., if the target thread is JIT-compiling the first managed function to be called on that thread).  Again, your code probably doesn’t need to worry about all these cases.  If we call your StackSnapshotCallback for a managed frame, you can trust that frame is there.  If we don’t call your StackSnapshotCallback, you can assume there are no managed frames on the stack.
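The rule in the last two sentences can be sketched as follows, as a minimal illustration rather than a prescribed design: decide whether the stack had managed frames by looking at what your StackSnapshotCallback received, not at whether DoStackSnapshot returned E_FAIL or S_OK. `WalkState`, `RecordFrame`, and `HasManagedFrames` are hypothetical names, and `std::uintptr_t` stands in for FunctionID so the sketch compiles without the CLR headers.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical per-walk state. In a real profiler you would pass a pointer to
// it as DoStackSnapshot's clientData and recover it inside your callback.
struct WalkState {
    std::vector<std::uintptr_t> functionIds;  // one entry per managed frame reported
};

// Hypothetical helper to call from your StackSnapshotCallback for each frame.
// Any frame reported through the callback can be trusted to be on the stack.
void RecordFrame(WalkState& state, std::uintptr_t functionId) {
    state.functionIds.push_back(functionId);
}

// Answer "were there managed frames?" from the callbacks alone. The HRESULT is
// deliberately not consulted: an empty stack can yield E_FAIL or S_OK.
bool HasManagedFrames(const WalkState& state) {
    return !state.functionIds.empty();
}
```

With this approach, the same code path handles the "early thread", "idle finalizer", and "JIT-compiling the first function" cases without special-casing any HRESULT.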

2) OS kernel handling a hardware exception

This one is less likely to happen, but it certainly can.  When an app throws a hardware exception (e.g., divide by 0), the offending thread enters the Windows kernel.  The kernel spends some time recording the thread’s current user-mode register context, modifying some registers, and moving the instruction pointer to the user-mode exception dispatch routine.  At this point the thread is ready to re-enter user mode.  But if you are unlucky enough to call DoStackSnapshot while the target thread is still in the kernel doing this stuff, you will get E_FAIL.

3) Detectably bad seed

If you seed the stack walk with a bogus seed register context, we try to be nice.  Before reading memory pointed to by the registers we run some heuristics to ensure all is on the up and up.  If we find discrepancies, we will fail the stack walk and return E_FAIL.  If we don’t find discrepancies until it’s too late and we AV (first-chance), then we’ll catch the AV and return E_UNEXPECTED.
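Since the point of this list is to avoid getting blindsided while debugging, one option is to annotate logged results with the possible causes above. The helper below is a hypothetical sketch for log output only; as noted earlier, your code shouldn’t branch on these guesses. The HRESULT typedef and the S_OK, E_FAIL, and E_UNEXPECTED values are stand-ins for the standard Windows definitions so the sketch builds on its own.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

using HRESULT = std::int32_t;  // stand-in for the Windows typedef

// Stand-ins for the standard winerror.h values.
constexpr HRESULT kSOk         = 0;                                 // S_OK
constexpr HRESULT kEFail       = static_cast<HRESULT>(0x80004005u); // E_FAIL
constexpr HRESULT kEUnexpected = static_cast<HRESULT>(0x8000FFFFu); // E_UNEXPECTED

// Hypothetical debug-logging helper: attaches the *possible* causes described
// in this post to a DoStackSnapshot result. A hint for humans reading the log.
std::string DescribeSnapshotResult(HRESULT hr, bool seededWithContext) {
    if (hr == kSOk) return "S_OK";
    if (hr == kEFail) {
        std::string s = "E_FAIL (maybe: no managed frames; "
                        "target thread in kernel handling a hardware exception";
        if (seededWithContext) s += "; seed context rejected by sanity checks";
        return s + ")";
    }
    if (hr == kEUnexpected && seededWithContext)
        return "E_UNEXPECTED (maybe: bogus seed context caused a caught AV)";
    return "other HRESULT: see corerror.h";
}
```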

CORPROF_E_STACKSNAPSHOT_ABORTED

Generally, this HRESULT means that your profiler requested to abort the stack walk in its StackSnapshotCallback.  However, you can also see this HRESULT if the CLR aborted the stack walk on your behalf due to a rare scenario on 64-bit architectures.
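As a minimal sketch of the first case: per the comments in corprof.idl, a StackSnapshotCallback returns S_OK to keep walking and S_FALSE to stop. The hypothetical helper below stops at a depth limit and records that it was the one asking to stop. `WalkState` and `OnFrame` are illustrative names, and the HRESULT, S_OK, and S_FALSE definitions are stand-ins for the Windows ones.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

using HRESULT = std::int32_t;   // stand-in for the Windows typedef
constexpr HRESULT kSOk    = 0;  // stand-in for S_OK
constexpr HRESULT kSFalse = 1;  // stand-in for S_FALSE

// Hypothetical per-walk state passed to DoStackSnapshot as clientData.
struct WalkState {
    std::size_t maxFrames;                // stop after this many frames
    std::size_t framesSeen = 0;
    bool profilerRequestedAbort = false;  // set only when *we* stop the walk
};

// Hypothetical body for your StackSnapshotCallback: keep walking until the
// depth limit, then ask the CLR to stop by returning S_FALSE.
HRESULT OnFrame(WalkState& state) {
    ++state.framesSeen;
    if (state.framesSeen >= state.maxFrames) {
        state.profilerRequestedAbort = true;
        return kSFalse;
    }
    return kSOk;
}
```

Keeping the flag is what later lets you tell your own abort apart from one the CLR performed.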

One of the beautiful things about running 64-bit Windows is that you can get the Windows OS to perform (native) stack walks for you.  Read up on RtlVirtualUnwind if you’re unfamiliar with this.  The Windows OS has a critical section to protect a block of memory used to help perform this stack walk.  So what would happen if:

  • The OS’s exception handling code causes a thread to walk its own stack
  • The thread therefore enters this critical section
  • Your profiler (via DoStackSnapshot) suspends this thread while the thread is still inside the critical section
  • DoStackSnapshot uses RtlVirtualUnwind to help walk this suspended thread
  • RtlVirtualUnwind (executing on the current thread) tries to enter the critical section (already owned by suspended target thread)

If your answer was “deadlock”, congratulations!   DoStackSnapshot has some code that tries to avoid this scenario, by aborting the stack walk before the deadlock can occur.  When this happens, DoStackSnapshot will return CORPROF_E_STACKSNAPSHOT_ABORTED.  Note that this whole scenario is pretty rare, and only happens on WIN64.
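If your callback records when it asks to stop (for example, with a flag in the state you pass as clientData), you can tell the two sources of CORPROF_E_STACKSNAPSHOT_ABORTED apart. This is a sketch under that assumption; `ClassifyAbort` and `AbortOrigin` are hypothetical names. The aborted HRESULT is taken as a parameter only so the sketch compiles without corerror.h; in real code, pass CORPROF_E_STACKSNAPSHOT_ABORTED.

```cpp
#include <cassert>
#include <cstdint>

using HRESULT = std::int32_t;  // stand-in for the Windows typedef

enum class AbortOrigin { NotAborted, Profiler, Runtime };

// abortedHr: pass CORPROF_E_STACKSNAPSHOT_ABORTED from corerror.h.
// profilerRequestedAbort: true if your StackSnapshotCallback returned S_FALSE.
AbortOrigin ClassifyAbort(HRESULT hr, HRESULT abortedHr, bool profilerRequestedAbort) {
    if (hr != abortedHr) return AbortOrigin::NotAborted;
    // If we never asked to stop, the CLR stopped the walk itself
    // (e.g., the WIN64 deadlock-avoidance case described above).
    return profilerRequestedAbort ? AbortOrigin::Profiler : AbortOrigin::Runtime;
}
```

A `Runtime` result just means the walk is incomplete; any frames your callback did receive before the abort were still genuinely on the stack.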

Comments (6)

  1. S Thakral says:


    I am working as a developer on ProfileSharp, .NET Profiler by

    I am currently facing a problem with DoStackSnapshot.

    It returns E_INVALIDARG when called.

    Pushing in some NULL bytes as parameters in the BYTE [] context param of the method however results in the error "No managed code frames are available for the thread".

Can you help me understand how I can resolve this issue?

    In particular, what sequence of calls should precede DoStackSnapshot, and what parameters does it need in order to be called properly?

  2. David Broman says:

    The most likely reason for E_INVALIDARG is that you didn’t specify COR_PRF_ENABLE_STACK_SNAPSHOT when you called SetEventMask(). Most profiling features are unavailable by default, and need to be explicitly enabled by passing the proper flags to SetEventMask() (you normally do this from within your ICorProfilerCallback::Initialize() implementation).

    You can pass in NULL for the context parameter (not an array of NULLs, but literally a single NULL for this parameter), and we’ll give you a stack walk starting as far up the stack as we can.

  3. Elbie says:

    Hello David,

    Now with DoStackSnapshot we get the functionID for each layer and an IP.

    Often we need the RVA of each call (i.e., at which RVA within functionA it makes the call to functionB). By RVA I mean relative to the start of the method. I would later use the symbol APIs to resolve that to a file and line.

    It is not totally clear to me how to do that translation. From the functionID we could get the method token and module. The ModuleID would give us the module’s load address, I think (but this might not be needed). We would also need the "VA" of the function to compute the RVA of the call site. I think the metadata APIs have something to give us that VA … hmmm, seems like a pain.

    But I am not sure it needs to be so complicated. Did I miss some obvious way to achieve the same thing?

    Any suggestions?



  4. David Broman says:

    Hi, Elbie.  Not sure if I totally understand your scenario, but I’ll give it a shot.  If your goal is to go from IP to "native offset from beginning of function", you’ll want to look at ICorProfilerInfo2::GetCodeInfo2().  The catch here is that there’s not necessarily a real "native beginning of function".  If the JITted code is split into regions, then you don’t have a single contiguous chunk of instruction space to play with.

    Anyway, look in corprof.idl at the comments near ICorProfilerInfo2::GetCodeInfo2().  Please let me know if I totally missed the point of your question.  🙂
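To make the "no single beginning of function" point concrete, here is a sketch of one possible convention, which is an assumption of this example and not something the CLR defines: treat the code regions returned by GetCodeInfo2 as if laid end to end in the returned order, and compute an IP’s offset within that logical sequence. `CodeRegion` mirrors the startAddress and size fields of COR_PRF_CODE_INFO, and `LogicalNativeOffset` is a hypothetical helper. Whether this lines up with native offsets used elsewhere (for example, in IL-to-native maps) is something to confirm against the corprof.idl comments.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Stand-in mirroring the two fields of COR_PRF_CODE_INFO:
// one contiguous region of JIT-compiled code.
struct CodeRegion {
    std::uintptr_t startAddress;
    std::size_t size;
};

// Hypothetical helper: maps an IP from a stack walk to an offset within the
// regions treated as one logical sequence. Returns nullopt if the IP is in
// none of the regions.
std::optional<std::size_t> LogicalNativeOffset(const std::vector<CodeRegion>& regions,
                                               std::uintptr_t ip) {
    std::size_t preceding = 0;  // total size of the regions before the current one
    for (const CodeRegion& r : regions) {
        if (ip >= r.startAddress && ip - r.startAddress < r.size)
            return preceding + (ip - r.startAddress);
        preceding += r.size;
    }
    return std::nullopt;
}
```

For example, with a 0x40-byte region at 0x1000 followed by a 0x10-byte region at 0x9000, an IP of 0x9004 maps to logical offset 0x44.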

  5. Nilabja Roy says:

    Hi David,

    I am trying to profile the performance of threads in a given multithreaded application. I am using the RuntimeThreadSuspended callback to capture the event when a thread is blocked waiting on a mutex.

    But when I walk the stack, I do not find any call to System.Threading.WaitHandle. Am I doing something wrong? Does RuntimeThreadSuspended get called when a thread is blocked? Or is there another callback that captures this event?



  6. David Broman says:

    Hi, Nilabja.  Unfortunately, the callback you’re using isn’t what you want, and there is nothing straightforward you can use.  RuntimeThreadSuspended is issued when a thread is being suspended by the runtime, typically in preparation for doing a GC.  It sounds like you’re looking for a notification that tells you that user code is causing a thread to wait on a mutex or some other synchronization object.  We have no such callback for that condition, but you could consider using IL instrumentation to rewrite every possible call into a synchronization method (e.g., calls to Monitor.Enter/Exit), to add another call into your profiler first.  This is quite a manual process for you to go through, but unfortunately it’s the best I can come up with.