Rules of Funceval

Funceval (aka "Function Evaluation" or "Property Evaluation") is the ability to inject some arbitrary call while the debuggee is stopped somewhere. A debugger commonly uses funceval to run ToString() and property getters. I want to describe when it is legal to initiate a funceval.

Doing a funceval:
For the CLR, funceval means hijacking a given thread (of the debugger's choosing) and using that thread to execute the requested method. The debuggee is then resumed to let the funceval execute. When the funceval finishes, it fires a debug event. If the eval completes normally, it fires an EvalComplete event. If the eval throws an uncaught exception, the funceval's own try-catch backstops the exception and it fires an EvalException event. See the ICorDebugEval interface for further API usage.
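As a rough mental model of the completion path described above, here is a toy Python sketch. It is not the CLR's implementation and does not use the ICorDebug API; the `run_funceval` helper and the tuple it returns are hypothetical, chosen only to mirror the EvalComplete and EvalException events.

```python
from typing import Any, Callable, Tuple


def run_funceval(method: Callable[[], Any]) -> Tuple[str, Any]:
    """Toy model: run the injected call behind a backstop and report one event.

    Returns ("EvalComplete", result) on normal completion, or
    ("EvalException", exception) if the call throws an uncaught exception.
    """
    try:
        result = method()
    except Exception as exc:
        # Backstop: the exception never escapes the eval; it becomes an event.
        return ("EvalException", exc)
    return ("EvalComplete", result)


def bad_getter() -> str:
    """Stand-in for a property getter that throws."""
    raise ValueError("getter failed")


if __name__ == "__main__":
    # A ToString()-style call that completes normally.
    print(run_funceval(lambda: str(42))[0])
    # A property getter that throws.
    print(run_funceval(bad_getter)[0])
```

Either way the debugger gets exactly one event back per eval, which is the property the real API relies on.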

Scope of this entry:
Funceval is a huge topic. This entry just covers the technical restrictions on when the CLR will let the debugger even issue a funceval. Several things I won't cover here:
1) funceval is evil and introduces problems that are difficult to communicate to a user. You may corrupt your debuggee or deadlock. The funceval injects calls and thus may introduce codepaths that never existed at compile time. The funceval also inherits the thread's state including locks and security settings. Please see here for more details about the dangers of funceval.
2) How to abort a funceval.
3) How to present a useful funceval experience to the user.
4) What cool things (eg, Visualizers) can you do with funceval?

Threading issues:
Funceval occurs on a single thread and is technically orthogonal to activity on the other threads. (This is just like Stepping.) The debugger picks which thread to funceval on by picking which ICorDebugThread instance to call CreateEval() on. It's generally the "active" thread (eg, the one that you just hit a breakpoint on). When the debugger sets up the eval and resumes the process, all threads continue to run. The debugger can choose to suspend other threads to prevent them from slipping. This would give the illusion that the eval is inspection-only. However:
1) Asynchronously suspending threads and resuming the process is dangerous and may cause deadlocks.
2) If the eval blocks on a suspended thread, you get a deadlock. This is very common in COM-interop because the debugger will eval on some MTA thread and suspend all other threads, including STA threads that would pump messages from the MTA thread.
The debugger could try to get sneaky about not suspending certain threads. This is a whole policy nightmare, which is exactly why the CLR doesn't suspend any threads and leaves the entire decision up to the debugger.

It's also up to the debugger's policy how to handle a thread that hits a breakpoint in the middle of the eval. VS 2003 would automatically skip such breakpoints and events in order to pump the thread out of the eval. VS 2005 handles "nested break states", and will let you stop inside of an eval.

Funceval and the Garbage Collector:
Funceval has to coordinate with the garbage collector in order to avoid deadlocks.

A thread is at a Garbage Collection (GC) Safe point if that thread will not block a GC. This includes:
1) A thread out in native code. (Technically a thread in Preemptive mode)
2) A thread in managed code but at arbitrary interruptible points, which are determined by the JIT. The exact locations of these safe points are outside the user's control. However, they are much more frequent in debuggable code than they are in optimized code. Practically all sequence points will be GC-safe.

You can only initiate a funceval on a thread that's at a GC safe point. Why? If the hijacked thread is not at a GC safe point, then any GC will be blocked waiting for the funceval thread to slip to a GC safe point. But if the funceval causes a GC to occur, then the funceval is blocked waiting for the GC to finish. Thus you would get a deadlock between the funceval and GC.
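The deadlock argument above is a cycle in a "who waits on whom" graph. The following Python sketch is only an illustration of that reasoning; the names `wait_graph` and `find_wait_cycle` and the node labels are hypothetical and do not correspond to anything in the CLR.

```python
from typing import Dict, List, Optional


def wait_graph(thread_at_gc_safe_point: bool) -> Dict[str, str]:
    """Build 'X waits on Y' edges for a funceval that triggers a GC."""
    # The funceval caused a GC, so it is blocked until the GC finishes.
    graph = {"funceval": "gc"}
    if not thread_at_gc_safe_point:
        # The GC cannot proceed until the hijacked thread reaches a safe
        # point, but that thread is the one running the funceval.
        graph["gc"] = "funceval"
    return graph


def find_wait_cycle(waits_on: Dict[str, str], start: str) -> Optional[List[str]]:
    """Follow wait edges from start; return the cycle if it leads back to start."""
    path = [start]
    seen = {start}
    current = start
    while current in waits_on:
        current = waits_on[current]
        if current == start:
            return path + [start]
        if current in seen:
            return None  # some other cycle that does not include start
        seen.add(current)
        path.append(current)
    return None


if __name__ == "__main__":
    print(find_wait_cycle(wait_graph(thread_at_gc_safe_point=False), "funceval"))
    print(find_wait_cycle(wait_graph(thread_at_gc_safe_point=True), "funceval"))
```

When the hijacked thread starts at a GC safe point, the second edge never exists, so the GC can run to completion and the funceval resumes afterward.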

Late in V2, we added a feature to allow the debugger to find out whether a thread is at a GC safe point so that it could make better policy decisions. If ICorDebugThread::GetUserState() returns flags with USER_UNSAFE_POINT set, then the thread is not at a GC safe point.
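The check itself is a single bit test on the returned flags. Here is a minimal Python sketch of that test; the numeric values below are assumptions for illustration (verify them against the CorDebugUserState enum in the CLR headers), and `is_at_gc_safe_point` is a hypothetical helper, not part of the API.

```python
from enum import IntFlag


class UserState(IntFlag):
    # Assumed values for illustration only; check CorDebugUserState
    # in the real headers before relying on them.
    USER_WAIT_SLEEP_JOIN = 0x20
    USER_UNSAFE_POINT = 0x80


def is_at_gc_safe_point(user_state: int) -> bool:
    """True when the USER_UNSAFE_POINT bit is clear in the user-state flags."""
    return not (user_state & UserState.USER_UNSAFE_POINT)


if __name__ == "__main__":
    flags = UserState.USER_WAIT_SLEEP_JOIN | UserState.USER_UNSAFE_POINT
    print(is_at_gc_safe_point(flags))
    print(is_at_gc_safe_point(UserState.USER_WAIT_SLEEP_JOIN))
```

Other bits in the flags are ignored here, since the user state carries several unrelated thread properties at once.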

A better name? "GC-safe" is really an intrinsic concept that the user has no control over, and it doesn't clearly relate to other debugging APIs either. If I tell you a thread is "not at a GC-safe point", what are you supposed to do with that information besides avoid suspending that thread? Thus in retrospect, a better way to have exposed this may have been "Suspend Safe" instead of just GC-safe. That's really what the user needs to know. And if the CLR adds another non-GC concept that makes suspension unsafe, then it's clear how to update the semantics of a "Suspend Safe" flag.

When can you do a funceval? You can always attempt a funceval, but the call will immediately fail if the thread is not at a Funceval Safe (FESafe) point. An FESafe point is where the CLR can actually do the hijack for the funceval. As mentioned above, you must be at a GC safe point. In addition, a thread must be:
1) stopped in managed code (and at a GC safe point):
This means you can't do a funceval in native code. The motivation here is that native code is outside the CLR's control, and so the CLR doesn't know enough about native code to be able to set up the funceval.

Common ways to stop in managed code include stopping at a breakpoint, step, Debugger.Break call, intercepting an exception, or at a thread start. Or you may just get very lucky when you break on one thread and happen to catch another thread in managed code.
Being stopped in managed code is not enough on its own: you can still stop a thread at a GC / Funceval unsafe point in managed code. A function's prolog is a common GC-unsafe zone. You could open up the native disassembly window and set a breakpoint at a native offset in the managed function which is not a safe spot; or you could single-step in the native disassembly window to an unsafe spot. Or if you step-in to a managed call in the native disassembly window, you'll land in the method's prolog (which is GC-unsafe).
Async Break will asynchronously stop all threads as soon as possible, and thus the threads are very likely to be at an unsafe point.

2) OR stopped at a 1st chance or unhandled managed exception (and at a GC safe point):
This option is motivated by end-user scenarios. When a user gets an exception, it's very convenient to be able to inspect as much as possible to determine why that exception occurred. For example, a debugger may want to evaluate the Message property on the newly raised exception.
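Putting the two cases together, the FESafe rule can be written as a small predicate. This is a sketch of the rule as stated in this entry, not of the CLR's actual check; the `ThreadStop` record and its field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ThreadStop:
    """Hypothetical snapshot of where a thread is stopped."""
    at_gc_safe_point: bool
    in_managed_code: bool
    at_managed_exception: bool = False  # 1st chance or unhandled


def is_funceval_safe(stop: ThreadStop) -> bool:
    """FESafe = GC safe point AND (stopped in managed code OR at a managed exception)."""
    if not stop.at_gc_safe_point:
        return False
    return stop.in_managed_code or stop.at_managed_exception


if __name__ == "__main__":
    breakpoint_stop = ThreadStop(at_gc_safe_point=True, in_managed_code=True)
    prolog_stop = ThreadStop(at_gc_safe_point=False, in_managed_code=True)
    native_stop = ThreadStop(at_gc_safe_point=True, in_managed_code=False)
    for stop in (breakpoint_stop, prolog_stop, native_stop):
        print(stop, is_funceval_safe(stop))
```

Note that a thread out in native code counts as GC-safe yet still fails the predicate, which is why GC safety alone is not the whole rule.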

Improving errors:
The CLR will detect attempts to initiate a funceval and fail gracefully if the thread is not at an FE safe point. In V1.1, this meant ICDEval::CallFunction would return CORDBG_E_FUNC_EVAL_BAD_START_POINT. However, there are so many different ways to fail an eval that one catch-all error is not very helpful to end users.

So late in V2, we added more error granularity. This is done by filtering the HR before we hand it back. If we're about to return CORDBG_E_FUNC_EVAL_BAD_START_POINT, we check for several other conditions to see if we can return a more specific error code. We do this specifically to ensure that we don't overzealously fail an eval.
The debugger cannot reasonably do anything intelligent with these error codes beyond presenting a more descriptive message to the end-user. If a funceval fails for multiple reasons, the specific HR we return is arbitrary, though we try to return the HR describing the most likely issue.
The new error codes (defined in CorError.h) are:

  • CORDBG_E_ILLEGAL_AT_GC_UNSAFE_POINT: The thread is not at a GC-safe point.
  • CORDBG_E_ILLEGAL_IN_PROLOG: The thread is in the prolog. 
  • CORDBG_E_ILLEGAL_IN_NATIVE_CODE: The thread is in native code.
  • CORDBG_E_ILLEGAL_IN_OPTIMIZED_CODE: The thread is in optimized code. We actually debated about this one. Optimized code is technically not a restriction for funceval. However, optimized code greatly increases your chances of being at a GC-unsafe point. We thought it would be a bad user experience to hammer them with a bunch of GC_UNSAFE_POINT failures at every spot in the optimized function. If you do happen to find a FE-safe spot in an optimized function, the eval will still succeed. 
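The filtering step described above can be sketched as follows. This is a toy Python model: the error codes are represented by their names as strings rather than real HRESULT values, the `refine_eval_error` helper and its parameters are hypothetical, and the order of the checks is an assumption (the entry only says the choice is arbitrary when several conditions hold, and that optimized code is reported in preference to a GC-unsafe point).

```python
BAD_START_POINT = "CORDBG_E_FUNC_EVAL_BAD_START_POINT"


def refine_eval_error(
    hr: str,
    *,
    in_native_code: bool = False,
    in_prolog: bool = False,
    in_optimized_code: bool = False,
    at_gc_unsafe_point: bool = False,
) -> str:
    """Swap the catch-all failure for a more specific code when one applies."""
    if hr != BAD_START_POINT:
        # Only an eval that is already failing gets filtered, so a
        # successful eval in optimized code is left alone.
        return hr
    if in_native_code:
        return "CORDBG_E_ILLEGAL_IN_NATIVE_CODE"
    if in_prolog:
        return "CORDBG_E_ILLEGAL_IN_PROLOG"
    if in_optimized_code:
        # Checked before the GC-unsafe case (assumed order) so optimized
        # functions do not produce a stream of GC_UNSAFE_POINT failures.
        return "CORDBG_E_ILLEGAL_IN_OPTIMIZED_CODE"
    if at_gc_unsafe_point:
        return "CORDBG_E_ILLEGAL_AT_GC_UNSAFE_POINT"
    return hr  # nothing more specific to say


if __name__ == "__main__":
    print(refine_eval_error(BAD_START_POINT, in_optimized_code=True, at_gc_unsafe_point=True))
    print(refine_eval_error("S_OK", in_optimized_code=True))
```

Because the filter runs only on the failure path, it can never turn a legal eval into an error, which matches the goal of not overzealously failing an eval.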

A debugger can use these HRs, along with other thread properties picked up by ICDThread::GetUserState and ICDThread::GetDebugState, to give descriptive messages about why a funceval may not be allowed. I hit the following playing around in VS:

  • "Cannot evaluate expression because the current thread is stopped in the prolog of a function." (CORDBG_E_ILLEGAL_IN_PROLOG)
  • "Cannot evaluate expression because a thread is stopped at a point where garbage collection is impossible, possibly because the code is optimized." (CORDBG_E_ILLEGAL_AT_GC_UNSAFE_POINT, CORDBG_E_ILLEGAL_IN_OPTIMIZED_CODE)
  • "Cannot evaluate expression because the current thread is in a sleep, wait, or join." (USER_WAIT_SLEEP_JOIN from ICDThread::GetUserState)
  • "Cannot evaluate expression because a native frame is on top of the call stack."  (CORDBG_E_ILLEGAL_IN_NATIVE_CODE)
  • "Cannot evaluate expression because the current thread is suspended." (THREAD_SUSPEND from ICDThread::GetDebugState)
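On the debugger side, turning these inputs into a message is essentially a lookup. The Python sketch below shows one way a debugger might do it; it is not how VS is implemented. The `describe_eval_failure` helper is hypothetical, the thread states are passed as plain booleans rather than real flag values, and checking thread state before the HR is an assumed priority.

```python
from typing import Optional

GC_MESSAGE = (
    "Cannot evaluate expression because a thread is stopped at a point where "
    "garbage collection is impossible, possibly because the code is optimized."
)

HR_MESSAGES = {
    "CORDBG_E_ILLEGAL_IN_PROLOG":
        "Cannot evaluate expression because the current thread is stopped "
        "in the prolog of a function.",
    "CORDBG_E_ILLEGAL_AT_GC_UNSAFE_POINT": GC_MESSAGE,
    "CORDBG_E_ILLEGAL_IN_OPTIMIZED_CODE": GC_MESSAGE,
    "CORDBG_E_ILLEGAL_IN_NATIVE_CODE":
        "Cannot evaluate expression because a native frame is on top of "
        "the call stack.",
}


def describe_eval_failure(
    hr: str,
    *,
    wait_sleep_join: bool = False,   # USER_WAIT_SLEEP_JOIN from GetUserState
    suspended: bool = False,         # THREAD_SUSPEND from GetDebugState
) -> Optional[str]:
    """Pick a user-facing message; returns None if nothing specific is known."""
    if suspended:
        return "Cannot evaluate expression because the current thread is suspended."
    if wait_sleep_join:
        return ("Cannot evaluate expression because the current thread is in "
                "a sleep, wait, or join.")
    return HR_MESSAGES.get(hr)


if __name__ == "__main__":
    print(describe_eval_failure("CORDBG_E_ILLEGAL_IN_PROLOG"))
    print(describe_eval_failure("CORDBG_E_ILLEGAL_IN_NATIVE_CODE", suspended=True))
```

Two HRs share one message here, mirroring the list above where the GC-unsafe and optimized-code failures are presented to the user with the same text.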