CLR 4.0 advancements in diagnostics

We announced at PDC today that we're making some significant advances in diagnostics tool support for CLR v4!  In particular, we've been investing heavily in improving our support for production diagnostics scenarios over the past couple of years.  I'm excited that we're finally able to start talking about it!

Here's a quick list of some of the things we're doing - stay tuned here and on Dave's and Mike's blogs for more details.  Also feel free to ask questions about our strategy and specific plans for new features on our forum (though we still have a few things we're not ready to talk about yet).  Note that all of the features below are only available when targeting a process that is running inside version 4 of the CLR.

  1. Managed dump debugging
    Finally you'll be able to open crash dump files in Visual Studio and see managed state (stacks, locals, etc.) without using SOS.  The key scenario we want to enable here is taking a dump of an app/server in production (perhaps in an automated way like Windows Error Reporting) and opening that dump in Visual Studio on another machine at some point in the future.  We have a big piece of this working in the VS 2010 CTP, but we still have some work to do before beta (e.g. the CTP only supports dumps with full heap memory).  The experience in VS is very much like being stopped at a breakpoint in a live process, except you can't say "go".  Of course production code tends to have JIT optimizations enabled, so the normal caveats about debugging optimized code apply here too (e.g. you may not see all locals).  Also, you can't evaluate arbitrary expressions since there is no target process to call functions in (but we have some ideas for how we might compensate for this).  Despite the caveats, this is still a huge feature that should really help improve production diagnostics scenarios.  This work is actually the main visible piece of a much larger "out-of-process debugging" re-architecture we've been working on for years.  This re-arch deserves a post of its own, so stay tuned.

  2. Profiler attach (and detach) for memory diagnostics and sampling
    One of the most common feature requests we hear from authors of profiling tools is to be able to attach to a running target process (today you have to set some environment variables at process start which cause your profiler to be loaded).  Before you get too excited - this doesn't have everything you want.  In particular, the CLR still doesn't have the ability to change the code of a method once it's been JIT-compiled (EnC is a very special case - not really applicable here).  This means that IL instrumentation isn't available on attach, and neither are a few other features (like object allocated callbacks).  But basic memory diagnostics scenarios where the profiler inspects the heap, and simple sampling-based CPU profiling, will now work on attach.  We anticipate this will be useful in production scenarios - you can walk up to a server behaving badly and attach a profiler, collect some data, and detach - leaving the process in basically the same state it was in before you attached.
    [Update: See Dave's blog entry for additional details on profiling API improvements]

  3. Registry-free profiler activation
    One major impediment to the sort of production scenario I described above is that today you have to register your profiler in the registry.  In many production scenarios, making a change to the machine-wide system registry is very unappealing (will the dev remember to undo the change when he's done with the server, etc.?).  So to really enable production scenarios, we've also supplied a mechanism for running a process under a profiler of managed code (or attaching one) without having to make any changes to the registry.

  4. x64 mixed-mode debugging
    This isn't really a production diagnostics scenario (although you can do x64 mixed-mode dump debugging), but it is one of the main debugging feature requests we've gotten.  With this feature, "mixed-mode" (native+managed) debugging will work for x64 processes in basically the same way it works for x86 today.

  5. Lock inspection
    We're adding some simple APIs to ICorDebug which allow you to explore managed locks (Monitors).  For example, if a thread is blocked waiting for a lock, you can find which other thread is currently holding the lock (and whether the wait has a time-out).

  6. Corrupted-state exceptions
    This feature doesn't come from our team (it's part of the core exception-handling sub-system in the CLR), but in my opinion it's a huge improvement for diagnostics scenarios.  Basically it means that "bad" exceptions (like access violations) that propagate up the stack into managed code no longer (by default) get converted into normal .NET exceptions (System.AccessViolationException) which you can accidentally catch in a "catch(Exception)" clause.  Having a catch(Exception) which swallows AVs coming from native code is a bad thing because you're unlikely to be able to reason about the consistency of your process after the AV.  The default behavior for such "corrupted-state exceptions" is now to fail-fast and send an error-report (just like in normal C++ programming).  Of course, you can override this if you REALLY need to catch such an exception.
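
To make the registry-free activation in item 3 a bit more concrete, here is a minimal Python sketch of launching a process under a profiler without touching the registry.  It assumes the environment-variable mechanism as I understand it shipped in .NET Framework 4 (COR_ENABLE_PROFILING, COR_PROFILER, and COR_PROFILER_PATH pointing directly at the profiler DLL).  The helper names, the CLSID and the DLL path are hypothetical placeholders, not part of any real tool.

```python
import os
import subprocess


def profiler_environment(clsid, profiler_dll, base_env=None):
    """Return a copy of base_env with the CLR profiler variables added.

    Assumption: the target runs on CLR v4, which can load the profiler
    DLL from COR_PROFILER_PATH instead of looking the CLSID up in the
    registry.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["COR_ENABLE_PROFILING"] = "1"         # ask the CLR to load a profiler at startup
    env["COR_PROFILER"] = clsid               # CLSID of the profiler's COM class
    env["COR_PROFILER_PATH"] = profiler_dll   # full path to the DLL, so no registry lookup
    return env


def launch_under_profiler(command, clsid, profiler_dll):
    """Start `command` with the profiler variables set only for that child process."""
    return subprocess.Popen(command, env=profiler_environment(clsid, profiler_dll))
```

Something like launch_under_profiler(["MyServer.exe"], "{01234567-89AB-CDEF-0123-456789ABCDEF}", r"C:\tools\MyProfiler.dll") would then start the (hypothetical) server with the profiler loaded.  Because the variables live only in the child's environment, there is nothing machine-wide to remember to undo afterwards.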

That's the overview of the main CLR v4 features that affect diagnostics.  Of course there are also lots of other great things coming in CLR v4 and the rest of .NET Fx 4.0.