SF/JavaOne, Day 3, Omniscient Debugging


I’m at an enormously interesting discussion today about a
powerful new idea being presented in the
debugging space.  It’s called Omniscient Debugging
(OD), and if it’s efficient enough, then I think it’s going to
fundamentally change how developers track down and fix problems with
their programs.



The idea behind OD is that as your program is running, a log is being
kept of all state changes made in the program.  With this log,
it’s now possible to look at the entire state of your program at any
point in time.  And here’s the kicker: given the state of the
program at any point in time, it is trivial to move forwards or
*backwards* in time from that point to see what the state is
then.  That means that if you get a crash or get into some other
form of bad state, it’s now possible to keep moving backwards
until you can determine the actual location where the bug was
introduced.  One of the benefits of this is that you no longer
need the common “hunt-and-peck for the right breakpoint location”
debugging style.  And, even better, you no longer have to worry
about nondeterministic problems where you might not see the issue
reproduce itself every time you debug.  Because you have a full
state log of the run in which the problem occurred, you have *everything* you need
to figure out the problem: you can simply start at
the beginning of the log and look at what every single thread is doing at any
moment in time.  It’s completely possible to replay the entire
problem straight from the beginning.
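
To make the idea concrete for myself, here is a minimal sketch of such a log in Java. This is my own toy illustration, not the speaker’s implementation: the names `StateLog`, `Change`, `stateAt` and `lastWriteAtOrBefore` are all made up, it only tracks variable writes, and it ignores threads entirely.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical toy log: every variable write is appended with a logical timestamp. */
class StateLog {
    /** One recorded state change: at logical time `time`, `variable` took `value`. */
    static final class Change {
        final int time;
        final String variable;
        final Object value;

        Change(int time, String variable, Object value) {
            this.time = time;
            this.variable = variable;
            this.value = value;
        }
    }

    private final List<Change> changes = new ArrayList<>();

    /** Appends a change; its logical timestamp is simply its position in the log. */
    int record(String variable, Object value) {
        int time = changes.size();
        changes.add(new Change(time, variable, value));
        return time;
    }

    /** Rebuilds the variable values as they were right after logical time `time`. */
    Map<String, Object> stateAt(int time) {
        Map<String, Object> state = new LinkedHashMap<>();
        for (Change c : changes) {
            if (c.time > time) {
                break; // later changes have not "happened yet" at this point
            }
            state.put(c.variable, c.value);
        }
        return state;
    }

    /** Walks backwards from `time` to the most recent write to `variable`, or -1 if none. */
    int lastWriteAtOrBefore(String variable, int time) {
        for (int t = Math.min(time, changes.size() - 1); t >= 0; t--) {
            if (changes.get(t).variable.equals(variable)) {
                return t;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        StateLog log = new StateLog();
        log.record("x", 1);
        log.record("y", 2);
        int bad = log.record("x", 99); // suppose this write is the bug
        System.out.println(log.stateAt(bad - 1)); // state just before the bad write
        System.out.println(log.stateAt(bad));     // state just after it
        System.out.println(log.lastWriteAtOrBefore("x", bad));
    }
}
```

Moving “backwards” here is just asking for the state at an earlier timestamp, and finding where a bad value came from is a backwards scan for the last write to that variable. A real tool would presumably need something much smarter than replaying from the start of the log on every query.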



The technique seems to be based on an extreme form of aspect weaving,
where any time a state transition occurs, the specific type of change
and the actual changes are written to a log.  For example, if a method
call is made, a log entry is written recording the method name
and the arguments that were sent to it.  If a store is
made into a variable, then the variable name and the new value are stored in
the log, and so on.  So the weaver goes through the entire bytecode
and augments all of the state changes that the user will care about
with calls out to the logging facility.  While this will
definitely change some of the runtime characteristics of your program,
it shouldn’t change the semantics of your code.  So, as long as
you’re debugging issues that are not dependent on runtime
characteristics, you’ll be fine.  But, of course, this problem
occurs with actual debuggers as well, and a bug that changes or vanishes
when you try to observe it is cutely described as a Heisenbug.
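
As a rough picture of what the weaving amounts to, here is a hand-written Java sketch of a method before and after instrumentation. Everything in it is an assumption on my part: `Tracer`, `methodEntry`, `store` and `Account` are hypothetical names, and a real weaver would rewrite the bytecode rather than the source.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical logging facility that the woven code calls out to. */
class Tracer {
    static final List<String> LOG = new ArrayList<>();

    /** Records a method call with its name and arguments. */
    static void methodEntry(String name, Object... args) {
        LOG.add("call " + name + Arrays.toString(args));
    }

    /** Records a store into a variable with its name and new value. */
    static void store(String variable, Object value) {
        LOG.add("store " + variable + " = " + value);
    }
}

class Account {
    private int balance;

    // What the programmer wrote:
    //     void deposit(int amount) { balance = balance + amount; }
    //
    // What the method behaves like after weaving (written out by hand here):
    void deposit(int amount) {
        Tracer.methodEntry("Account.deposit", amount);
        balance = balance + amount;
        Tracer.store("Account.balance", balance);
    }

    int balance() {
        return balance;
    }

    public static void main(String[] args) {
        Account account = new Account();
        account.deposit(5);
        account.deposit(7);
        for (String entry : Tracer.LOG) {
            System.out.println(entry);
        }
    }
}
```

The result of `deposit` is the same as before, which is the “semantics don’t change” part. The extra calls on every entry and every store are the part that changes the runtime characteristics. This sketch is also not thread-safe, which a real logging facility would have to be.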



Now, in my own experience with writing a profiler, I found that a
non-sampling profiler that was just doing the most basic logging of
method entries and exits ended up causing about a 50x slowdown in
program performance.  A non-sampling profiler also usually
ended up producing a log file that was gigabytes large in a very short
amount of time.  These are the reasons why, in general, sampling
profilers tend to be more popular.  Your performance is much more
reasonable, and the amount of data stored is usually sufficient for any
program running for a reasonable amount of time to get a decent
approximation of where the time in your program is being spent.
But, unfortunately, I can’t see how sampling would help here, since the
idea behind sampling is to not pay attention to everything that the
program is doing, whereas the idea behind OD is the exact
opposite.  Impressively enough though, the speaker reports
slowdowns anywhere from 40x down to 1.5x.  I guess my own
profiling skillz aren’t up there yet (or I was testing the overhead on
a really tight loop where the slowdown became really exaggerated).
It seems like this would definitely be a viable tool for diagnosing and
solving problems in your own code!



I checked around, but was unable to find any similar project for C#.  But it’s something I’d like to work on myself! 🙂


Comments (12)

  1. evolve says:

    my god, i wish i were there!

  2. Jim Argeropoulos says:

    I remember seeing a VC6 add-on package that did this. I can’t remember the product now. We almost automatically threw it out due to price. The developer seat price was many thousands of dollars.

  3. MarkSW says:

    This exact idea in Java was covered in an article in the last couple of months in Dr. Dobb’s. My recollection is that the author is a university CS prof who uses the tool in his teaching and reported slowed but still reasonable performance. Sorry I don’t have the article handy, but this should be findable.

  4. Eric Lippert says:

    I vaguely recall from my undergrad days that there was a lisp or scheme implementation that used Continuation Passing Style internally — by keeping copies of all the continuation information around you could write a debugger that ran the program forwards or backwards. I don’t see any reason why this REQUIRES CPS though, in a world with abundant memory and managed code.

  5. nathans says:

    Good great and wonderful for trivial applications (e.g. the sample bugged bubblesort app), but what about "real" applications such as SmartClient apps or any real world business application where you’re actually dealing with N applications at a time.

    For this sort of approach to be of any practical use you’d have to have a tool that could synchronize the state changes among the applications and possibly introduce caller association as well.

  6. Radu Grigore says:

    You mentioned OCaml a couple of times so I’m surprised that you seem surprised about this idea. The OCaml debugger has done that for quite a few years.

  7. I toyed with the idea of an omniscient debugger for .Net a while back: http://blog.monstuff.com/archives/000058.html

    There are two aspects: complete tracing and integrating with a debugger. I only looked at the first part, which seems possible using the profiler APIs to modify the IL at runtime.

  8. Some time back Texas Instruments was building a similar feature named HindSight into their debugger and processor simulator. With it, a developer could run the code on a device functional simulator and, while debugging with the Code Composer Studio IDE, could actually step back in execution.

    I do not know the current status of the project though….