SF/JavaOne, Day 3, Omniscient Debugging

06/28/2005

I'm at an enormously interesting discussion today about a powerful
new idea being presented in the debugging space. It's called
Omniscient Debugging (OD), and if it's efficient enough, then I think
it's going to fundamentally change how developers track down and fix
problems with their programs.

The idea behind OD is that as your program runs, a log is kept of
every state change it makes. With this log, it's possible to look at
the entire state of your program at any point in time. And here's the
kicker: given the state of the program at any point in time, it is
trivial to move forwards or *backwards* from that point to see what
the state was then. That means that if you hit a crash or get into
some other bad state, you can keep stepping backwards until you find
the actual location where the bug was introduced. One benefit is that
you no longer need the common "hunt-and-peck for the right breakpoint
location" debugging style. Even better, you no longer have to worry
about nondeterministic problems that might not reproduce every time
you debug. Because you have a full state log from the run in which the
problem occurred, you have *everything* you need to figure it out: you
can simply start at the beginning and look at what every single thread
was doing at any moment in time. It's entirely possible to replay the
whole problem straight from the beginning.
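To make the idea concrete, here is a minimal sketch of my own, not
anything shown in the talk. It assumes the log is nothing more than an
append-only list of variable stores, and that a "point in time" is just
an index into that list. The names StateLog, recordStore, stateAt and
lastStoreAtOrBefore are all made up for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical append-only log of variable stores; not the speaker's actual design. */
final class StateLog {
    /** One recorded store: which variable received which value. */
    private static final class Entry {
        final String variable;
        final Object value;

        Entry(String variable, Object value) {
            this.variable = variable;
            this.value = value;
        }
    }

    private final List<Entry> entries = new ArrayList<>();

    /** Appends a store and returns its timestamp (its index in the log). */
    int recordStore(String variable, Object value) {
        entries.add(new Entry(variable, value));
        return entries.size() - 1;
    }

    /** Rebuilds the value of every variable as of the given timestamp, inclusive. */
    Map<String, Object> stateAt(int timestamp) {
        Map<String, Object> state = new HashMap<>();
        for (int t = 0; t <= timestamp && t < entries.size(); t++) {
            Entry e = entries.get(t);
            state.put(e.variable, e.value); // later stores overwrite earlier ones
        }
        return state;
    }

    /** Walks backwards from the timestamp to the most recent store to a variable; -1 if none. */
    int lastStoreAtOrBefore(String variable, int timestamp) {
        for (int t = Math.min(timestamp, entries.size() - 1); t >= 0; t--) {
            if (entries.get(t).variable.equals(variable)) {
                return t;
            }
        }
        return -1;
    }
}
```

If a program stored 100 into "balance" at time 0 and -5 at time 1, then
noticing the bad value at some later time and calling
lastStoreAtOrBefore("balance", thatTime) would point straight at time 1,
with no breakpoints involved. A real tool would also have to record
threads, method calls and object identity, and would need something
much smarter than replaying the list from the start.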

The technique seems to be based on an extreme form of aspect weaving,
where any time a state transition occurs, the type of change and the
actual values involved are written to a log. For example, if a method
call is made, a log entry records the method name and the arguments
that were passed to it. If a store is made into a variable, then the
variable name and the new value are recorded, and so on. So the weaver
goes through the entire bytecode and augments every state change the
user will care about with a call out to the logging facility. While
this will definitely change some of the runtime characteristics of
your program, it shouldn't change the semantics of your code. So, as
long as you're debugging issues that do not depend on runtime
characteristics, you'll be fine. Of course, this problem occurs with
conventional debuggers as well, and this kind of bug is cutely
described as a Heisenbug.
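Here is roughly what I imagine a method looks like after weaving,
written out by hand as source rather than bytecode. This is my own
guess at the shape of it, not the speaker's actual instrumentation:
EventLog, methodCall and store are hypothetical names, and a real
weaver would insert the equivalent calls directly into the class file.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical logging facility that woven code would call; names are illustrative only. */
final class EventLog {
    // Not thread-safe; a real tool would need per-thread logs or synchronization.
    static final List<String> EVENTS = new ArrayList<>();

    static void methodCall(String name, Object... args) {
        EVENTS.add("call " + name + Arrays.toString(args));
    }

    static void store(String variable, Object value) {
        EVENTS.add("store " + variable + "=" + value);
    }
}

/** A simple class as it might look after weaving, with the inserted calls marked. */
final class Counter {
    private int count;

    int add(int delta) {
        EventLog.methodCall("Counter.add", delta); // inserted: method entry and arguments
        int next = count + delta;
        EventLog.store("next", next);              // inserted: store into a local
        count = next;
        EventLog.store("Counter.count", count);    // inserted: store into a field
        return count;
    }
}
```

The original logic of add is untouched, which is the sense in which the
semantics don't change. But one three-line method now makes three extra
calls and three string allocations, which gives a feel for where the
slowdown and the log volume come from.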

Now, in my own experience writing a profiler, I found that a
non-sampling profiler doing just the most basic logging of method
entries and exits caused about a 50x slowdown in program performance.
It also usually produced a log file that was gigabytes in size in a
very short amount of time. That's why, in general, sampling profilers
tend to be more popular. Performance is much more reasonable, and for
any program running for a reasonable amount of time, the data
collected is usually enough to get a decent approximation of where the
time in your program is being spent. Unfortunately, I can't see how
sampling would help here, since the idea of sampling is to not pay
attention to everything the program is doing, whereas the idea behind
OD is the exact opposite. Impressively enough, though, the speaker can
get slowdowns anywhere from 40x down to 1.5x. I guess my own profiling
skillz aren't up there yet (or I was measuring the overhead on a
really tight loop where the slowdown became really exaggerated). It
seems like this would definitely be a viable tool for diagnosing and
solving problems in your own code!

I checked around, but was unable to find any similar project for C#. But it's something I'd like to work on myself! :-)
