Hardening Server Applications [Immo]

From time to time a company ships a product that has a huge impact on their ecosystem. A good example for us is certainly .NET. The biggest value proposition that managed code has is that it is, well, managed code. The CLR provides runtime management components such as a garbage collector or reflection that are aimed at reducing the likelihood of bugs and at increasing the developer's productivity. These features allow developers to focus on building their applications instead of tweaking and massaging mechanics. After all, the hope is that by improving the software that is used for developing other software we improve software on a broader scale (how meta!).

Today we are proud to make an announcement that potentially marks a milestone similar to the PDC 2000 announcement of .NET. Several teams at Microsoft have worked over the last few years on an upcoming product – code-named "Source Code".

The Problem

We have all seen it: a customer reports a bug, and after debugging it for a while we realize that the fix involves changing a single line. Sometimes the fix involves only a single character, as in the famous off-by-one error, where one only needs to replace "<=" with "<".
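To make the off-by-one case concrete, here is a minimal sketch in Python (chosen for brevity; the function names are made up for illustration). The two functions differ only in the comparison operator of the loop condition.

```python
def sum_first_n_buggy(values, n):
    """Intended to sum the first n items, but the loop bound is off by one."""
    total = 0
    i = 0
    while i <= n:  # bug: "<=" visits n + 1 items
        total += values[i]
        i += 1
    return total


def sum_first_n_fixed(values, n):
    """Sums exactly the first n items."""
    total = 0
    i = 0
    while i < n:  # fix: "<" stops after n items
        total += values[i]
        i += 1
    return total
```

With `n` smaller than the list length, the buggy version silently returns a wrong sum; with `n` equal to the list length, it raises an `IndexError`. Both symptoms come from the same single character.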

Quite frequently, small errors can have devastating effects. For example, an Ariane 5 launch vehicle had to self-destruct because of a single casting error in which a 64-bit floating point value was converted to a 16-bit signed integer. This caused the flight computer to make wrong adjustments due to an arithmetic overflow.
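The following sketch illustrates why such a conversion is dangerous. It is not a reconstruction of the actual flight software; the helper names and the sample value are assumptions for illustration only. One helper keeps just the low 16 bits, the way an unchecked narrowing conversion typically behaves, and the other refuses values that do not fit.

```python
INT16_MIN, INT16_MAX = -2**15, 2**15 - 1


def to_int16_wrapping(value: float) -> int:
    """Truncate toward zero, then keep only the low 16 bits (two's complement)."""
    truncated = int(value)
    return (truncated - INT16_MIN) % 2**16 + INT16_MIN


def to_int16_checked(value: float) -> int:
    """Truncate toward zero, but raise if the result does not fit in 16 bits."""
    truncated = int(value)
    if not INT16_MIN <= truncated <= INT16_MAX:
        raise OverflowError(f"{value} does not fit in a 16-bit signed integer")
    return truncated
```

A value such as `40000.0` is perfectly ordinary as a 64-bit float, but `to_int16_wrapping(40000.0)` yields `-25536`, while `to_int16_checked(40000.0)` raises an `OverflowError` instead of handing a nonsensical number to the rest of the program.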

How "Source Code" works

Research has investigated several strategies to enable computers to learn. One fruitful approach is genetic programming.

In artificial intelligence, Genetic Programming is an evolutionary algorithm-based methodology inspired by biological evolution to find computer programs that perform a user-defined task.

---Wikipedia

The basic idea is that by mutating existing code and applying a selection-function one can automatically find a computer program that solves a given problem.
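A toy version of this mutate-and-select loop might look like the sketch below. It is deliberately much simpler than real genetic programming: instead of evolving program trees over a population, it evolves the two integer coefficients of `a * x + b` for a single candidate and keeps a mutation only if it is no worse than its parent. All names and parameters here are illustrative assumptions.

```python
import random


def error(candidate, examples):
    """Selection function: total absolute error over the examples (lower is fitter)."""
    a, b = candidate
    return sum(abs(a * x + b - y) for x, y in examples)


def mutate(candidate, rng):
    """Return a copy of the candidate with one coefficient nudged by +1 or -1."""
    a, b = candidate
    if rng.random() < 0.5:
        return (a + rng.choice([-1, 1]), b)
    return (a, b + rng.choice([-1, 1]))


def evolve(examples, generations=500, seed=0):
    """Repeatedly mutate the current best candidate and keep the child if it is no worse."""
    rng = random.Random(seed)  # seeded so that runs are repeatable
    best = (0, 0)
    for _ in range(generations):
        child = mutate(best, rng)
        if error(child, examples) <= error(best, examples):
            best = child
    return best
```

For example, `evolve([(x, 3 * x + 2) for x in range(5)])` searches for coefficients that reproduce `3 * x + 2`. Because a child only replaces its parent when it is at least as fit, the error never increases from one generation to the next.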

Over the last few years the BCL team, the wider CLR team and Visual Studio worked together with Microsoft Research to build a product around this idea. The result, code-named "Source Code", is a combination of new technologies with evolved versions of existing technologies such as IntelliTrace, software transactional memory (STM) and Pex.

The basic idea is simple. Whenever the CLR discovers an unhandled exception, it rolls back the state of the application to a point in time prior to the crash (the default is 8 minutes, but this can be configured). To do this, the CLR uses a full IntelliTrace recording so that the whole runtime state, including the heap, can be properly restored. Then the stack trace of the unhandled exception is analyzed to detect which method is the most likely culprit. This information is passed to Pex in order to create a permuted version of the method body ("mutation"), which opens up code paths not previously explored in the application. After that, the CLR resumes execution. If the application crashes again, the process repeats until a code path is found that avoids the error ("selection"). Over time, successful code changes persist, improving the overall fitness of the application ("survival of the fittest").
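The rollback, mutation and selection loop can be sketched as a toy in a few lines of Python. None of the names below correspond to real CLR, IntelliTrace or Pex APIs. The "mutations" are simply a hand-written list of alternative method bodies, and the "recording" is a deep copy of a dictionary holding the application state.

```python
import copy


def run_with_self_repair(step, variants, state, max_attempts=10):
    """Run step(state); on an unhandled exception, roll back and try the next variant.

    Returns (result, attempt_index) for the first candidate that does not crash.
    """
    snapshot = copy.deepcopy(state)  # stand-in for the recorded runtime state
    candidates = [step, *variants]   # the original body plus its "mutations"
    for attempt, candidate in enumerate(candidates[:max_attempts]):
        try:
            return candidate(state), attempt           # "selection": first survivor wins
        except Exception:
            state.clear()
            state.update(copy.deepcopy(snapshot))      # "rollback" to the snapshot
    raise RuntimeError("no variant avoided the error")


def last_item_buggy(state):
    state["calls"] += 1
    items = state["items"]
    return items[len(items)]      # off by one: always raises IndexError


def last_item_mutated(state):
    state["calls"] += 1
    items = state["items"]
    return items[len(items) - 1]  # the "mutation" that happens to survive
```

Calling `run_with_self_repair(last_item_buggy, [last_item_mutated], state)` with `state = {"items": [1, 2, 3], "calls": 0}` crashes on the first attempt, restores the state, and succeeds with the mutated body on the second attempt. The side effect of the crashed attempt (the incremented call counter) is undone by the rollback.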

Picture of the M5 computer in the Star Trek episode "The Ultimate Computer".

As a result of the above process, an application will automatically correct itself over time! The brilliant Dr. Richard Daystrom, the designer of the "M5 Multitronic System" (pictured above), would be very pleased to see this advancement.

Currently the technology is in an early prototype state and we are working on removing some restrictions:

  • Because of the downtime involved in rollback and mutation, this scenario will only be available to Windows Azure based ASP.NET applications. In the future, we plan to extend support to client applications as well.
  • "Source Code" will only save a limited number of applications. We have seen cases where partially working applications are displaced from the cache by buggier applications. Our next goal is to improve the cache policy to avoid these situations.
  • The early CTP does not include support for management and monitoring, but we are actively working to get "Source Code" integrated with the Microsoft System Center products.

You can download a CTP here.