

A Primer for Restoring Lost Source Code from IL (a.k.a. Reverse Engineering .NET apps)

 

I’ve been working on several internal tools for some of the folks on the Exchange support team. I recently added some features that my “customer” requested. After implementing the changes and running my thorough “test suite” (*wink, wink*), as ALL developers do, I sent the updated EXE to the users.

To my surprise, I received an e-mail saying that “everything looks great!” My changes had introduced no bugs, and I had not left any features out. It appeared that I had done a flawless job on this one. Imagine that! Patrick probably believes that even less than I do, but it’s OK, he’ll find out one day!

Indeed I did do a good job…so much so that the software deities decided to relieve me of my flawlessness by telling my hard drive to have its head slip off the platter. With that crash I was rudely snapped back to reality and to the topic in the title of this post.

Naturally, as a good software practitioner, I keep my source code in source control. To do myself one better, my personal source code database is stored on a network location that is backed up nightly. Once I was back to a working state (two days, one overnight hard drive replacement, and many, many installation dialogs later), I took stock of the loss I had suffered. It turned out I had not checked my changes into source control, so all the changes I had made were lost.

So, what to do?

Luckily for me, I wasn’t over the mailbox limit on my mail server, so I was able to retrieve the EXE from the mail I had sent, which was still sitting in my “Sent Items” folder. I decided it might just be possible to recover the lost changes by diffing the IL of the latest EXE (the one from the mail) against the IL of the one built from source control.

I created two folders on my computer: G:\RESTORE\LATEST_EXE and G:\RESTORE\LATEST_SOURCE_CONTROL. In the former folder I put the executable that was retrieved from my mail, and the latter folder received a freshly compiled executable built from the latest version stored in the source code database.

Using ILDASM (<Install Folder>\Microsoft Visual Studio .NET 2003\SDK\v1.1\bin\ildasm.exe) I was able to save the IL for each of the executables in their respective folders from above. I did modify the dump options for ILDASM:

  * Encoding: ANSI
  * Dump IL Code
    * Source Lines (checked)
    * Expand Try/Catch (checked)
  * Dump Metainfo
    * Dump Unresolved Externals (checked)

Then I grabbed the handy little WinDiff tool from Visual Studio 6 to compare the two IL files. The results were less than spectacular! Since I had changed the source code, the IL addresses (the IL_xxxx labels in front of each instruction) following each change had shifted, causing nearly every line of the two files to mismatch.

Now what?

I wrote a console application that goes through an IL file and strips out everything that gets in the way of the diff: comments (mainly the meta info), code size statements, IL addresses, and branch statements (which are mostly irrelevant too, because their targets shift along with the addresses). I only needed to see the source code differences so I could quickly put them back in. I was not looking for a 1:1 reverse engineering solution, as that would have cost me far more time than recreating the tool in the first place.
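To see why stripping the addresses helps, here is a minimal sketch. The two IL listings are hypothetical samples (they are not taken from the real tool), and the class and method names are made up for illustration. The only difference between the listings is that a one-byte `ldc.i4.7` became a two-byte `ldc.i4.s 42`, yet every following address moves by one.

```csharp
using System;
using System.Text.RegularExpressions;

public static class LabelShiftDemo
{
    // Matches the "IL_xxxx:" address ILDASM prints in front of each instruction.
    static readonly Regex Label = new Regex(@"^\s*IL_[0-9a-fA-F]{4}:\s*");

    // Hypothetical listing before the change.
    public static readonly string[] OldIl =
    {
        "    IL_0000:  ldc.i4.7",
        "    IL_0001:  stloc.0",
        "    IL_0002:  ldloc.0",
        "    IL_0003:  call       void [mscorlib]System.Console::WriteLine(int32)",
        "    IL_0008:  ret",
    };

    // Hypothetical listing after the change: the first instruction is one byte longer.
    public static readonly string[] NewIl =
    {
        "    IL_0000:  ldc.i4.s   42",
        "    IL_0002:  stloc.0",
        "    IL_0003:  ldloc.0",
        "    IL_0004:  call       void [mscorlib]System.Console::WriteLine(int32)",
        "    IL_0009:  ret",
    };

    // Removes the leading address from every line.
    public static string[] StripLabels(string[] lines)
    {
        string[] result = new string[lines.Length];
        for (int i = 0; i < lines.Length; i++)
            result[i] = Label.Replace(lines[i], "");
        return result;
    }

    // Counts the positions at which the two listings differ.
    public static int CountMismatches(string[] a, string[] b)
    {
        int common = Math.Min(a.Length, b.Length);
        int mismatches = Math.Abs(a.Length - b.Length);
        for (int i = 0; i < common; i++)
            if (a[i] != b[i]) mismatches++;
        return mismatches;
    }

    public static void Main()
    {
        Console.WriteLine("raw: " + CountMismatches(OldIl, NewIl));
        Console.WriteLine("stripped: " + CountMismatches(StripLabels(OldIl), StripLabels(NewIl)));
    }
}
```

With the addresses in place all five lines differ; once they are stripped, only the line that actually changed is left. A line-by-line count like this is far cruder than WinDiff, which also copes with inserted and deleted lines, so treat it purely as an illustration of the address shift.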

Some Notes:

  1. You can take this a step further by using the IL spec to do a complete restoration of the code; however, I’ll let someone else make a profit off of that.
  1. Check out Matt Pietrek’s presentation about reverse engineering .NET applications: https://www.develop.com/conferences/conferencedotnet/materials/N8.pdf.
  1. This worked well for me because I know my code and I somewhat remember the changes I made, so using the diff I could easily recode everything. Knowing a bit of IL also helps.
  1. Everyone is better off maintaining daily backups of their machines.
  1. This will not help you if the code’s obfuscated.
  1. ILDASM ships with VS.NET and WinDiff ships with VS6.
  1. Here’s the code for the IL stripping program:

using System;
using System.IO;
using System.Text.RegularExpressions;

namespace StripIL
{
    /// <summary>
    /// Strips comments, IL addresses, and branch statements from an ILDASM dump
    /// so that two dumps can be compared with a diff tool.
    /// </summary>
    class StripIL
    {
        /// <summary>
        /// The main entry point for the application.
        /// </summary>
        [STAThread]
        static void Main(string[] args)
        {
            // Make sure we get a command line argument, which we assume is the file name.
            if (args.Length < 1)
                return;

            // An instruction line starts with its IL address, e.g. "    IL_0001:  ldc.i4.7"
            Regex exp = new Regex(@"^\s*IL_[0-9a-fA-F]{4}: ");

            // A branch instruction followed by its IL_xxxx target, e.g. "brfalse.s  IL_0010"
            Regex exp2 = new Regex(@"^\s*(br|brtrue|brfalse|brnull|brzero|brinst|beq|bne|bge|bgt|ble|blt|leave)(\.un)?(\.s)?\s+IL_[0-9a-fA-F]{4}");

            using (StreamReader tr = new StreamReader(args[0]))
            using (StreamWriter tw = new StreamWriter(args[0] + ".updated"))
            {
                string line;
                while ((line = tr.ReadLine()) != null)
                {
                    // Ignore comment blocks (removes the metadata section of the IL file)
                    // and the indented "// Code size" line at the top of each method
                    if (line.StartsWith("//") || line.TrimStart().StartsWith("// Code size"))
                        continue;

                    // Strip out the IL address as changes in the code will
                    // kill us when we do the file diff
                    //   IL_0001:  ldc.i4.7   // Sample IL line
                    if (exp.IsMatch(line))
                    {
                        line = line.Substring(line.IndexOf(": ") + 1);

                        // See if this is a branch statement, because if it is we can pull it out
                        // as well; its target address changes whenever the code changes
                        if (exp2.IsMatch(line))
                            continue;
                    }

                    // Send the line to the new file
                    tw.WriteLine(line);
                }
            }
        }
    }
}
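
The stripping program writes its output next to the input as `<file>.updated`, so running it once per IL dump leaves two `.updated` files to feed to WinDiff. If WinDiff is not at hand, a rough stand-in is to list the lines of the latest file that have no counterpart in the source control file. The sketch below does that; it is not part of the original tool, the class name `StrippedIlDiff` and the method `OnlyInLatest` are made up for illustration, and it assumes a framework recent enough to have `File.ReadLines`.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class StrippedIlDiff
{
    // Returns the lines of 'latest' that have no remaining counterpart in 'baseline'.
    // Lines are compared after trimming, blank lines are skipped, and duplicates
    // are matched one for one.
    public static List<string> OnlyInLatest(IEnumerable<string> baseline, IEnumerable<string> latest)
    {
        Dictionary<string, int> remaining = new Dictionary<string, int>();
        foreach (string line in baseline)
        {
            string key = line.Trim();
            if (key.Length == 0) continue;
            int count;
            remaining.TryGetValue(key, out count); // count stays 0 if the key is new
            remaining[key] = count + 1;
        }

        List<string> added = new List<string>();
        foreach (string line in latest)
        {
            string key = line.Trim();
            if (key.Length == 0) continue;
            int count;
            if (remaining.TryGetValue(key, out count) && count > 0)
                remaining[key] = count - 1; // consume one matching baseline line
            else
                added.Add(key);
        }
        return added;
    }

    public static void Main(string[] args)
    {
        if (args.Length < 2)
        {
            Console.Error.WriteLine("usage: StrippedIlDiff <source_control.il.updated> <latest.il.updated>");
            return;
        }

        foreach (string line in OnlyInLatest(File.ReadLines(args[0]), File.ReadLines(args[1])))
            Console.WriteLine(line);
    }
}
```

This only reports which instructions are new, not where they belong, so it suits a quick look at how much changed. For putting the code back, a real diff tool that keeps the surrounding context is the better choice.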