Obfuscation


The issue of obfuscation and decompiling .NET code comes up on a fairly regular basis, so I thought I’d explore it in some more depth, and try to address some of the common questions that arise.


This is not a new issue or topic – ever since the first high-level compilers appeared, people have explored reverse engineering programs to reveal the source code. Back in the 1980s I used to program in C, and one of the language’s nice features was the predictable way the compiler generated assembly code from the C source, which made it possible to write high-level code yet still feel in control of the assembly it produced. As newer programming languages have introduced ever greater levels of abstraction, the mapping has become more complex, so decompiling a native .exe file built from a modern language is a non-trivial task. (If you doubt this, open an .exe in a debugger and try to make sense of it.)


In contrast, .NET assemblies carry a lot of metadata with them. This affords the runtime a number of important benefits but, as a side effect, makes assemblies easier to decompile and makes it easier to discover the original intent of the programmer. The good news, though, is that this is a well-understood issue, and it can be addressed relatively easily.
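
As a rough analogy only (this is Python, not .NET, and the function and helper names are made up for illustration), compiled Python code objects also keep the programmer's names alongside the instructions. The sketch below shows how much of the original intent can be read straight back out of a compiled function without any decompiler at all.

```python
def compute_discount(price, loyalty_years):
    # Hypothetical business rule, used only as something to inspect.
    rate = min(loyalty_years * 0.02, 0.2)
    return price * (1 - rate)


def surviving_names(func):
    """Collect the names that remain attached to a compiled function."""
    code = func.__code__
    return {
        "function": code.co_name,                    # the function's own name
        "parameters_and_locals": code.co_varnames,   # arguments first, then locals
        "globals_referenced": code.co_names,         # global/builtin names it uses
    }


if __name__ == "__main__":
    for kind, names in surviving_names(compute_discount).items():
        print(kind, names)
```

Names such as `loyalty_years` tell a reader a good deal about what the code is for, which is why renaming is usually the first thing an obfuscator does.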


There are two really good articles that I urge you to read to better understand this topic. The first is a very readable MSDN Magazine article, “Thwart Reverse Engineering of Your Visual Basic .NET or C# Code”, and the other is the Visual Studio documentation on MSDN, “Goals of Obfuscation” (written by PreEmptive Solutions, who wrote the obfuscator included in Visual Studio 2003). Together, these two articles should answer most of your questions.


I love the food analogy in the MSDN article, which likens obfuscation to putting a six-course meal into a blender, so that no one can identify the ingredients, whilst still delivering those contents to the recipient – not a strict computer analogy, but a nice one just the same!
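
To make the blender idea a little more concrete, here is a minimal sketch of the simplest obfuscation technique, identifier renaming. It is written in Python rather than for .NET, it is not how any commercial obfuscator works, and every name in it (`Renamer`, `obfuscate`, `call`, `compute_discount`) is invented for the example. It assumes Python 3.9 or later for `ast.unparse`.

```python
import ast

SOURCE = '''
def compute_discount(price, loyalty_years):
    rate = min(loyalty_years * 0.02, 0.2)
    return price * (1 - rate)
'''


class Renamer(ast.NodeTransformer):
    """Swap programmer-chosen names for meaningless ones (_0, _1, ...)."""

    def __init__(self):
        self.mapping = {}

    def _alias(self, name):
        if name not in self.mapping:
            self.mapping[name] = f"_{len(self.mapping)}"
        return self.mapping[name]

    def visit_FunctionDef(self, node):
        node.name = self._alias(node.name)
        self.generic_visit(node)  # then rename arguments and body
        return node

    def visit_arg(self, node):
        node.arg = self._alias(node.arg)
        return node

    def visit_Name(self, node):
        # Rename names being assigned, or names already renamed.
        # Anything else (such as the builtin `min`) is left alone.
        if isinstance(node.ctx, ast.Store) or node.id in self.mapping:
            node.id = self._alias(node.id)
        return node


def obfuscate(source):
    """Return (renamed source text, old-name -> new-name mapping)."""
    renamer = Renamer()
    tree = renamer.visit(ast.parse(source))
    return ast.unparse(tree), renamer.mapping


def call(source, func_name, *args):
    """Execute source text and call one function defined in it."""
    namespace = {}
    exec(source, namespace)
    return namespace[func_name](*args)


if __name__ == "__main__":
    obfuscated, mapping = obfuscate(SOURCE)
    print(obfuscated)
```

The renamed code still runs and returns the same results, but the names that explained what it was for are gone. Real obfuscators go much further than this, for example by rearranging control flow.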


One of the common questions is: why is .NET assembly code obfuscated rather than encrypted? The MSDN article addresses this nicely: “You could encrypt .NET assemblies to make them completely unreadable. However, this methodology suffers from a classic dilemma—since the runtime must execute unencrypted code, the decryption key must be kept with the encrypted program. Therefore, an automated utility could be created to recover the key, decrypt the code, and then write out the IL to disk in its original form. Once that happens, the program is fully exposed to decompilation.”
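
The dilemma in that quote can be sketched in a few lines. This is a toy illustration in Python, not .NET: the XOR "cipher" merely stands in for real encryption, and the bundle layout and all the function names are assumptions made up for the example. The point is only that whatever the loader needs in order to run the code is also available to anyone else holding the same bundle.

```python
from itertools import cycle


def xor_bytes(data: bytes, key: bytes) -> bytes:
    """Toy stand-in for a cipher: XOR is its own inverse."""
    return bytes(b ^ k for b, k in zip(data, cycle(key)))


def protect(source: str, key: bytes) -> dict:
    """Ship the encrypted code together with the key the loader needs."""
    return {"key": key, "payload": xor_bytes(source.encode("utf-8"), key)}


def load_and_run(bundle: dict) -> dict:
    """What the legitimate loader does: decrypt, then execute."""
    source = xor_bytes(bundle["payload"], bundle["key"]).decode("utf-8")
    namespace = {}
    exec(source, namespace)
    return namespace


def recover_source(bundle: dict) -> str:
    """What an automated cracking utility does: the very same decryption."""
    return xor_bytes(bundle["payload"], bundle["key"]).decode("utf-8")


# Hypothetical "secret" licence check used as the protected program.
SECRET = "def licence_ok(serial):\n    return serial % 7 == 3\n"

if __name__ == "__main__":
    bundle = protect(SECRET, b"k3y")
    print(load_and_run(bundle)["licence_ok"](10))
    print(recover_source(bundle))
```

Hiding the key more cleverly changes how long recovery takes, not whether it is possible, since the loader must always be able to find the key.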


Another common question is “shouldn’t the code be obfuscated by default?”. The danger of default obfuscation is that someone would crack whatever algorithm was used and publish a decompiler tuned to it, so the apparent safety of the default solution would turn out to be deceptive. It’s better to decouple the obfuscation process, and have an after-market of ISVs whose focus is to create ever smarter obfuscators, keeping ahead of the decompiler writers. PreEmptive, for example, have a range of offerings, each with different capabilities.


Most people who are interested in obfuscation are interested for one of a few specific reasons. The first is that the code implements some form of copy protection (such as checking for the presence of a CD, or requiring a registration number) where, if the relevant algorithm could be decoded, it would be possible to bypass the protection. Another is that the code uses some Intellectual Property (IP) that needs to be kept secret and is core to the way the program works. A slightly different variant is that the software handles sensitive information (financial, personal and so on) where it might be possible to hack or spoof the system if details of the way the information is handled could be decoded. The final one, and perhaps the most common, is that people just plain don’t want others to see inside their code, as a general thing rather than because of any of the other factors. Given some of the poor code I have seen over the years, it may be just to avoid embarrassment!


Comments (5)

  1. denny says:

    In the end I see all forms of "copy protection" and the like as silly …

    in the end If someone wants to work out how you did "it" they can and will.

    all we are doing is making it more difficult to get back the source lines and comments.

    IMHO way too often developers and businesses spend way too much cash on the effort with minimal returns…

    if a class library is good and at a fair price folks will buy it rather than spend the time decompiling it… most software sold to the "average consumer" will never be decompiled by the user.

    so I’d say study how much cash you spend for how much benefit… and include the hassles that come with over protecting things….

    I have seen more than once where games would not run on some systems due to the cd protection code trying to access, for example, data at > 600 megs …. older cd rom drives can’t go there, why should the user have to go buy a new drive for the benefit of the publishers ??

    other times CD-RW drives have had issues with some disks….

    I could give many other examples where the legal licensee is ready to chuck the software due to overzealous use of protection….

  2. Balaji says:

    In the end, it appears that MS did not put adequate thought into decompiling when they designed the .Net framework. Let’s face it. Almost the whole of the .Net source can be viewed using the .Net Reflector tool. This should definitely be a boon to those hackers out there who want to exploit every little MS vulnerability. Even though MS seems to be underplaying the effect of decompiling, I think its malicious potential is far more than what is being said. At least a lot of MS .Net code will be "open source" whether by intent or not.

  3. Steve Hurst says:

    I agree, I feel Microsoft did not put enough thought into this area. I feel it has been an afterthought to put the obfuscator in and it is seen as an Achilles heel.

    I understand the problems of encryption but I really think there needs to be something.

    Particularly with people trying to develop smart clients, they are fighting an uphill battle against normal web pages for managing data, in the area of security. What will it take for somebody to sit, track and then find ways to skip the security modules within a smart client, something you can’t do with Web pages.

    No, having assemblies that have a fixed lifetime, where the private keys regularly change and the users have to download a new security assembly, is really the only way.

    If you want to use analogies, we do it for passwords and I think the same should be applied to the application in which the password is being used.

  4. Nigel Watson says:

    This is a challenge that applies to all platforms, not just .NET.

    At the end of the day, any kind of code protection mechanism can be defeated, given an attacker who is sufficiently determined, skilled and tooled up.

    Obfuscation/encryption techniques work on the principle of ‘raising the bar’ so that the cost of cracking your code exceeds the likely value to be gained from doing so. But these techniques are not foolproof, and in some cases actually decrease your security by giving you a false sense of protection against people looking at your code.

    This is an old argument. For instance, I remember the enormous amount of work that some game development houses put into protecting Amiga code via various encryption/code permutation/loader hacks. And at the end of the day, their code was still generally as widely pirated as anyone else’s.

    If you are trying to protect against exploitation of security vulnerabilities in your code, you are better off building for security from the ground up (secure by design, default and deployment) – and avoiding code vulnerabilities in the first place – rather than applying obfuscation or encryption techniques to protect your cock-ups from prying eyes. Good security should withstand examination of the code.

    If you are trying to protect some IP that is embodied in your code, then you need to decide whether you actually want to deploy that code onto client machines. If you really, really, really care about this IP, then you won’t. You’ll stick it in a web service somewhere, and make clients connect to it via a network. No code deployed to the client means no code running on the client, which means no files or in-memory images of your IP. Period. Maybe this is feasible, maybe not, but you have to decide whether the value of your IP is enough to warrant this approach.

  5. Brian Duffy says:

    I tend to agree with the other posters… obfuscation is often a waste, as it doesn’t buy you much.

    Whenever I have dealt with vendors who spent a lot of time obfuscating code, the obfuscation was more about covering up cob-job work than "protecting" IP.

    Value is all about perception. If you are creating something that adds value to someone else’s work, they generally license it. That’s why some web people use IIS, even in the face of excellent & free competitors like Apache.