Refactoring C and C++ Code for Security

03/08/2016

I have been programming in C and C++ since I was 15 years old. And no, I won’t tell you how long ago that was! I have always loved both languages, and still do, but when the first internal pre-releases of Visual Studio 2013 came out, I selected C# as my prime language. To be honest, I felt like a deserter!

When building and working on operating systems, C and C++ are the dominant languages, but customers building line-of-business and cloud applications mostly use higher-level languages like C#, JavaScript and Java. With that said, I would say that about 25% of the customers I work with have some form of legacy C and C++ code in production, and while it is often not exposed directly to the Internet, much of it sits right behind a web server. In other words, it’s still in the firing line of attackers. For example, some systems I have reviewed take a web request (ASP.NET, PHP etc.), turn the request data into a proprietary format and shoot it over a TCP socket to a service or daemon written in C or C++. The C/C++ code then performs some parsing and queuing and shoots the request to a back-end system, which processes the data and returns the result back up the pipeline.
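To make the "firing line" concrete, here is a minimal sketch of the parsing step in such a service. The wire format (a 2-byte big-endian length followed by the payload), the Request struct and the parse_frame name are all hypothetical, made up for illustration; they are not taken from any system I have reviewed. The point is that the length field comes from the other end of the socket, so it has to be checked against both the destination buffer and the bytes actually received.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical wire format: 2-byte big-endian payload length, then the payload.
constexpr std::size_t kMaxPayload = 256;

struct Request {
    std::uint8_t payload[kMaxPayload];
    std::size_t length;
};

// Fills `out` and returns true only if the frame is complete and fits.
bool parse_frame(const std::uint8_t* data, std::size_t size, Request& out) {
    if (size < 2) {
        return false;  // not even a full length header
    }
    const std::size_t claimed =
        (static_cast<std::size_t>(data[0]) << 8) | data[1];
    if (claimed > kMaxPayload) {
        return false;  // sender claims more than our buffer can hold
    }
    if (claimed > size - 2) {
        return false;  // sender claims more than was actually received
    }
    std::memcpy(out.payload, data + 2, claimed);
    out.length = claimed;
    return true;
}
```

Legacy code in this position often trusts the length field and goes straight to the copy, which is exactly the kind of bug the ideas below help to catch.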

Of course, there are classes of systems that use C/C++ throughout. For example, many control systems use C/C++ for the core system and use higher level languages, such as C# and Java, for the management systems.

In short, there’s still a great deal of old, crusty C and C++ code out there that is directly or indirectly open to attack.

This code should be updated where possible to improve its security, but this should be done in a way that does not introduce regressions and requires very little engineering effort. I am not saying a customer should spend thousands of hours securing C and C++ code (some should, however!) but there are things that can be done that raise the security bar easily, and this means refactoring the code with an eye on security.

What’s Refactoring?

Refactoring is a process where code is improved in some way without changing how it functions. Examples include making code more legible or more maintainable. The rest of this commentary focuses on refactoring C and C++ code so it is more secure. For C and C++ code, that means reducing the number of potential memory corruption issues.

Memory corruption (also called memory safety) vulnerabilities have long been the bane of C and C++ code and every refactoring idea below attempts to reduce or mitigate many memory corruption issues.

Refactoring Idea #1 - Recompile and Relink

It really could not be simpler. The two main C/C++ toolsets in use today, Microsoft Visual C++ and GNU gcc, can add memory corruption defenses to the compiled and linked code. All you need to do is flip a few compiler and linker flags and the tools will add these defenses to the resulting binary.

For Visual C++ the compiler flags are:

/GS
/guard:cf

And the linker flags are:

/dynamicbase
/nxcompat
/safeseh

The really good news is that for Visual C++ 2015, you don’t need to do anything other than recompile and relink the code, as these switches are enabled by default. The one exception is /guard:cf, which is new in VC++ 2015 and must be set explicitly.

Also, add this to a commonly used header, such as stdafx.h:

#pragma strict_gs_check(on)
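Here is a minimal sketch of the kind of function this pragma is aimed at. As I understand the Visual C++ documentation, /GS on its own concentrates on functions with character-style buffers and may skip a function whose only local array holds pointers; strict_gs_check asks the compiler to protect such functions as well. The reverse_pointers function and the kMaxItems limit are hypothetical names for illustration, and the pragma is wrapped in an _MSC_VER check so the file still builds with other compilers.

```cpp
#include <cstddef>

#ifdef _MSC_VER
#pragma strict_gs_check(on)  // Visual C++ only; other compilers skip this line
#endif

constexpr std::size_t kMaxItems = 20;  // hypothetical limit for this sketch

// Reverses `count` pointers in place using a local scratch array.
// Returns false instead of overrunning the scratch array.
bool reverse_pointers(void** data, std::size_t count) {
    void* scratch[kMaxItems];  // an array of pointers, not a character buffer
    if (count > kMaxItems) {
        return false;  // the bounds check that legacy code often forgets
    }
    for (std::size_t i = 0; i < count; ++i) {
        scratch[count - 1 - i] = data[i];
    }
    for (std::size_t i = 0; i < count; ++i) {
        data[i] = scratch[i];
    }
    return true;
}
```

The stack cookie is a backstop, not a fix: if the bounds check were missing, the cookie would only turn a silent overrun into a controlled crash.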

For gcc, you should flip the following switches in the compiler and linker:

CFLAGS="-fPIE -fstack-protector-all"

LDFLAGS="-pie -Wl,-z,now -Wl,-z,relro"

There is a “downside” to these changes: if there are memory corruption vulnerabilities in the code, there’s a good chance the defenses will trip and your code will fail, either during test or in production. I see this as a good thing because you just found a real security vulnerability with a nice, clean stack trace. Fix the code; the bug has probably been latent for decades and you never knew it.
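It is easy to think the gcc flags are set when a makefile somewhere quietly drops them. One way to check, sketched below, is to look at the macros gcc and clang predefine when stack protection and position-independent executable code generation are requested. The hardening_summary function is a hypothetical helper of my own, not part of any toolset, and it can only see compiler options; the linker settings (-z now, -z relro) do not show up as macros and have to be checked on the binary itself.

```cpp
#include <string>

// Reports which hardening options were visible to the compiler for this
// translation unit, based on macros that gcc and clang predefine.
std::string hardening_summary() {
    std::string summary;
#if defined(__SSP_ALL__)
    summary += "stack-protector-all ";
#elif defined(__SSP_STRONG__)
    summary += "stack-protector-strong ";
#elif defined(__SSP__)
    summary += "stack-protector ";
#else
    summary += "no-stack-protector ";
#endif
#if defined(__PIE__)
    summary += "pie";     // compiled with -fPIE (or -fpie)
#else
    summary += "no-pie";
#endif
    return summary;
}
```

Logging this string once at service start-up is a cheap way to notice a build that lost its flags.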

Refactoring Idea #2 – Replace Insecure C Runtime Functions

There are many C runtime functions that we know are insecure because they don’t constrain how much memory is copied, or they constrain it in ways that are easy to get wrong. At Microsoft we banned the use of these functions in new code. The Rogues’ Gallery includes:

strcpy
strcat
sprintf
strncpy
strncat
snprintf
gets

And many, many more. You are wrong if you think I am going to suggest you dive into the code and manually replace these functions with safer versions. A better option, because we’re trying to keep the work as small as possible, is to have the compiler do the work for you when it can. I wrote about this many years ago. All you need to do is add this to a commonly used header file, such as stdafx.h, and then recompile the code:

#define _CRT_SECURE_CPP_OVERLOAD_STANDARD_NAMES 1

#define _CRT_SECURE_CPP_OVERLOAD_STANDARD_NAMES_MEMORY 1
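Why can the compiler only do this "when it can"? The overloads are C++ templates, so they apply to C++ code, and they rely on the destination being a real array whose size the compiler can see. The sketch below shows that mechanism in portable C++. It is not the CRT's actual implementation, and copy_checked is a hypothetical name. It also reports failure by returning false, whereas the real secure CRT functions invoke the invalid parameter handler, which by default terminates the process.

```cpp
#include <cstddef>
#include <cstring>

// When the destination is a real array, the template parameter N carries
// its size, so the copy can be checked without touching the call site.
template <std::size_t N>
bool copy_checked(char (&dst)[N], const char* src) {
    const std::size_t len = std::strlen(src);
    if (len >= N) {
        dst[0] = '\0';  // no room for the text plus its terminator
        return false;
    }
    std::memcpy(dst, src, len + 1);  // len + 1 copies the terminator too
    return true;
}
```

A destination passed around as a plain char* has lost its size, so no template can deduce it. Calls like that are left as they were and still need a manual fix.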

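For gcc, you can use these two settings, which do a similar thing. Note that _FORTIFY_SOURCE only takes effect when the code is compiled with optimization enabled (-O1 or higher).

CFLAGS="-D_FORTIFY_SOURCE=2 -Wformat"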

Refactoring Idea #3 – Focused use of Static Analysis Tools

If you use Visual C++, compile with /analyze and fix any issue that relates to memory safety. We have tuned the tool to find issues with a high degree of confidence, so there should be few false positives.

In my opinion, no code should be checked in with these warnings:

C6001, C6002, C6029, C6054, C6059, C6063, C6064, C6066, C6067, C6101, C6200, C6201, C6255, C6320, C6383, C6385, C6386, C6411, C6412
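To give a feel for what a couple of these look like, here is a small sketch. The commented-out "before" version contains an off-by-one write past a stack array, the sort of thing the index and buffer overrun warnings (the C6201 and C6386 family) are meant to flag, and it sums a table that was never initialized, which is C6001 territory. The function names are hypothetical, and the exact warning you get depends on the compiler version.

```cpp
#include <cstddef>

constexpr std::size_t kSlots = 8;

// Before (do not use): `i <= kSlots` writes one element past the end, and
// a table left partly unset would be read uninitialized by the summing loop.
//
//   int sum_of_squares_buggy() {
//       int table[kSlots];
//       for (std::size_t i = 1; i <= kSlots; ++i) table[i] = int(i * i);
//       int total = 0;
//       for (std::size_t i = 0; i < kSlots; ++i) total += table[i];
//       return total;
//   }

// After: the table is zero-initialized and the loop stays inside its bounds.
int sum_of_squares() {
    int table[kSlots] = {};
    for (std::size_t i = 0; i < kSlots; ++i) {
        table[i] = static_cast<int>(i * i);
    }
    int total = 0;
    for (int value : table) {
        total += value;
    }
    return total;
}
```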

John Carmack has some interesting things to say about the value of using /analyze.

Refactoring Idea #4 – Stretch a Little

This one is a little bit of work, as you’ll get build failures until you fix all the issues, but if you add this to a common header, it will deprecate the banned C runtime functions. After you have downloaded the header, the line to add is:

#include <banned.h>

If that’s a little too harsh, then at least fix all C4996 warnings; these are warnings that indicate banned functionality and the set is smaller than the list in banned.h.
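What does a fix look like once the build starts failing? The secure CRT versions (strcpy_s and friends) are the usual drop-in choice on Visual C++. In C++ code, another option is to let std::string own the memory, as in the sketch below. This is one possible fix, assuming the surrounding code can accept a std::string; join_path and the path example are hypothetical.

```cpp
#include <string>

// Legacy pattern (flagged as C4996 by Visual C++ and banned by banned.h):
//
//   char path[64];
//   strcpy(path, dir);
//   strcat(path, "/");
//   strcat(path, file);   // overruns `path` if dir and file are long enough

// Replacement: std::string grows as needed, so there is no fixed-size
// destination to overrun.
std::string join_path(const std::string& dir, const std::string& file) {
    std::string path = dir;
    path += '/';
    path += file;
    return path;
}
```

Callers that still need a C string for an old API can pass the result's c_str().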

Summary

The security purists out there are probably saying there is a lot more to do to old legacy C and C++ code than what I outlined, and the purists are totally correct. But I would much rather see large swaths of C and C++ code made somewhat more secure than see nothing done to any of it. For those that want to go beyond this list, feel free to do so!