Optimizing C++ Code : Constant-Folding

If you have arrived in the middle of this blog series, you might want instead to begin at the beginning.

This post examines Constant-Folding – one of the simplest optimizations performed by the VC++ compiler.  In this optimization, the compiler works out the result of an expression while it is compiling (at “compile-time”), and inserts the answer directly into the generated code.  This avoids the cost of performing those same calculations when the program runs (at “runtime”).

Here is an example.  A tiny main function, stored in the file App.cpp:

int main() { return 7 + 8; }

First, some admin about this blog: 

  • We will build programs from the command-line (rather than from within Visual Studio)
  • We will use Visual Studio 2012.  In particular, the version of the compiler that generates x64 code (rather than code for the aging x86 architecture), and that compiles on an x64 computer (i.e., “x64 on x64”)

If you want to follow along, please follow the instructions found here:  essentially, you just need to choose the right variant from a list of possible “Visual Studio Tools”.

(Note that if you are using the free compiler from Visual Studio Express, it runs only on x86, but will happily generate code for x64: “x64 on x86”.  This is equally good for experiments)

We can build our sample program with the command:  CL /FA App.cpp.  The /FA switch creates an output file holding the assembly code that the compiler generates for us.  You can display it with: type App.asm to show:

 

PUBLIC  main
_TEXT   SEGMENT
main    PROC
        mov     eax, 15
        ret     0
main    ENDP
_TEXT   ENDS
END

The point of interest is the mov eax, 15 instruction – it just insert the value 15 into the EAX register (which, by definition of the x64 calling standard, is the way that an x64 function sets the int value it will return, as the function’s result, to its caller).  The compiler does not emit instructions that would add 7 to 8 at runtime.  They would have gone something like:

PUBLIC  main
_TEXT   SEGMENT
main    PROC
        mov     eax, 7
        add     eax, 8
        ret     0
main    ENDP
_TEXT   ENDS
END

(Note carefully, the last instruction, in both snippets, ret 0.  This means: return control to the caller, and pop zero bytes from the stack.  Do not be misled into thinking this means return the value 0 to the caller!)

Let me guess: you are probably thinking “this is all very well, but really!  What kind of idiot would ever write code that includes arithmetic like 7 + 8”?  Of course, you are right.  But the compiler does see such constructs, often as a side-effect of macros.  Here is an example to persuade you that Constant-Folding is a worthwhile optimization:

#define SECS_PER_MINUTE  60
#define MINUTES_PER_HOUR 60
#define HOURS_PER_DAY    24

enum Event { Started, Stopped, LostData, ParityError };

struct {
    int        clock_time;
    enum Event ev;
    char*      reason;
}   Record;

int main() {
    const int table_size = SECS_PER_MINUTE * MINUTES_PER_HOUR * HOURS_PER_DAY * sizeof Record;
    // rest of program
}

Here we are going to create a table big enough to hold a Record for each second of an entire day.  So table_size would be the size, in bytes, of that table.  It’s easy to check the assembly instruction that sets the variable table_size :

        mov     DWORD PTR table_size$[rsp], 1382400     ; 00151800H

No multiply instructions here! – it’s all calculated at compile-time: 60 * 60 * 24 * 16 = 1382400.

In fact, if we could peek inside the compiler, we would find that this level of Constant-Folding is so simple, it’s performed by the FrontEnd.  It does not require the heavy lifting power of the BackEnd Optimizer.  And therefore, it’s always on.  It makes no difference whether you request optimization (with /O2) or disable optimization (with /Od) – this optimization always cuts in.

How complicated can the expression be, and yet we still fold those constants together at compile-time?  In fact, the FrontEnd will cope with pretty much any arithmetic expression involving constants (even values such as sizeof, as above, so long as they can be evaluated at compile-time) and operators + - * / % << >> ++ and .  You can even throw in bools, logical operators, ifs and ?:  

Are there any cases of Constant-Folding, then, that do require the power of the BackEnd Optimizer?  Yes.  Consider this example:

int bump(int n) { return n + 1; }

int main() { return 3 + bump(6); }

With the commands, cl /FA /Od App.cpp which says “no optimizations, thank you” and type App.asm, we get:

mov     ecx, 6
call    ?bump@@YAHH@Z                           ; bump
add     eax, 3

Just as we would expect: load 6 into ECX – which holds the first argument, in the x64 calling convention, to our function bump.  Then call bump.  Its result is returned in EAX.  Finally, add 3 into EAX.

Let’s see what happens if we request optimization, with: cl /FA /O2 App.cpp

mov     eax, 10

Here the BackEnd Optimizer has recognized that the bump function is so small that its body should simply be included into its caller (a sophisticated optimization called “function inlining” that we will examine later in the series).  It then realizes it can evaluated the entire expression at compile time, and so ends up with a single instruction.  Quite impressive, right?