Auto-Vectorizer in Visual Studio 2012 – Did It Work?

05/22/2012

If you’ve not read previous posts in this series about auto-vectorization, you may want to begin at the beginning.

This post will explain how to find out which loops in your C++ program were auto-vectorized. Here is an example program, stored in a file called Source.cpp, with which to experiment:

int main() {
    const int N = 50; // array dimensions
    int a[N], b[N], c[N];
    for (int n = 0; n < N; ++n) a[n] = b[n] * c[n];
}

To keep things simple, I have left out the code that initializes arrays b and c. That doesn’t matter for the purpose of this post.

Running from the Command Line

Let’s start by running this program from the command-line (we’ll explain what to do in the Visual Studio IDE in a few minutes).

cl /c /O2 /Qvec-report:1 Source.cpp

This command tells the compiler to compile Source.cpp, but not to go on and link (that’s the /c switch). The /O2 switch tells the compiler to generate code that is optimized for speed. This is crucial: the auto-vectorizer kicks in only when you enable optimization. Finally, the /Qvec-report:1 switch tells the compiler to report which loops were successfully vectorized. (Remember that these command-line switches are case-sensitive, so spell them exactly as shown.) Here is the output:

source.cpp

--- Analyzing function: main
c:\source.cpp(4) : loop vectorized

This confirms that the loop on line 4 of Source.cpp was indeed vectorized.

Please note that the /Qvec-report:1 switch is not present in the Beta drop of VS11 from February. It will be included in the next public drop, available soon.

The compiler also provides a /Qvec-report:2 switch. This one tells you which loops were successfully auto-vectorized, and which were not, with a reason code. Here is another snippet that includes a second loop (on line 5):

int main() {
    const int N = 50; // array dimensions
    int a[N], b[N], c[N];
    for (int n = 0; n < N; ++n) a[n] = b[n] * c[n];
    for (int n = 1; n < N; ++n) a[n] = a[n-1] + 7; // starts at 1 so a[n-1] stays in bounds
}

And here is the corresponding report:

source.cpp

--- Analyzing function: main
c:\source.cpp(4) : loop vectorized
c:\source.cpp(5) : loop not vectorized due to reason '1200'

As you can see, the compiler auto-vectorized the loop on line 4 (as before), but failed to auto-vectorize the one on line 5, with a reason code of 1200. This loop is similar to Example 6 – Backward Dependency that we analyzed in a previous post. Vectorizing this loop would produce wrong results, and the auto-vectorizer is smart enough to know this.
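
To see why vectorizing the second loop would produce wrong results, here is a rough sketch that compares the sequential loop with a hand-written simulation of a 4-wide vector step (load four values, add 7, store four values). The function names, the array size of 9 and the width of 4 are assumptions chosen for illustration; this is not how the compiler itself works. The loops start at n = 1 so that a[n-1] stays in bounds.

```cpp
#include <array>
#include <cstddef>

constexpr std::size_t N = 9;  // 1 + a multiple of W, to keep the simulation simple
constexpr std::size_t W = 4;  // assumed vector width: four ints per register

// Sequential semantics: iteration n sees the value written by iteration n-1.
std::array<int, N> scalar_recurrence(std::array<int, N> a) {
    for (std::size_t n = 1; n < N; ++n) a[n] = a[n - 1] + 7;
    return a;
}

// Naive "vectorized" semantics: all W loads happen before any of the W stores,
// so three of the four loads in each chunk read stale values.
std::array<int, N> naive_vector_recurrence(std::array<int, N> a) {
    for (std::size_t n = 1; n + W <= N; n += W) {
        int loaded[W];
        for (std::size_t k = 0; k < W; ++k) loaded[k] = a[n - 1 + k];  // vector load
        for (std::size_t k = 0; k < W; ++k) a[n + k] = loaded[k] + 7;  // add and store
    }
    return a;
}
```

Starting from an array of zeros, the sequential version yields 0, 7, 14, 21, ... while the simulated vector version yields 0, 7, 7, 7, 7, 14, 7, 7, 7. The two disagree from index 2 onwards, which is exactly the kind of wrong answer the auto-vectorizer refuses to risk.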

Before going on to explain the various reason codes, let’s catch up and explain how to request these results from the Visual Studio IDE.

Running from the IDE

For your project, select the “Release” (rather than “Debug”) configuration. (You can check the project properties to confirm that, under the covers, this sets the /O2 switch, just as we did above from the command-line).

In addition, navigate to “Property Pages”, “Configuration Properties”, “C/C++”, “Command Line”, “Additional Options” and add: /Qvec-report:1. Here’s a screen-shot:

The build shown in the screenshot is for x64, but you can equally well choose x86. Now, whenever you build the project, the output will include a report saying which loops were successfully vectorized. As with requesting this report from the command-line, please note that the /Qvec-report:1 switch is not present in the Beta drop of VS11 from February. It will be included in the next public drop of VS11, available soon.

Reasons why Vectorization Was Not Possible

Recall that the auto-vectorizer is 100% safe: it will NEVER vectorize a loop if there is the slightest chance the generated code would produce wrong answers – answers different from those implied by the original sequential C++ code.

[NitPick again: what exactly are the answers implied by the sequential execution of a C++ program? Answer: this is a deep question. For our tiny examples, we will simply assume the answer is “obvious”. For the general problem, try a web search for the topic “programming language semantics”]

Ensuring safety requires some pretty deep analysis of the input code. It turns out that sometimes a loop would actually be safe to vectorize, but the analysis cannot prove it. The auto-vectorizer therefore refuses to vectorize that loop. We say that its judgments are “conservative”.
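
As a small illustration of what “conservative” means, consider the sketch below. The function shift_add and its offset parameter are made up for this example. When offset is zero or positive, no iteration reads a value written by an earlier iteration, so vector execution would be safe; when offset is -1, the loop becomes the recurrence we saw earlier. The compiler sees only one loop and does not know which case it will get at run time. How any particular compiler version treats this loop is an assumption you should check with /Qvec-report:2.

```cpp
#include <vector>

// 'offset' is known only at run time.
//   offset >= 0 : each iteration reads an element no earlier iteration wrote
//   offset <  0 : each iteration reads an element an earlier iteration wrote
void shift_add(std::vector<int>& a, int offset) {
    const int size = static_cast<int>(a.size());
    const int lo = offset < 0 ? -offset : 0;          // keep a[n + offset] in bounds
    const int hi = offset > 0 ? size - offset : size;
    for (int n = lo; n < hi; ++n) a[n] = a[n + offset] + 7;
}
```

Calling shift_add on {1, 2, 3, 4} with offset 1 gives {9, 10, 11, 4}; with offset -1 it gives {1, 8, 15, 22}, where each result feeds the next iteration.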

Each warning from a /Qvec-report:2 run carries one of about 30 reason codes, saying why a given loop was not vectorized.

The reason codes are discovered and emitted from several layers deep within the compiler. This can sometimes make it difficult to relate the specific issue back to the original C++ code, several layers above. For example, the report may be produced from a loop in a function whose body has been in-lined into its caller – so the original function, at this point in the analysis, no longer exists! Bear this in mind as you read the explanations below for each reason code. We will publish a fuller explanation, with examples, as part of MSDN documentation – this will guide you on tweaking your code so that it vectorizes.

Reason Code	Explanation
500	This is a generic message – it covers several cases: for example, the loop includes multiple exits, or the loop header does not end by incrementing the induction variable
501	Induction variable is not local; or upper bound is not loop-invariant
502	Induction variable is stepped in some manner other than a simple +1
503	Loop includes Exception-Handling or switch statements
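
To make the 500s a little more concrete, here is a hedged sketch of the loop shape that code 501 describes, alongside one possible rewrite. The function names and the bound callback are hypothetical. Whether the compiler reports exactly 501 for the first loop, or vectorizes the second, is an assumption to confirm with /Qvec-report:2.

```cpp
// The upper bound is a call made on every iteration. The compiler may be
// unable to prove it returns the same value each time (not loop-invariant).
void add_one_call_in_bound(int* a, int (*bound)()) {
    for (int i = 0; i < bound(); ++i) a[i] += 1;
}

// Same work, with the bound evaluated once into a local constant, so the
// loop has a local induction variable and an invariant upper bound.
void add_one_hoisted_bound(int* a, int (*bound)()) {
    const int count = bound();
    for (int i = 0; i < count; ++i) a[i] += 1;
}
```

Note that the rewrite is only equivalent when bound() really does return the same value on every call, which is precisely the fact the compiler could not establish on its own.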

Reason Code	Explanation
1100	Loop contains control flow – if, ?:
1101	Loop contains a non-vectorizable conversion operation (may be implicit)
1102	Loop contains non-arithmetic, or other non-vectorizable operations
1103	Loop body includes shift operations whose size might vary within the loop
1104	Loop body includes scalar variables
1105	Loop includes a non-recognized reduction operation
1106	Inner loop already vectorized: cannot also vectorize outer loop
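
As an illustration of code 1100, the sketch below shows a loop body containing an if, next to a branch-free version that computes the same values with plain arithmetic. Both function names are invented for this example, and it is an assumption that the second form fares any better with a given compiler version; the vectorization report is the way to find out.

```cpp
// Loop body with control flow: the shape that reason code 1100 describes.
void clamp_with_branch(int* a, const int* b, int count) {
    for (int i = 0; i < count; ++i) {
        if (b[i] < 0) a[i] = 0;
        else          a[i] = b[i];
    }
}

// Same results without an 'if': (b[i] >= 0) evaluates to 0 or 1,
// so negative inputs are multiplied by 0.
void clamp_branch_free(int* a, const int* b, int count) {
    for (int i = 0; i < count; ++i) a[i] = b[i] * (b[i] >= 0);
}
```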

Reason Code	Explanation
1200	Loop contains loop-carried data dependences
1201	Array base changes during the loop
1202	Field within a struct is not 32 or 64 bits wide
1203	Loop body includes non-contiguous accesses into an array

Reason code 1200 says the loop contains loop-carried data dependences which prevent vectorization. This means that different iterations of the loop interfere with each other in such a way that vectorizing the loop would produce wrong answers. More precisely, the auto-vectorizer cannot prove to itself that there are no such data-dependences.
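
Sometimes you can remove a loop-carried dependence by hand. For the a[n] = a[n-1] + 7 loop from earlier, every element is just the first element plus a multiple of 7, so each iteration can compute its result without looking at its neighbor. The sketch below uses hypothetical function names and std::vector for convenience; it is an assumption, not a guarantee, that the auto-vectorizer will accept the rewritten loop.

```cpp
#include <vector>

// Original form: a[n] depends on a[n-1], written one iteration earlier.
void fill_with_dependence(std::vector<int>& a) {
    const int size = static_cast<int>(a.size());
    for (int n = 1; n < size; ++n) a[n] = a[n - 1] + 7;
}

// Closed form: every iteration depends only on a[0] and on n,
// so no iteration interferes with any other.
void fill_closed_form(std::vector<int>& a) {
    if (a.empty()) return;
    const int first = a[0];
    const int size = static_cast<int>(a.size());
    for (int n = 1; n < size; ++n) a[n] = first + 7 * n;
}
```

Both functions produce the same array; for 50 elements starting at 3, the last element is 3 + 7 * 49 = 346 either way.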

[NitPick asks: what is this “Data Dependence” thing you keep dragging into the conversation? Answer: it lies at the heart of vectorization safety, and uses some interesting math – affine transformations and systems of Diophantine equations. However, no-one commented last time that they wanted more details, so I’ll skip explanations]

Reason Code	Explanation
1300	Loop body contains no (or very little) computation
1301	Loop stride is not +1
1302	Loop is a “do-while”
1303	Too few loop iterations for vectorization to be a win
1304	Loop includes assignments that are of different size
1305	Not enough type information
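
Code 1302 is one of the easier ones to act on. Here is a sketch of a do-while loop and an equivalent for loop with a simple +1 induction variable. The function names are made up, and whether the for version actually vectorizes still depends on everything else in the tables above.

```cpp
// A "do-while" loop: the shape that reason code 1302 describes.
void double_all_do_while(float* a, int count) {
    if (count <= 0) return;  // a do-while body always runs at least once
    int i = 0;
    do {
        a[i] *= 2.0f;
        ++i;
    } while (i < count);
}

// The same work as a plain counted 'for' loop.
void double_all_for(float* a, int count) {
    for (int i = 0; i < count; ++i) a[i] *= 2.0f;
}
```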

Reason Code	Explanation
1400	User specified #pragma loop(no_vector)
1401	/kernel switch specified
1402	/arch:IA32 switch specified
1403	/favor:ATOM switch specified and loop includes operations on doubles
1404	/O1 or /Os switch specified

The 1400s reason codes are straightforward – you specified some option that is just plain incompatible with vectorization.
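
For example, code 1400 is what you would expect to see after asking the compiler to leave a particular loop alone. In this minimal sketch the function name is hypothetical, and the pragma is wrapped in an _MSC_VER check so that other compilers simply skip it.

```cpp
// Adds 1 to each element, with vectorization explicitly switched off
// for this one loop when compiled with Visual C++.
void add_one_no_vector(int* a, int count) {
#ifdef _MSC_VER
#pragma loop(no_vector)
#endif
    for (int i = 0; i < count; ++i) a[i] += 1;
}
```

The pragma changes only how the loop is compiled, not what it computes.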

Reason Code	Explanation
1500	Possible aliasing on multi-dimensional arrays
1501	Possible aliasing on arrays-of-structs
1502	Possible aliasing and array index is other than n + K
1503	Possible aliasing and array index has multiple offsets
1504	Possible aliasing – would require too many runtime checks
1505	Possible aliasing – but runtime checks are too complex

The 1500s reason codes are all about aliasing – where a location in memory can be accessed by two different names.
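
Here is a tiny sketch of why aliasing matters. The function add_one_into is invented for this example. It looks as though every iteration is independent, and it is, as long as dst and src point at different arrays. If the caller passes overlapping pointers, the very same loop turns into a recurrence.

```cpp
// Looks independent: dst[i] depends only on src[i].
// That is true only if dst and src do not overlap.
void add_one_into(int* dst, const int* src, int count) {
    for (int i = 0; i < count; ++i) dst[i] = src[i] + 1;
}
```

With two separate arrays, a source of {0, 0, 0, 0} gives a destination of {1, 1, 1, 1}. With a single buffer {0, 0, 0, 0, 0} and the call add_one_into(buf + 1, buf, 4), each store becomes the next iteration's load, and the buffer ends up as {0, 1, 2, 3, 4}. The compiler, seeing only two pointers, has to allow for the second case.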

Finally, note that the reason codes listed above apply to this first release of the auto-vectorizer. Subsequent releases will likely stop emitting many of these warnings, as we make the compiler ‘smarter’ at recognizing more and more loop patterns.

The topic of aliasing cropped up earlier. The time seems ripe to explain this term – what it means; why it’s a nuisance; how the auto-vectorizer deals with it. Although the alias analysis performed by a compiler is complex, we can explain the nub of the problem, via examples, in just a few paragraphs. Let’s aim to do that in the next post.
