Auto-Vectorizer in Visual Studio 2012 – Did It Work?
If you’ve not read previous posts in this series about auto-vectorization, you may want to begin at the beginning.
This post will explain how to find out which loops in your C++ program were auto-vectorized. Here is an example program, stored in a file called Source.cpp, with which to experiment:
    int main() {
        const int N = 50; // array dimensions
        int a[N], b[N], c[N];
        for (int n = 0; n < N; ++n) a[n] = b[n] * c[n];
    }
To keep things simple, I have left out the code that initializes arrays b and c. However, that doesn’t matter for the purpose of this post.
Running from the Command Line
Let’s start by running this program from the command-line (we’ll explain what to do in the Visual Studio IDE in a few minutes).
cl /c /O2 /Qvec-report:1 Source.cpp
This command tells the compiler to compile Source.cpp, but not to go on and link (that’s the /c switch). The /O2 switch tells the compiler to generate code that is optimized for speed. This is crucial: the auto-vectorizer kicks in only when you enable optimization. Finally, the /Qvec-report:1 switch tells the compiler to report which loops were successfully vectorized. (Remember that these command-line switches are case-sensitive, so spell them exactly as shown.) And here is the output:
Microsoft (R) C/C++ Optimizing Compiler Version 17.00.50520 for x64
Copyright (C) Microsoft Corporation. All rights reserved.
source.cpp
--- Analyzing function: main
c:\source.cpp(4) : loop vectorized
This confirms that the loop on line 4 of Source.cpp was indeed vectorized.
Please note that the /Qvec-report:1 switch is not present in the Beta drop of VS 11 from February. But it will be included in the next public drop, available soon.
The compiler also provides a /Qvec-report:2 switch. This one tells you which loops were successfully auto-vectorized, and which were not, with a reason code. Here is another snippet that includes a second loop (on line 5):
    int main() {
        const int N = 50; // array dimensions
        int a[N], b[N], c[N];
        for (int n = 0; n < N; ++n) a[n] = b[n] * c[n];
        for (int n = 1; n < N; ++n) a[n] = a[n-1] + 7; // start at 1 so that a[n-1] stays in bounds
    }
And here is the corresponding report:
Microsoft (R) C/C++ Optimizing Compiler Version 17.00.50520 for x64
Copyright (C) Microsoft Corporation. All rights reserved.
source.cpp
--- Analyzing function: main
c:\source.cpp(4) : loop vectorized
c:\source.cpp(5) : loop not vectorized due to reason '1200'
As you can see, the compiler auto-vectorized the loop on line 4 (as before), but failed to auto-vectorize the one on line 5, with a reason code of 1200. This loop is similar to Example 6 – Backward Dependency that we analyzed in a previous post. Vectorizing this loop would produce wrong results, and the auto-vectorizer is smart enough to know this.
Before going on to explain the various reason codes, let’s catch up and explain how to request these results from the Visual Studio IDE.
Running from the IDE
For your project, select the “Release” (rather than “Debug”) configuration. (You can check the project properties to confirm that, under the covers, this sets the /O2 switch, just as we did above from the command-line).
In addition, navigate to “Property Pages”, “Configuration Properties”, “C/C++”, “Command Line”, “Additional Options” and add: /Qvec-report:1. Here’s a screenshot:
The build shown in the screenshot is for x64, but you can equally well choose x86. Now, whenever you build the project, the output will include a report saying which loops were successfully vectorized. As with the command-line report, please note that the /Qvec-report:1 switch is not present in the Beta drop of VS11 from February. But it will be included in the next public drop of VS11, available soon.
Reasons why Vectorization Was Not Possible
Recall that the auto-vectorizer is 100% safe: it will NEVER vectorize a loop if there is the slightest chance the generated code would produce wrong answers – answers different from those implied by the original sequential C++ code.
[NitPick again: what exactly are the answers implied by the sequential execution of a C++ program? Answer: this is a deep question. For our tiny examples, we will simply assume the answer is “obvious”. For the general problem, try a web search for the topic “programming language semantics”]
Ensuring safety requires some pretty deep analysis of the input code. It turns out that sometimes a loop would actually be safe to vectorize, but the analysis cannot prove it. The auto-vectorizer therefore refuses to vectorize that loop. We say that its judgments are “conservative”.
The warnings from a /Qvec-report:2 run specify any of about 30 reason codes for why a given loop was not vectorized.
The reason codes are discovered and emitted from several layers deep within the compiler. This can sometimes make it difficult to relate the specific issue back to the original C++ code, several layers above. For example, the report may be produced from a loop in a function whose body has been in-lined into its caller – so the original function, at this point in the analysis, no longer exists! Bear this in mind as you read the explanations below for each reason code. We will publish a fuller explanation, with examples, as part of MSDN documentation – this will guide you on tweaking your code so that it vectorizes.
Reason Code | Explanation
500 | This is a generic message – it covers several cases: for example, the loop includes multiple exits, or the loop header does not end by incrementing the induction variable
501 | Induction variable is not local; or upper bound is not loop-invariant
502 | Induction variable is stepped in some manner other than a simple +1
503 | Loop includes Exception-Handling or switch statements
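To make the 500s a little more concrete, here is a small sketch of two loop shapes that match the descriptions above. The function names are made up for illustration, and I am assuming, rather than quoting from a compiler run, that loops like these are the kind the 500 and 502 entries describe. The exact code reported for any given loop may differ.

```cpp
// Multiple exits (the kind of shape the 500 entry describes):
// the loop can end through the bound test or through the break.
int first_negative(const int* a, int count) {
    int n = 0;
    for (; n < count; ++n) {
        if (a[n] < 0) break;
    }
    return n; // index of the first negative element, or count if there is none
}

// Step other than +1 (the kind of shape the 502 entry describes):
// the induction variable advances by 2, so only even indices are touched.
void triple_even_slots(int* a, int count) {
    for (int n = 0; n < count; n += 2) a[n] *= 3;
}
```

Both functions are perfectly good C++. The point is only that their loop headers stray from the simple "count from 0 to N in steps of +1" form.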
Reason Code | Explanation
1100 | Loop contains control flow – if, ?:
1101 | Loop contains a non-vectorizable conversion operation (may be implicit)
1102 | Loop contains non-arithmetic, or other non-vectorizable operations
1103 | Loop body includes shift operations whose size might vary within the loop
1104 | Loop body includes scalar variables
1105 | Loop includes a non-recognized reduction operation
1106 | Inner loop already vectorized: cannot also vectorize outer loop
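Code 1105 mentions reductions, so a quick illustration may help. A reduction folds an array into a single scalar, such as a sum. The sketch below is my own hand-written simulation of the textbook way to vectorize a sum, using four partial sums (one per lane of a pretend 4-wide vector register). It is not a description of what the compiler actually emits, and the function names are hypothetical.

```cpp
#include <cstddef>

// The plain sequential reduction: one running total.
int sum_scalar(const int* a, std::size_t count) {
    int total = 0;
    for (std::size_t n = 0; n < count; ++n) total += a[n];
    return total;
}

// A simulated 4-lane reduction: each lane accumulates every fourth element,
// and the lanes are combined once the main loop has finished.
int sum_four_lanes(const int* a, std::size_t count) {
    int lane[4] = {0, 0, 0, 0};
    std::size_t n = 0;
    for (; n + 4 <= count; n += 4) {
        for (std::size_t k = 0; k < 4; ++k) lane[k] += a[n + k]; // one "vector add"
    }
    int total = lane[0] + lane[1] + lane[2] + lane[3];
    for (; n < count; ++n) total += a[n]; // scalar remainder
    return total;
}
```

For integer addition the two functions return the same value, because the order of the additions does not matter. A reduction that the compiler cannot recognize as fitting this kind of pattern is what code 1105 reports.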
Reason Code | Explanation
1200 | Loop contains loop-carried data dependences
1201 | Array base changes during the loop
1202 | Field within a struct is not 32 or 64 bits wide
1203 | Loop body includes non-contiguous accesses into an array
Reason code 1200 says the loop contains loop-carried data dependences which prevent vectorization. This means that different iterations of the loop interfere with each other in such a way that vectorizing the loop would produce wrong answers. More precisely, the auto-vectorizer cannot prove to itself that there are no such data-dependences.
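To see why a loop-carried dependence matters, here is a small simulation, written by hand for this illustration. It compares the sequential loop a[n] = a[n-1] + 7 with a naive 4-wide "vectorized" version that loads four old values before storing four new ones. The width of 4, the array size of 8 and the function names are all assumptions chosen to keep the example small.

```cpp
#include <array>
#include <cstddef>

constexpr std::size_t N = 8; // small size, chosen only for illustration
constexpr std::size_t W = 4; // pretend vector width: four ints per register

// Sequential semantics: each iteration reads the value the previous one wrote.
std::array<int, N> run_sequential(std::array<int, N> a) {
    for (std::size_t n = 1; n < N; ++n) a[n] = a[n - 1] + 7;
    return a;
}

// Naive vectorization, simulated: load W elements, then store W results.
// The loads happen before the stores, so stale values are read.
std::array<int, N> run_naive_vector(std::array<int, N> a) {
    std::size_t n = 1;
    for (; n + W <= N; n += W) {
        int loaded[W];
        for (std::size_t k = 0; k < W; ++k) loaded[k] = a[n + k - 1]; // one "vector load"
        for (std::size_t k = 0; k < W; ++k) a[n + k] = loaded[k] + 7; // one "vector add and store"
    }
    for (; n < N; ++n) a[n] = a[n - 1] + 7; // scalar remainder
    return a;
}
```

Starting from an array of zeros, the sequential version yields 0, 7, 14, 21, 28, 35, 42, 49, while the naive vector version yields 0, 7, 7, 7, 7, 14, 21, 28. The answers differ, which is exactly why the auto-vectorizer declines to vectorize such a loop.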
[NitPick asks: what is this “Data Dependence” thing you keep dragging into the conversation? Answer: it lies at the heart of vectorization safety, and uses some interesting math – affine transformations and systems of Diophantine equations. However, no-one commented last time that they wanted more details, so I’ll skip explanations]
Reason Code | Explanation
1300 | Loop body contains no (or very little) computation
1301 | Loop stride is not +1
1302 | Loop is a “do-while”
1303 | Too few loop iterations for vectorization to be a win
1304 | Loop includes assignments that are of different size
1305 | Not enough type information
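As an example of the kind of tweak these codes suggest, here is a "do-while" loop (the shape code 1302 describes) next to an equivalent for loop. This is only a sketch with made-up function names. Whether the rewritten loop actually vectorizes depends on the compiler and on the rest of the loop body, so check the report rather than taking my word for it.

```cpp
// The "do-while" form: the body runs before the first test,
// so the empty case needs a separate guard.
void add_seven_do_while(int* a, int count) {
    if (count <= 0) return;
    int n = 0;
    do {
        a[n] += 7;
        ++n;
    } while (n < count);
}

// The same computation as a plain counted loop with a +1 step.
void add_seven_for(int* a, int count) {
    for (int n = 0; n < count; ++n) a[n] += 7;
}
```

The two functions compute identical results for every count. The second simply presents the loop in the form the earlier reason codes ask for: a local induction variable, a loop-invariant upper bound and a step of +1.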
Reason Code | Explanation
1400 | User specified #pragma loop(no_vector)
1401 | /kernel switch specified
1402 | /arch:IA32 switch specified
1403 | /favor:ATOM switch specified and loop includes operations on doubles
1404 | /O1 or /Os switch specified
The 1400s reason codes are straightforward – you specified some option that is just plain incompatible with vectorization.
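For instance, here is a sketch of the loop from the start of this post, wrapped in a function and marked with #pragma loop(no_vector). The function name and the _MSC_VER guard are my additions: the guard just keeps other compilers from complaining about a pragma they do not know. Going by the table above, I would expect a /Qvec-report:2 build to report this loop with reason code 1400, but I have not pasted real compiler output here.

```cpp
// Multiplies b and c element by element into a, with vectorization
// explicitly switched off for this one loop.
void multiply_no_vector(int* a, const int* b, const int* c, int count) {
#ifdef _MSC_VER
#pragma loop(no_vector) // must sit immediately before the loop it applies to
#endif
    for (int n = 0; n < count; ++n) a[n] = b[n] * c[n];
}
```

The pragma changes only how the loop is compiled. The results are the same as for the unmarked loop.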
Reason Code | Explanation
1500 | Possible aliasing on multi-dimensional arrays
1501 | Possible aliasing on arrays-of-structs
1502 | Possible aliasing and array index is other than n + K
1503 | Possible aliasing and array index has multiple offsets
1504 | Possible aliasing – would require too many runtime checks
1505 | Possible aliasing – but runtime checks are too complex
The 1500s reason codes are all about aliasing – where a location in memory can be accessed by two different names.
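As a small taste of the problem, consider the sketch below. The function name is hypothetical. Looking at the function alone, nothing says whether dst and src point into the same array, so the same loop can behave like an independent element-wise copy or like a loop-carried dependence, depending entirely on the caller.

```cpp
// Writes src[n] + 1 into dst[n]. Whether iterations are independent
// depends on whether the two pointers refer to overlapping memory.
void add_one(int* dst, const int* src, int count) {
    for (int n = 0; n < count; ++n) dst[n] = src[n] + 1;
}
```

Called with two separate arrays of zeros, every element of dst becomes 1. Called as add_one(buf + 1, buf, 4) on a single array buf of five zeros, each iteration reads the value the previous one wrote, and buf ends up as 0, 1, 2, 3, 4. The compiler has to allow for the second case unless it can rule it out.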
Finally, note that the reason codes listed above apply to this first release of the auto-vectorizer. Subsequent releases will likely stop emitting many of these warnings, as we make the compiler ‘smarter’ at recognizing more and more loop patterns.
The topic of aliasing cropped up earlier. The time seems ripe to explain this term – what it means; why it’s a nuisance; how the auto-vectorizer deals with it. Although the alias analysis performed by a compiler is complex, we can explain the nub of the problem, via examples, in just a few paragraphs. Let’s aim to do that in the next post.