Auto-Vectorizer in Visual Studio 2012 – Did It Work?

 

If you’ve not read previous posts in this series about auto-vectorization, you may want to begin at the beginning.

This post will explain how to find out which loops in your C++ program were auto-vectorized.   Here is an example program, stored in a file called Source.cpp, with which to experiment:

  1. int main() {
  2.   const int N = 50;           // array dimensions
  3.   int a[N], b[N], c[N];
  4.   for (int n = 0; n < N; ++n) a[n] = b[n] * c[n];
  5. }

To keep things simple, I have missed out code to initialize the arrays b or c.  However, that doesn’t matter for the purpose of this post.

Running from the Command Line

Let’s start by running this program from the command-line (we’ll explain what to do in the Visual Studio IDE in a few minutes).

cl /c /O2 /Qvec-report:1 Source.cpp

This command tells the compiler to compile Source.cpp, but not to go on and link (that’s the /c switch). The /O2 switch tells the compiler to generate code that is optimized for speed.  This is crucial: the auto-vectorizer kicks in only when you enable optimization.  Finally, the /Qvec-report:1 switch tells the compiler to report which loops were successfully vectorized.  (Remember that these command-line switches are case-sensitive: so spell them as shown).  And here is the output:

Microsoft (R) C/C++ Optimizing Compiler Version 17.00.50520 for x64
Copyright (C) Microsoft Corporation. All rights reserved.

source.cpp

--- Analyzing function: main
c:\source.cpp(4) : loop vectorized

This confirms that the loop on line 4 of Source.cpp, was indeed vectorized.

Please note that the /Qvec-report:1 switch is not present in the Beta drop of VS 11 from February.  But it will be included into the next public drop, available soon.

The compiler also provides a /Qvec-report:2 switch.  This one tells you which loops were successfully auto-vectorized, and which were not, with a reason code.  Here is another snippet that includes a second loop (on line 5):

  1. int main() {
  2.   const int N = 50;           // array dimensions
  3.   int a[N], b[N], c[N];
  4.   for (int n = 0; n < N; ++n) a[n] = b[n] * c[n];
  5.   for (int n = 0; n < N; ++n) a[n] = a[n-1] + 7;
  6. }

And here is the corresponding report:

Microsoft (R) C/C++ Optimizing Compiler Version 17.00.50520 for x64
Copyright (C) Microsoft Corporation. All rights reserved.

source.cpp

--- Analyzing function: main
c:\source.cpp(4) : loop vectorized
c:\source.cpp(5) : loop not vectorized due to reason '1200'

As you can see, the compiler auto-vectorized the loop on line 4 (as before), but failed to auto-vectorize the one on line 5, with a reason code of 1200.  This loop is similar to Example 6 – Backward Dependency that we analyzed in a previous post.  Vectorizing this loop would produce wrong results, and the auto-vectorizer is smart enough to know this.

Before going on to explain the various reason codes, let’s catch up and explain how to request these results from the Visual Studio IDE.

Running from the IDE

For your project, select the “Release” (rather than “Debug”) configuration. (You can check the project properties to confirm that, under the covers, this sets the /O2 switch, just as we did above from the command-line).

In addition, navigate yourself to “Property Pages”, “Configuration Properties”, “C/C++”, “Command Line”, “Additional Options” and add: /Qvec-report:1.  Here’s a screen-shot:

clip_image002

The build shown in the screenshot is for x64, but you can equally well choose x86.  Now, whenever you build the project, the output will include a report saying which loops were successfully vectorized.  As in the case of requesting this report via the command-line, please note that the /Qvec-report:1 switch is not present in the Beta drop of VS11 from February.  But it will be included into the next public drop of VS11, available soon.

Reasons why Vectorization Was Not Possible

Recall that the auto-vectorizer is 100% safe: it will NEVER vectorize a loop if there is the slightest chance the generated code would produce wrong answers – answers different from that implied by the original sequential C++ code.

[NitPick again: what exactly are the answers implied by the sequential execution of a C++ program?  Answer: this is a deep question.  For our tiny examples, we will simply assume the answer is “obvious”.  For the general problem, try a web search for the topic “programming language semantics”]

Ensuring safety requires some pretty deep analysis of the input code.  It turns out that sometimes a loop would actually be safe to vectorize, but the analysis cannot prove it so.  The auto-vectorizer therefore refuses to vectorize that loop.  We say that its judgments are “conservative”.

The warnings from a /Qvec-report:2 run specify any of about 30 reason codes for why a given loop was not vectorized.

The reason codes are discovered and emitted from several layers deep within the compiler.  This can sometimes make it difficult to relate the specific issue back to  the original C++ code, several layers above.  For example, the report may be produced from a loop in a function whose body has been in-lined into its caller – so the original function, at this point in the analysis, no longer exists!  Bear this in mind as you read the explanations below for each reason code.  We will publish a fuller explanation, with examples, as part of MSDN documentation – this will guide you on tweaking your code so that it vectorizes.

 

Reason Code

Explanation

500

This is a generic message – it covers several cases: for example, the loop includes multiple exits, or the loop header does not end by incrementing the induction variable

501

Induction variable is not local; or upper bound is not loop-invariant

502

Induction variable is stepped in some manner other than a simple +1

503

Loop includes Exception-Handling or switch statements

 

Reason Code

Explanation

1100

Loop contains control flow – if, ?:

1101

Loop contains a non-vectorizable conversion operation (may be implicit)

1102

Loop contains non-arithmetic, or other non-vectorizable operations

1103

Loop body includes shift operations whose size might vary within the loop

1104

Loop body includes scalar variables

1105

Loop includes a non-recognized reduction operation

1106

Inner loop already vectorized: cannot also vectorize outer loop

 

Reason Code

Explanation

1200

Loop contains loop-carried data dependences

1201

Array base changes during the loop

1202

Field within a struct is not 32 or 64 bits wide

1203

Loop body includes non-contiguous accesses into an array

Reason code 1200 says the loop contains loop-carried data dependences which prevent vectorization.  This means that different iterations of the loop interfere with each other in such a way that vectorizing the loop would produce wrong answers.  More precisely, the auto-vectorizer cannot prove to itself that there are no such data-dependences. 

[NitPick asks: what is this “Data Dependence” thing you keep dragging into the conversation?  Answer: it lies at the heart of vectorization safety, and uses some interesting math – affine transformations and systems of Diophantine equations.  However, no-one commented last time that they wanted more details, so I’ll skip explanations]

 

Reason Code

Explanation

1300

Loop body contains no (or very little) computation

1301

Loop stride is not +1

1302

Loop is a “do-while”

1303

Too few loop iterations for vectorization to be a win

1304

Loop includes assignments that are of different size

1305

Not enough type information

 

Reason Code

Explanation

1400

User specified #pragma loop(no_vector)

1401

/kernel switch specified

1402

/arch:IA32 switch specified

1403

/favor:ATOM switch specified and loop includes operations on doubles

1404

/O1 or /Os switch specified

The 1400s reason codes are straightforward – you specified some option that is just plain incompatible with vectorization.

 

Reason Code

Explanation

1500

Possible aliasing on multi-dimensional arrays

1501

Possible aliasing on arrays-of-structs

1502

Possible aliasing and array index is other than n + K

1503

Possible aliasing and array index has multiple offsets

1504

Possible aliasing – would require too many runtime checks

1505

Possible aliasing – but runtime checks are too complex

The 1500s reason codes are all about aliasing – where a location in memory can be accessed by two different names.

Finally, note that the reason codes listed above apply to this first release of the auto-vectorizer.  Subsequent releases will likely stop emitting many of these warnings, as we make the compiler ‘smarter’ at recognizing more and more loop patterns.

The topic of aliasing cropped up earlier.  The time seems ripe to explain this term – what it means; why it’s a nuisance; how the auto-vectorizer deals with it. Although the alias analysis performed by a compiler is complex, we can explain the nub of the problem, via examples, in just a few paragraphs.  Let’s aim to do that in the next post.