DirectXMath: AVX2


Advanced Vector Extensions 2 (AVX2) rounds out the instruction set introduced with AVX. Most of the new instructions operate on 256-bit registers, so they aren't directly applicable to DirectXMath, which works on 128-bit float4 vectors. AVX2 would be very useful for building a fully equivalent double4 version of the DirectXMath functionality, but that is beyond the scope of this article and of the library generally.

The immediate value of targeting AVX2 is that you can also make use of the AVX, FMA3, and F16C optimizations already covered on this blog, since all of those are included.

There is one more simple substitution for DirectXMath when using AVX2, which is also a special case for XMVectorSwizzle<0,0,0,0>:

inline XMVECTOR XM_CALLCONV XMVectorSplatX( FXMVECTOR V )
{
    return _mm_broadcastss_ps( V );
}
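
To see the substitution outside of DirectXMath, here is a minimal sketch using raw `__m128` values. The names `SplatX`, `SplatXMatches`, and `SplatXSelfCheck` are hypothetical helpers for illustration and are not part of the library. The sketch assumes the compiler defines `__AVX2__` when AVX2 code generation is enabled (as Visual C++ does with /arch:AVX2), and it falls back to the classic SSE shuffle otherwise so that both paths can be compared.

```cpp
#include <cassert>
#include <immintrin.h>  // x86 SSE/AVX intrinsics

// Hypothetical helper (not a DirectXMath function):
// replicate lane 0 of v into all four lanes.
inline __m128 SplatX(__m128 v)
{
#if defined(__AVX2__)
    // AVX2 path: a single register-to-register broadcast.
    return _mm_broadcastss_ps(v);
#else
    // SSE fallback: shuffle that selects element 0 for every lane.
    return _mm_shuffle_ps(v, v, _MM_SHUFFLE(0, 0, 0, 0));
#endif
}

// Returns true if every lane of SplatX(x, y, z, w) equals x.
inline bool SplatXMatches(float x, float y, float z, float w)
{
    __m128 v = _mm_set_ps(w, z, y, x);  // _mm_set_ps takes lanes high-to-low
    float out[4];
    _mm_storeu_ps(out, SplatX(v));
    return out[0] == x && out[1] == x && out[2] == x && out[3] == x;
}

// Small self-check that can be called once at startup in a debug build.
inline void SplatXSelfCheck()
{
    assert(SplatXMatches(1.0f, 2.0f, 3.0f, 4.0f));
}
```

Both paths produce the same result, so the guard only changes which instruction is emitted. Code built with the AVX2 path should only run after a runtime check confirms that the processor and OS support it.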

Processor Support

AVX2 is supported by Intel “Haswell”, AMD Excavator, and later processors.

In addition to the hardware supporting the new instruction set, the OS must support saving the new YMM register file; otherwise the AVX instructions remain invalid. This support is included in Windows 7 Service Pack 1, Windows Server 2008 R2 Service Pack 1, Windows 8, and Windows Server 2012. It is indicated by the OSXSAVE bit in CPUID being set along with the AVX2 support bit.

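 // Requires <intrin.h> for __cpuid and __cpuidex (Visual C++).
 int CPUInfo[4] = {-1};
 __cpuid( CPUInfo, 0 );  // leaf 0: EAX holds the highest supported standard leaf
 bool bAVX2 = false;
 if ( CPUInfo[0] >= 7 )
 {
     __cpuid( CPUInfo, 1 );
     bool bOSXSAVE = (CPUInfo[2] & 0x8000000) != 0;  // ECX bit 27: OSXSAVE
     __cpuidex( CPUInfo, 7, 0 );
     bAVX2 = bOSXSAVE && (CPUInfo[1] & 0x20) != 0;   // EBX bit 5: AVX2
 }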
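
For readers who want a stricter test, the following is a sketch of a variation on the check above and not the method used in this article. It also tests the AVX bit and reads XCR0 with XGETBV to confirm that the OS has enabled both XMM and YMM state. The function names `DecodeAVX2` and `IsAVX2Supported` are hypothetical. The non-Visual C++ branch assumes GCC or Clang with `<cpuid.h>` and inline assembly support.

```cpp
#include <cstdint>
#if defined(_MSC_VER)
#include <intrin.h>     // __cpuid, __cpuidex
#include <immintrin.h>  // _xgetbv
#else
#include <cpuid.h>      // __get_cpuid_max, __cpuid_count (GCC/Clang)
#endif

// Pure decoding step, separated out so it can be tested with known values.
// leaf1Ecx: ECX from CPUID leaf 1; leaf7Ebx: EBX from CPUID leaf 7, subleaf 0;
// xcr0: value of XCR0 (only meaningful when OSXSAVE is set).
inline bool DecodeAVX2(uint32_t leaf1Ecx, uint32_t leaf7Ebx, uint64_t xcr0)
{
    const bool osxsave  = (leaf1Ecx & (1u << 27)) != 0;  // OS uses XSAVE/XRSTOR
    const bool avx      = (leaf1Ecx & (1u << 28)) != 0;  // AVX
    const bool avx2     = (leaf7Ebx & (1u << 5)) != 0;   // AVX2
    const bool ymmSaved = (xcr0 & 0x6) == 0x6;           // XMM (bit 1) and YMM (bit 2) state
    return osxsave && avx && avx2 && ymmSaved;
}

inline bool IsAVX2Supported()
{
    uint32_t ecx1 = 0, ebx7 = 0;
#if defined(_MSC_VER)
    int r[4] = {};
    __cpuid(r, 0);
    if (r[0] < 7) return false;  // leaf 7 not available
    __cpuid(r, 1);
    ecx1 = static_cast<uint32_t>(r[2]);
    __cpuidex(r, 7, 0);
    ebx7 = static_cast<uint32_t>(r[1]);
#else
    unsigned a = 0, b = 0, c = 0, d = 0;
    if (__get_cpuid_max(0, nullptr) < 7) return false;  // leaf 7 not available
    __cpuid_count(1, 0, a, b, c, d);
    ecx1 = c;
    __cpuid_count(7, 0, a, b, c, d);
    ebx7 = b;
#endif
    // XGETBV faults unless the OS has set OSXSAVE, so bail out first.
    if ((ecx1 & (1u << 27)) == 0) return false;
#if defined(_MSC_VER)
    const uint64_t xcr0 = _xgetbv(0);
#else
    uint32_t lo = 0, hi = 0;
    __asm__ volatile("xgetbv" : "=a"(lo), "=d"(hi) : "c"(0));
    const uint64_t xcr0 = (static_cast<uint64_t>(hi) << 32) | lo;
#endif
    return DecodeAVX2(ecx1, ebx7, xcr0);
}
```

A typical use is to call `IsAVX2Supported()` once at startup and cache the result, then select between an AVX2 code path and a baseline SSE2 path.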
 

Compiler Support

Support for AVX2 intrinsics was added to Visual Studio 2012. The /arch:AVX2 switch is supported by VS 2012 Update 2, although IDE support wasn't added until VS 2013.

Note that with this switch, the compiler will optimize code to make use of FMA3 automatically where applicable.

Utility Code

The source for this project and the rest of the blog series is now available on GitHub under the MIT license.

Xbox One: This platform does not support AVX2.

See also: SSE, SSE2, and ARM-NEON; SSE3 and SSSE3; SSE4.1 and SSE4.2; AVX; F16C and FMA
