DirectXMath: ARM64

Visual Studio 2017 (15.9 update) now supports the ARM64 architecture for Universal Windows Platform (UWP) apps.

The ARM64 platform supports ARM-NEON using the same intrinsics as the ARM (32-bit) platform. The Windows on ARM (32-bit) platform assumes support for ARMv7, ARM-NEON, and VFPv3. The Windows on ARM (64-bit) platform assumes support for ARMv8, ARM-NEON, and VFPv4.


The ARMv8 instruction set implies support for several useful intrinsics for DirectXMath data types:

  • vector divide: vdivq_f32
  • vector rounding: vrndq_f32, vrndnq_f32, vrndmq_f32, vrndpq_f32
  • half-precision conversion: vcvt_f32_f16, vcvt_f16_f32
  • fused multiply-accumulate: vfmaq_f32, vfmsq_f32

On ARM (32-bit), vector division had to be implemented as a multiply by a reciprocal estimate refined with 2 or 3 Newton-Raphson iterations, which is less precise than a true divide. For the ARM64 platform, I was able to replace all uses of division in the non-Est functions with a ‘true divide’, including XMVectorDivide, XMVectorReciprocal, and the implementations of a number of other functions.
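To make the precision difference concrete, here is a minimal scalar sketch; it is not DirectXMath source. The helper ReciprocalEstimate is hypothetical: it mimics a low-precision hardware estimate such as vrecpeq_f32 by keeping only 8 mantissa bits of 1/x (on ARM (32-bit) the refinement step itself is typically written with vrecpsq_f32, which computes 2 - a * b). DivideByRefinedReciprocal applies the Newton-Raphson step e' = e * (2 - b * e) and then multiplies, while DivideTrue uses vdivq_f32 when compiled with a GCC/Clang toolchain for AArch64 (which defines __aarch64__; MSVC identifies the same target with _M_ARM64) and a plain IEEE divide elsewhere.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>

#if defined(__aarch64__)
#include <arm_neon.h>
#endif

// Hypothetical stand-in for a hardware reciprocal estimate such as
// vrecpeq_f32: keep only the top 8 of the 23 mantissa bits of 1/x.
static float ReciprocalEstimate(float x)
{
    float r = 1.0f / x;
    std::uint32_t bits;
    std::memcpy(&bits, &r, sizeof(bits));
    bits &= 0xFFFF8000u; // clear the low 15 mantissa bits
    std::memcpy(&r, &bits, sizeof(r));
    return r;
}

// ARM (32-bit) style: a * refined(1/b). Each Newton-Raphson step
// e' = e * (2 - b * e) roughly doubles the number of correct bits,
// but every step and the final multiply introduce their own rounding.
float DivideByRefinedReciprocal(float a, float b, int iterations)
{
    float e = ReciprocalEstimate(b);
    for (int i = 0; i < iterations; ++i)
    {
        e = e * (2.0f - b * e);
    }
    return a * e;
}

// ARM64 style: one correctly rounded IEEE 754 division per lane.
float DivideTrue(float a, float b)
{
#if defined(__aarch64__)
    float32x4_t q = vdivq_f32(vdupq_n_f32(a), vdupq_n_f32(b));
    return vgetq_lane_f32(q, 0);
#else
    return a / b;
#endif
}
```

With two iterations the refined quotient is typically within a few ulps of the true quotient, but because the estimate, each refinement step, and the final multiply all round, it is not guaranteed to match the correctly rounded result of a single divide.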

For ARM (32-bit), I used a number of tricks to perform the rounding operations. On the ARM64 platform, I can use the new intrinsics to implement XMVectorRound, XMVectorTruncate, XMVectorFloor, and XMVectorCeiling.
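The sketch below shows how the four ARMv8 rounding intrinsics line up with those DirectXMath functions: vrndnq_f32 for XMVectorRound, vrndq_f32 for XMVectorTruncate, vrndmq_f32 for XMVectorFloor, and vrndpq_f32 for XMVectorCeiling. RoundMode and RoundVector are hypothetical names, not DirectXMath API, and the AArch64 check assumes a GCC/Clang toolchain (__aarch64__). The fallback uses the <cmath> equivalents so the sketch runs anywhere. XMVectorRound rounds halfway cases to the nearest even integer, which is what vrndnq_f32 does and what std::nearbyint does under the default rounding mode.

```cpp
#include <cassert>
#include <cmath>

#if defined(__aarch64__)
#include <arm_neon.h>
#endif

// Hypothetical selector for the four rounding behaviors.
enum class RoundMode { Nearest, Truncate, Floor, Ceiling };

// Rounds four floats in place using the requested mode.
void RoundVector(float v[4], RoundMode mode)
{
#if defined(__aarch64__)
    float32x4_t x = vld1q_f32(v);
    switch (mode)
    {
    case RoundMode::Nearest:  x = vrndnq_f32(x); break; // to nearest, ties to even
    case RoundMode::Truncate: x = vrndq_f32(x);  break; // toward zero
    case RoundMode::Floor:    x = vrndmq_f32(x); break; // toward -infinity
    case RoundMode::Ceiling:  x = vrndpq_f32(x); break; // toward +infinity
    }
    vst1q_f32(v, x);
#else
    for (int i = 0; i < 4; ++i)
    {
        switch (mode)
        {
        case RoundMode::Nearest:  v[i] = std::nearbyint(v[i]); break; // assumes FE_TONEAREST
        case RoundMode::Truncate: v[i] = std::trunc(v[i]);     break;
        case RoundMode::Floor:    v[i] = std::floor(v[i]);     break;
        case RoundMode::Ceiling:  v[i] = std::ceil(v[i]);      break;
        }
    }
#endif
}
```

For example, {2.5, -2.5, 1.5, -0.7} becomes {2, -2, 2, -1} in Nearest mode and {2, -3, 1, -1} in Floor mode.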

The half-precision conversion intrinsics are used when building for the ARM64 platform to implement XMConvertHalfToFloat, XMConvertFloatToHalf, XMConvertHalfToFloatStream, and XMConvertFloatToHalfStream.
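As an illustration of the stream case, the sketch below converts IEEE 754 half-precision values, stored as raw uint16_t bits, to float. On AArch64 (checked via the GCC/Clang macro __aarch64__) it converts four values per step with vcvt_f32_f16 and finishes any remainder with a portable scalar routine; the reverse direction would use vcvt_f16_f32, which this sketch does not cover. HalfToFloat and HalfToFloatStream are hypothetical names, and the real XMConvertHalfToFloatStream also takes input and output strides, which are omitted here.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <cstring>

#if defined(__aarch64__)
#include <arm_neon.h>
#endif

// Portable scalar conversion of one IEEE 754 binary16 value to float.
float HalfToFloat(std::uint16_t h)
{
    const std::uint32_t sign = static_cast<std::uint32_t>(h & 0x8000u) << 16;
    const std::uint32_t exponent = (h >> 10) & 0x1Fu;
    const std::uint32_t mantissa = h & 0x3FFu;

    if (exponent == 0)
    {
        // Zero or subnormal: the value is mantissa * 2^-24, exact in float.
        const float f = std::ldexp(static_cast<float>(mantissa), -24);
        return sign ? -f : f;
    }

    std::uint32_t bits;
    if (exponent == 0x1Fu)
    {
        bits = sign | 0x7F800000u | (mantissa << 13); // infinity or NaN
    }
    else
    {
        bits = sign | ((exponent + 112u) << 23) | (mantissa << 13); // rebias 15 -> 127
    }

    float f;
    std::memcpy(&f, &bits, sizeof(f));
    return f;
}

// Converts 'count' packed halves to floats, four at a time on AArch64.
void HalfToFloatStream(float* out, const std::uint16_t* in, std::size_t count)
{
    std::size_t i = 0;
#if defined(__aarch64__)
    for (; i + 4 <= count; i += 4)
    {
        const float16x4_t h = vreinterpret_f16_u16(vld1_u16(in + i));
        vst1q_f32(out + i, vcvt_f32_f16(h));
    }
#endif
    for (; i < count; ++i)
    {
        out[i] = HalfToFloat(in[i]);
    }
}
```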


  • DirectXMath 3.07 was the first version to include basic ARM64 support using the same ARM-NEON implementation as used for ARM (32-bit).
  • DirectXMath 3.10 uses ARMv8 intrinsics when building for the _M_ARM64 architecture to optimize specific functions, including the new XMVectorSum horizontal-add function.
  • DirectXMath 3.12 uses ARM64 fused multiply-accumulate instructions to implement XMVectorMultiplyAdd and XMVectorNegativeMultiplySubtract.
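A rough sketch of the mapping for the two fused operations: vfmaq_f32(a, b, c) computes a + b * c and vfmsq_f32(a, b, c) computes a - b * c, so XMVectorMultiplyAdd (V1 * V2 + V3) and XMVectorNegativeMultiplySubtract (V3 - V1 * V2) line up with them directly. Float4 and the function names below are hypothetical stand-ins for XMVECTOR and the library functions, not DirectXMath source; the AArch64 check assumes a GCC/Clang toolchain, and the fallback uses std::fma, which also rounds only once.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>

#if defined(__aarch64__)
#include <arm_neon.h>
#endif

// Hypothetical stand-in for XMVECTOR: four packed floats.
using Float4 = std::array<float, 4>;

// v1 * v2 + v3 per lane with a single rounding (like XMVectorMultiplyAdd).
Float4 FusedMultiplyAdd(const Float4& v1, const Float4& v2, const Float4& v3)
{
    Float4 r;
#if defined(__aarch64__)
    // vfmaq_f32(a, b, c) computes a + b * c.
    vst1q_f32(r.data(), vfmaq_f32(vld1q_f32(v3.data()), vld1q_f32(v1.data()), vld1q_f32(v2.data())));
#else
    for (std::size_t i = 0; i < 4; ++i)
    {
        r[i] = std::fma(v1[i], v2[i], v3[i]);
    }
#endif
    return r;
}

// v3 - v1 * v2 per lane with a single rounding (like XMVectorNegativeMultiplySubtract).
Float4 FusedNegativeMultiplySubtract(const Float4& v1, const Float4& v2, const Float4& v3)
{
    Float4 r;
#if defined(__aarch64__)
    // vfmsq_f32(a, b, c) computes a - b * c.
    vst1q_f32(r.data(), vfmsq_f32(vld1q_f32(v3.data()), vld1q_f32(v1.data()), vld1q_f32(v2.data())));
#else
    for (std::size_t i = 0; i < 4; ++i)
    {
        r[i] = std::fma(-v1[i], v2[i], v3[i]);
    }
#endif
    return r;
}
```

Because the product is not rounded before the add, fused results can differ in the last bit from a separate multiply followed by an add. With a = 1 + 2^-12 and c = -(1 + 2^-11), the fused a * a + c yields exactly 2^-24, while a separately rounded float multiply followed by an add gives 0.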

See also: SSE, SSE2, and ARM-NEON; SSE3 and SSSE3; SSE4.1 and SSE4.2; AVX; F16C and FMA; AVX2
