Introducing DirectXMath

The Windows SDK for Windows 8 (included with Visual Studio 2012) ships with the DirectXMath library, which is the next major revision of the C++ SIMD graphics math library known as “XNAMath” in the DirectX SDK and Xbox 360 XDK. Think of it as “XNAMath version 3”. For the historically inclined, XNAMath itself was essentially “xboxmath version 2”.

DirectXMath started life as “XNAMath 2.04”. The primary goals were to provide an excellent C++ SIMD math library for Windows Store apps (a.k.a. Metro style apps) on Windows 8, including ARM-NEON optimizations for the new Windows on ARM (a.k.a. Windows RT) platform; to keep all the goodness of XNAMath in the DirectX SDK for existing Windows graphics developers targeting Windows x86 and Windows x64 using SSE2; and to remain easily compatible with existing ‘client code’ using XNAMath. We also wanted to improve usability and more closely match the functionality available to C# developers using XNA Game Studio.

The first major design decision was to drop support for older Visual C++ compilers that did not support C++11 (known as C++0x at the time), which means Visual C++ 2010 or Visual Studio 2012 (a.k.a. Visual Studio 11) is required. This allowed us to remove all the various “windows.h” specific types (UINT, BYTE, etc.) in favor of standard C++ types such as size_t and the “stdint.h” types (uint32_t, uint8_t, etc.). The headers were reorganized and C++ namespaces were introduced to make the library easier to use, especially by reducing the clutter when using IntelliSense.

The second major decision was to not support the Xbox 360 platform. This was a tough one, but the reality is that xboxmath is deeply entwined with the Xbox 360 codebase, particularly the graphics components. XNAMath was able to replace xboxmath, but only by maintaining 100% support for every possible use of xboxmath, including numerous older compilers, pure C contexts, etc. xboxmath was also designed to exactly match every quirk of the Xbox 360’s VMX128 intrinsics set, which made it difficult to make optimal design decisions for Windows and SSE. In general, it should be possible to make the same code compile with DirectXMath on Windows and XNAMath on Xbox 360 without too many #ifdefs, given some judicious use of typedef.

The primary ‘breaking change’ in DirectXMath compared to XNAMath is in the parameters for XMVectorPermute, which now takes four 0-7 indices instead of a VMX128 __vperm control word (possibly created by XMVectorPermuteControl from 0-7 indices). There are now template forms of XMVectorPermute and XMVectorSwizzle (as well as XMVectorShiftLeft, XMVectorRotateLeft, XMVectorRotateRight, and XMVectorInsert), which for SSE can end up being executed with a single intrinsic or just a few. The XNAMath version currently has to spill to memory to emulate the VMX128 __vperm instruction. Template specialization is used to provide similar performance for common ARM-NEON ‘swizzles’ and ‘permutes’.
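
To make the four-index form concrete, here is a minimal scalar sketch of the selection rule: indices 0-3 pick x, y, z, w from the first vector and 4-7 pick x, y, z, w from the second. This is only a model of the semantics, not the library's SIMD implementation. The names Vec4, permute_model, and permute_model_t are hypothetical stand-ins, and the code deliberately does not include “DirectXMath.h” so it builds anywhere.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for XMVECTOR: four floats in x, y, z, w order.
using Vec4 = std::array<float, 4>;

// Runtime form: each index is 0-7.
// 0-3 select x,y,z,w of v1; 4-7 select x,y,z,w of v2.
inline Vec4 permute_model(const Vec4& v1, const Vec4& v2,
                          std::uint32_t px, std::uint32_t py,
                          std::uint32_t pz, std::uint32_t pw)
{
    assert(px < 8 && py < 8 && pz < 8 && pw < 8);
    auto pick = [&](std::uint32_t i) { return i < 4 ? v1[i] : v2[i - 4]; };
    return Vec4{{ pick(px), pick(py), pick(pz), pick(pw) }};
}

// Template form: the indices are compile-time constants, so a real SIMD
// implementation can specialize on them and pick a cheap instruction
// sequence. This model only validates the indices and forwards.
template <std::uint32_t PX, std::uint32_t PY, std::uint32_t PZ, std::uint32_t PW>
inline Vec4 permute_model_t(const Vec4& v1, const Vec4& v2)
{
    static_assert(PX < 8 && PY < 8 && PZ < 8 && PW < 8, "indices must be 0-7");
    return permute_model(v1, v2, PX, PY, PZ, PW);
}
```

With a = (1, 2, 3, 4) and b = (5, 6, 7, 8), permute_model(a, b, 0, 5, 2, 7) yields (1, 6, 3, 8). In the library itself the indices are normally written with the named constants XM_PERMUTE_0X through XM_PERMUTE_1W rather than raw integers.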

The DirectXMath Programming Guide on MSDN provides full detail on What’s New in the library, and a Code Migration guide for current users of XNAMath. For developers using DirectXMath for the first time, the Getting Started page has been expanded to provide more basic usage information.

The DirectXMath library is all inline (as was XNAMath) and has very few Windows dependencies. In fact, the only dependency on ‘windows.h’ is the XMVerifyCPUSupport function, which is coded to always return ‘false’ if ‘windows.h’ is not included before “DirectXMath.h”, so you don’t need the Windows headers except in the module where you handle startup verification. The library is annotated using SAL2 (rather than the older VS-style or Windows-style annotations), so it requires the standalone Windows 8.0 SDK or some additional fixups to use with Visual C++ 2010.

BTW, if you want to know what those SAL2 fixups look like:

#if defined(_MSC_VER) && (_MSC_VER<1610) && !defined(_In_reads_)
 #define _Analysis_assume_(exp) __analysis_assume(exp)
 #define _In_reads_(exp) _In_count_c_(exp)
 #define _Out_writes_(exp) _Out_cap_c_(exp)
 #define _In_reads_bytes_(exp) _In_bytecount_x_(exp)
 #define _Out_writes_bytes_(exp) _Out_bytecap_x_(exp)
 #ifndef _Use_decl_annotations_
 #define _Use_decl_annotations_
 #endif
#endif

Additional math functionality that builds on DirectXMath’s capabilities is also available.

Porting Notes

A detailed list of D3DXMath mappings to DirectXMath is available on MSDN.

Note: The Windows SDK for Windows 8 Consumer Preview included with Visual Studio 11 Beta has DirectXMath version 3.02. Details on the breaking changes since the Developer Preview are covered in the official Migration Guide.

Update: The Windows SDK for Windows 8 Release Preview / RTM included with Visual Studio 2012 has DirectXMath version 3.03. Details on the breaking changes since the Consumer Preview are covered in the official Migration Guide.

Windows Phone: The Windows Phone SDK 8.0 includes DirectXMath version 3.03.

VS 2013 Preview: The Windows SDK for Windows 8.1 Preview included with the Visual Studio 2013 Preview includes DirectXMath 3.05.

VS 2013 RC: The Windows 8.1 SDK included with Visual Studio 2013 RC includes DirectXMath 3.06. See What’s New on MSDN for details.

SimpleMath: If you are new to C++ and/or SIMD programming, you might want to take a look at ‘simple math’ as a starting point.

Samples: DirectXMath Win32 Sample

Note: DirectXMath is now hosted on GitHub.

Related: XNA Math version 2.05, DirectXMath: SSE, SSE2, and ARM-NEON, Known Issues: DirectXMath 3.03, DirectXMath 3.06, DirectXMath 3.07, DirectXMath 3.08, DirectXMath 3.09

Comments (7)

  1. Will there be support for more recent versions of SSE? For instance the dot product in SSE4.1. You'd want to target SSE2 by default I suppose, but clients could opt in with #defines.

  2. The design of DirectXMath/XNAMath is to use it 'everywhere' without having to do codepath protection and runtime CPU feature detection. Doing this kind of selection at the 'micro' function level involves indirections which are pretty inefficient. It makes more sense to do this at a 'higher' algorithm level, where you'd have, say, an entire module that works using SSE4 and then another module for a fall-back.

    While it would be nice for the library to provide the SSE4 version of, say, DotProduct, the real usefulness would be in having everywhere else in the library that calls DotProduct use it. This causes a combinatorial explosion of functions if you allow developer control over codepath selection.

    If you are going to be doing this kind of stuff, you are going to be breaking the 'cross-platform' nature of the code, and you can do so in a developer-controlled way. In that case, use DirectXMath throughout and then drop down to explicit SSE4 intrinsics (or whatever) where it makes sense for your code and you've done the path protection work. I address this in the Library Internals page.

  3. OK, thanks for the link. I was talking about compile-time selection though, not runtime. I don't understand why this would break cross-platform development, or cause an explosion of functions. Surely it's the same as pumping out a different implementation of the routines because you've disabled SSE completely at compile time, say…?

  4. SSE2 makes a reasonable 'min bar' for intrinsics support for Windows PCs. You'd have to go back to an Intel Pentium 3 or AMD K7 to find a processor without SSE2 support.

    Beyond that, it is difficult to find a higher 'near universal' instruction set. Therefore you must have runtime selection and fallbacks, and once you do that it's usually only really useful in very targeted scenarios.

    SSE3, SSSE3, and SSE 4.2 don't have much of use for the kinds of operations in DirectXMath.

    SSE4.1 has some very useful stuff including _mm_dp_ps, _mm_round_ps, _mm_floor_ps, and _mm_ceil_ps which can be used as 'drop-in' replacements for XMVector*Dot, XMVectorRound, XMVectorTruncate, XMVectorFloor, and XMVectorCeil. You can also rewrite a number of the length and normalize functions to use the dot-product as well.

    AVX offers some useful stuff as well: _mm_broadcast_ps, _mm_permute_ps, and _mm_permutevar_ps. While you can use _mm_broadcast_ps as a 'drop-in' replacement for XMVectorReplicatePtr, you really have to modify just about every function in the library to use _mm_permute_ps wherever there's an _mm_shuffle_ps() where the two input vectors are the same one.

    Therefore _mm_dp_ps and _mm_permute_ps end up being used in a large number of the functions, which makes using the library pretty difficult if you want to use it throughout yet provide guarded codepaths.

    See SSE2 is supported by 99.75% of their PC users. SSE4.1 is only 48%.

  5. Simon Mourier says:

    Hi Chuck

    It's not clear to me if this DirectXMath library (and also XDSP.H, which is really what interests me) is supported on Windows versions prior to Windows 8? I understand the Windows 8.x SDK is necessary, but can the generated code run on Windows Vista? Windows 7? (not talking about XP…)

  6. @Simon – The DirectXMath, XDSP, and SHMath code all have only a single OS call dependency (noted above). As such, what matters is the compiler support rather than the OS version. The DirectXMath code will build with VS 2010, VS 2012, or VS 2013 for x86, x64, and ARM. In theory you could even get it to run on Windows XP SP3 since VS 2010 supports targeting that far down-level, but there's no official header combination that would do that since the Windows 8.x SDK does not support Windows XP SP3.

  7. gainexec says:

    It's breathtaking
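
As an illustration of the compile-time opt-in raised in comment 3, using the _mm_dp_ps intrinsic named in comment 4, here is a minimal sketch of a 4-component dot product that selects its code path with the preprocessor. It is not how DirectXMath is built. The dot4 function and the DOT4_* macros are hypothetical names, and the feature tests rely on the compiler's own predefined macros (__SSE4_1__, __AVX__, __SSE2__, _M_X64, _M_IX86_FP).

```cpp
#include <cassert>

// Pick one code path at compile time. DOT4_* are hypothetical macro names.
#if defined(__SSE4_1__) || defined(__AVX__)
  #include <smmintrin.h>   // SSE4.1, provides _mm_dp_ps
  #define DOT4_USE_SSE41 1
#elif defined(__SSE2__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 2)
  #include <emmintrin.h>   // SSE2 baseline
  #define DOT4_USE_SSE2 1
#endif

// Dot product of two 4-float arrays (no alignment requirement).
inline float dot4(const float* a, const float* b)
{
    assert(a != nullptr && b != nullptr);
#if defined(DOT4_USE_SSE41)
    // 0xF1: multiply all four lanes, write the sum to lane 0 only.
    __m128 d = _mm_dp_ps(_mm_loadu_ps(a), _mm_loadu_ps(b), 0xF1);
    return _mm_cvtss_f32(d);
#elif defined(DOT4_USE_SSE2)
    __m128 p = _mm_mul_ps(_mm_loadu_ps(a), _mm_loadu_ps(b));
    // Horizontal add using shuffles where both inputs are the same vector.
    __m128 s = _mm_add_ps(p, _mm_shuffle_ps(p, p, _MM_SHUFFLE(2, 3, 0, 1)));
    s = _mm_add_ps(s, _mm_shuffle_ps(s, s, _MM_SHUFFLE(1, 0, 3, 2)));
    return _mm_cvtss_f32(s);
#else
    // Portable scalar fallback for targets without SSE.
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2] + a[3] * b[3];
#endif
}
```

Calling dot4 on (1, 2, 3, 4) and (5, 6, 7, 8) gives 70 on every path. As comment 4 points out, a binary compiled with the SSE4.1 path enabled still needs a guarded fallback for CPUs that lack the instruction, which is why this kind of selection tends to pay off only in targeted scenarios.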