Custom Memory Allocation in dxcompiler

This post describes the implementation of the custom memory allocator in dxcompiler. At some point, the information here will likely make it into the repo itself along with other design notes.

Motivation

The DirectX Shader Compiler is mostly meant for offline usage, that is, compiling during the build process and not while the game or application is running (I'm going to use 'game' for the rest of this post because games tend to have some of the most demanding use cases, but everything is applicable to other kinds of programs).

That said, there are some scenarios today where compilation happens online, i.e., while the game is running. For example, you may be running a design version of the game that an artist can modify in real time, or you may be writing an extensible framework where you can't anticipate all the shaders you might need to compile. Or you might be supporting some sort of scripting for game mods.

When compilation needs to occur while the game is running, it's important that the compiler not interfere with the rest of the game execution. Games that carefully design their memory usage often need to control where allocations are made to avoid fragmentation or to fit with a given partitioning scheme.

To satisfy these requirements, we've modified dxcompiler to support custom memory allocators to allocate and free memory on behalf of an application. You don't need to supply one, but the compiler is ready to use one if provided. Let's dive in.

Usage

Typically, to create a compiler object, you make a call to DxcCreateInstance.

When you want to provide your own allocator, call DxcCreateInstance2 instead and pass in an implementation of IMalloc. IMalloc is a COM-style interface that allocates and frees memory. It's simple to implement and, like other COM interfaces, allows the lifetime of the object to be controlled through reference counting.

The object you request from DxcCreateInstance2 will be allocated from this allocator and will hold a reference to it. Any methods you invoke on your compiler object will use this allocator as well, and any objects returned through output parameters will be allocated from it and hold a reference to it, too. Once you have released all of these objects, you're free to clean up the allocator.

Design

This section includes some internal implementation notes that are useful for people working on the compiler itself.

IMalloc implementation

There are a number of IMalloc implementations. One is provided by COM via the CoGetMalloc function. It's also very easy to build one on top of the heap functions.
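To give a feel for how little is involved, here is a minimal sketch of an allocator with the same shape as IMalloc, built on the C runtime heap. It makes several assumptions: IMallocLike is a hypothetical, trimmed stand-in for the real COM interface (so the sample builds without Windows headers), HeapMalloc and LiveBytes are made-up names, and std::malloc stands in for whichever heap functions you would actually use.

```cpp
#include <atomic>
#include <cstddef>
#include <cstdint>
#include <cstdlib>

// Hypothetical stand-in for the COM IMalloc interface, trimmed so the sketch
// builds anywhere. The real interface also declares QueryInterface, DidAlloc
// and HeapMinimize, and uses Windows types such as SIZE_T and ULONG.
struct IMallocLike {
  virtual unsigned long AddRef() = 0;
  virtual unsigned long Release() = 0;
  virtual void *Alloc(std::size_t cb) = 0;
  virtual void *Realloc(void *pv, std::size_t cb) = 0;
  virtual void Free(void *pv) = 0;
  virtual std::size_t GetSize(void *pv) = 0;
  virtual ~IMallocLike() = default;
};

// Heap-backed allocator. Each block is prefixed with a small header that
// records the requested size, so GetSize can answer without heap-specific APIs.
class HeapMalloc final : public IMallocLike {
  struct alignas(std::max_align_t) Header { std::size_t size; };

  std::atomic<unsigned long> m_refs{1};  // the creator owns the first reference
  std::atomic<std::size_t> m_live{0};    // bytes currently outstanding

  static Header *HeaderOf(void *pv) { return static_cast<Header *>(pv) - 1; }

public:
  unsigned long AddRef() override { return ++m_refs; }
  unsigned long Release() override {
    unsigned long remaining = --m_refs;
    if (remaining == 0) delete this;  // last reference gone
    return remaining;
  }

  void *Alloc(std::size_t cb) override {
    if (cb > SIZE_MAX - sizeof(Header)) return nullptr;  // size would overflow
    auto *h = static_cast<Header *>(std::malloc(sizeof(Header) + cb));
    if (!h) return nullptr;  // failure is reported as null, not thrown
    h->size = cb;
    m_live += cb;
    return h + 1;  // caller sees the bytes just past the header
  }

  void *Realloc(void *pv, std::size_t cb) override {
    if (!pv) return Alloc(cb);
    if (cb > SIZE_MAX - sizeof(Header)) return nullptr;
    Header *old = HeaderOf(pv);
    std::size_t oldSize = old->size;  // read before realloc invalidates 'old'
    auto *h = static_cast<Header *>(std::realloc(old, sizeof(Header) + cb));
    if (!h) return nullptr;  // the original block is still valid
    h->size = cb;
    m_live += cb;
    m_live -= oldSize;
    return h + 1;
  }

  void Free(void *pv) override {
    if (!pv) return;
    Header *h = HeaderOf(pv);
    m_live -= h->size;
    std::free(h);
  }

  std::size_t GetSize(void *pv) override {
    // Returning -1 for null is a choice made for this sketch.
    return pv ? HeaderOf(pv)->size : static_cast<std::size_t>(-1);
  }

  // Hypothetical helper, not part of IMalloc: handy for leak checks.
  std::size_t LiveBytes() const { return m_live.load(); }
};
```

Tracking live bytes is optional, but it makes it easy to check that everything the compiler allocated was returned before you tear the allocator down.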

Explicit allocators

Whenever possible, it's preferable to be explicit about which IMalloc is being used. Typically these get passed around as arguments, but there are two important cases where the allocator is stored instead: top-level objects (those created by DxcCreateInstance2), which need to keep it for further activity, and objects that outlive top-level calls (typically blobs or result objects), which need to hold on to the allocator beyond the lifetime of the call that creates them.

Implicit allocators

The compiler is based on clang and LLVM, which don't support custom allocation per se, but instead rely on malloc/free/realloc and operators new and delete. Rather than modifying every bit of code to pass allocators around, we store the active IMalloc in thread-local storage, and provide an implementation of the memory management functions that use it.
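The idea can be sketched in a few lines. This is an illustration under assumptions, not the code in the repo: Allocator is a hypothetical two-method stand-in for IMalloc, and g_threadMalloc, tm_malloc, tm_free and tm_new are made-up names. In a real build, the replaced operator new/delete and the malloc-family functions would forward to routines like these.

```cpp
#include <cstddef>
#include <cstdlib>
#include <new>

// Hypothetical minimal allocator interface standing in for IMalloc.
struct Allocator {
  virtual void *Alloc(std::size_t cb) = 0;
  virtual void Free(void *pv) = 0;
  virtual ~Allocator() = default;
};

// Allocator active on the current thread; null means "use the C runtime heap".
thread_local Allocator *g_threadMalloc = nullptr;

// malloc-style entry point: returns null on failure.
void *tm_malloc(std::size_t cb) {
  return g_threadMalloc ? g_threadMalloc->Alloc(cb) : std::malloc(cb);
}

// free-style entry point. The block must be freed while the allocator that
// produced it is the active one.
void tm_free(void *pv) {
  if (!pv) return;
  if (g_threadMalloc)
    g_threadMalloc->Free(pv);
  else
    std::free(pv);
}

// operator-new-style entry point: failure becomes std::bad_alloc.
void *tm_new(std::size_t cb) {
  void *p = tm_malloc(cb ? cb : 1);  // new must return a unique pointer for size 0
  if (!p) throw std::bad_alloc();
  return p;
}

// Example allocator that counts calls, to show where requests end up.
struct CountingAllocator : Allocator {
  int allocs = 0;
  int frees = 0;
  void *Alloc(std::size_t cb) override { ++allocs; return std::malloc(cb); }
  void Free(void *pv) override { ++frees; std::free(pv); }
};
```

Code deep inside the compiler never sees the allocator; it simply allocates, and the request lands on whatever the current thread has installed.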

Exception handling

IMalloc can fail to allocate memory. clang and LLVM are designed more for console applications where the compiler owns, in part or in whole, the process in which it runs, and so they let the operating system reclaim resources as needed. dxcompiler, on the other hand, is meant to be a library that can be loaded into any process for various scenarios, and so it should handle exceptions carefully, releasing allocated memory and any references taken, and properly returning an error code.
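A common way to get this behavior is to let RAII objects release resources during unwinding and to translate exceptions into error codes at the API boundary. The sketch below illustrates that pattern only; Result, GuardedCall, ScopedResource and CompileSketch are hypothetical names, and the enum stands in for HRESULT values such as E_OUTOFMEMORY.

```cpp
#include <new>

// Stand-in for HRESULT-style error codes.
enum class Result { Ok, OutOfMemory, Fail };

// Counts cleanups so the effect of unwinding is observable.
int g_released = 0;

// Stand-in for anything that owns memory or a reference; the destructor runs
// whether the scope exits normally or through an exception.
struct ScopedResource {
  ~ScopedResource() { ++g_released; }
};

// Runs 'body' and converts any exception into an error code, so nothing
// escapes across the library boundary.
template <typename Fn>
Result GuardedCall(Fn &&body) noexcept {
  try {
    body();
    return Result::Ok;
  } catch (const std::bad_alloc &) {
    return Result::OutOfMemory;
  } catch (...) {
    return Result::Fail;
  }
}

// Hypothetical public entry point.
Result CompileSketch(bool simulateAllocFailure) noexcept {
  return GuardedCall([&] {
    ScopedResource resource;  // released on every exit path
    if (simulateAllocFailure)
      throw std::bad_alloc();  // as if an allocation failed deep inside
    // ... the actual work would go here ...
  });
}
```

The caller only ever sees a return code, and the resource is released on both the success and the failure path.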

Jon Kalb's website at https://exceptionsafecode.com/ is an excellent resource for handling errors in C++ code.

Implementation

Macros for COM-like objects

In microcom.h you will find the following macros and helper functions. Note that 'TM' is used to refer to the threadlocal malloc mechanism.

  • DXC_MICROCOM_TM_REF_FIELDS: replacement for DXC_MICROCOM_REF_FIELDS, includes a reference count and an owning m_pMalloc.
  • DXC_MICROCOM_TM_ADDREF_RELEASE_IMPL: replacement for DXC_MICROCOM_ADDREF_RELEASE_IMPL, includes deallocating with the owning m_pMalloc and setting it up as the current threadlocal allocator when releasing the object.
  • DXC_MICROCOM_TM_CTOR: defines an empty constructor and a helper static Alloc() that will take the owner IMalloc and set it up properly.

If you need arguments passed into the object, the inline CreateOnMalloc function can be used instead of the empty constructor; note that the allocator isn't assigned to the object in that case.
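To make the pattern concrete, here is a hand-written sketch of roughly what such an object looks like once the pieces are in place. It is not the expansion of the actual macros: Allocator, MallocAllocator and Blob are hypothetical, the allocator here is not itself reference counted, and the step of installing the owning allocator as the threadlocal one during release is left out.

```cpp
#include <atomic>
#include <cstddef>
#include <cstdlib>
#include <new>

// Hypothetical minimal allocator interface standing in for IMalloc.
struct Allocator {
  virtual void *Alloc(std::size_t cb) = 0;
  virtual void Free(void *pv) = 0;
  virtual ~Allocator() = default;
};

// Simple allocator that counts calls so object lifetime can be observed.
struct MallocAllocator : Allocator {
  int allocs = 0;
  int frees = 0;
  void *Alloc(std::size_t cb) override { ++allocs; return std::malloc(cb); }
  void Free(void *pv) override { ++frees; std::free(pv); }
};

// COM-like object that remembers the allocator it was created from.
class Blob {
  std::atomic<unsigned long> m_dwRef{0};  // reference count
  Allocator *m_pMalloc;                   // owning allocator
  int m_value;

  Blob(Allocator *pMalloc, int value) : m_pMalloc(pMalloc), m_value(value) {}
  ~Blob() = default;  // private: destruction only happens through Release

public:
  // Allocates raw memory from the owner and constructs the object in place.
  // Returns an object with one reference, or null if allocation fails.
  static Blob *Alloc(Allocator *pMalloc, int value) {
    void *mem = pMalloc->Alloc(sizeof(Blob));
    if (!mem) return nullptr;
    Blob *blob = new (mem) Blob(pMalloc, value);
    blob->AddRef();
    return blob;
  }

  unsigned long AddRef() { return ++m_dwRef; }

  unsigned long Release() {
    unsigned long remaining = --m_dwRef;
    if (remaining == 0) {
      Allocator *pMalloc = m_pMalloc;  // copy out before the object is destroyed
      this->~Blob();
      pMalloc->Free(this);  // memory goes back to the allocator it came from
    }
    return remaining;
  }

  int Value() const { return m_value; }
};
```

The important detail is in Release: the object runs its destructor explicitly and returns its memory to the same allocator that produced it, rather than going through a plain delete.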

Thread-local memory management

Most of the declarations to support these can be found in Global.h (yes, there's a Global.h file - don't ask). There are functions for library initialization and cleanup, and for hooking up and cleaning up a threadlocal allocator.

Much of the per-call management is encapsulated in the DxcThreadMalloc RAII object, which can be declared on the stack to set the scope for a given allocator.
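The shape of such a scope object is simple enough to sketch. The version below is an illustration in the spirit of DxcThreadMalloc, not its actual definition: ScopedThreadMalloc, g_threadMalloc and the bare Allocator struct are all hypothetical.

```cpp
// Hypothetical placeholder for an allocator; only its identity matters here.
struct Allocator {
  const char *name;
};

// Allocator active on the current thread (null when none is installed).
thread_local Allocator *g_threadMalloc = nullptr;

// Installs an allocator for the lifetime of the object and restores the
// previous one on destruction, including during exception unwinding.
class ScopedThreadMalloc {
  Allocator *m_previous;

public:
  explicit ScopedThreadMalloc(Allocator *pMalloc) : m_previous(g_threadMalloc) {
    g_threadMalloc = pMalloc;
  }
  ~ScopedThreadMalloc() { g_threadMalloc = m_previous; }

  // Copying would restore the previous allocator twice, so it is disabled.
  ScopedThreadMalloc(const ScopedThreadMalloc &) = delete;
  ScopedThreadMalloc &operator=(const ScopedThreadMalloc &) = delete;
};

// Example API entry point: everything called from here sees 'owner' as the
// active allocator, and the caller's allocator is back in place on return.
const char *ActiveNameDuringCall(Allocator *owner) {
  ScopedThreadMalloc scope(owner);
  return g_threadMalloc->name;
}
```

Because the previous value is saved and restored, scopes nest correctly when one compiler object calls into another that was created with a different allocator.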

To actually opt into the threadlocal management, the DLL needs to both initialize and clean up the mechanism, and to make sure that new/delete and the other memory management functions are redirected properly. We don't include this in any of the libraries we build, to make sure it's a clear opt-in decision for targets.

Beware global state

Globals that get initialized on demand (like many ManagedStatic values) are tricky, because they aren't really associated with whichever allocator happens to be active when they are first touched. Instead, these should be initialized up front in DllMain and stay alive for the lifetime of the library.

There are a few more interesting things we could cover, like how we use this as a fault-injection mechanism to make sure recovery is working properly, but there's plenty to chew on here.

Enjoy!