I have written in the past about how audio device alignment requirements affect the buffer size, and about the WASAPI alignment dance.
There are three alignment requirements on audio buffers:
- The buffer size must be a multiple of WAVEFORMATEX.nBlockAlign. This allows individual audio frames to be copied around without worrying about them being cut in half and then having to glue them together at the end.
- A buffer allocated via KSPROPERTY_RTAUDIO_BUFFER must be a multiple of the page size - that is, 4096 bytes. This allows the buffer to be mapped multiple times into consecutive pages, which in turn simplifies memory copies where the buffer is the source or the destination. KSPROPERTY_RTAUDIO_BUFFER is for timer-driven streaming; there is an event-driven analog, KSPROPERTY_RTAUDIO_BUFFER_WITH_NOTIFICATION, which has no corresponding alignment requirement.
- HD Audio buffer allocations must be a multiple of 256 bytes. For timer-driven buffers, this applies to the whole buffer. For event-driven buffers, this applies to the sum of the "ping" and "pong" buffers, so the individual "ping" or "pong" buffer must be a multiple of 128 bytes.
Where multiple alignment requirements apply, the effective alignment requirement is the least common multiple of all the applicable requirements. As an example, consider a 5.1 16-bit 48 kHz stream playing to HD Audio hardware via KSPROPERTY_RTAUDIO_BUFFER.
From the nBlockAlign requirement, the buffer must be a multiple of (6 * 16) / 8 bytes = 12 bytes.
From the KSPROPERTY_RTAUDIO_BUFFER requirement, the buffer must be a multiple of PAGE_SIZE = 4096 bytes.
From the HD Audio requirement, the buffer must be a multiple of 256 bytes (this is timer-driven, so we do not divide by 2).
In all, then, the buffer must be a multiple of LCM(12, 4096, 256) = 12288 bytes.
Since WAVEFORMATEX.nAvgBytesPerSec = ((6 * 16) / 8) * 48000 = 576000 bytes/sec, this corresponds to 12288 / 576000 * 1000 = 21.333 milliseconds (to three decimal places).
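The arithmetic above can be sketched in a few lines of C. This is only an illustration of the least-common-multiple calculation, not code taken from any driver or from WASAPI. The helper names (gcd_u32, lcm_u32, effective_alignment, bytes_to_microseconds) are hypothetical, and the page size and HD Audio alignment are passed in as plain numbers rather than queried from the system.

```c
#include <assert.h>
#include <stdint.h>

/* Greatest common divisor, by Euclid's algorithm. */
static uint32_t gcd_u32(uint32_t a, uint32_t b)
{
    while (b != 0) {
        uint32_t t = a % b;
        a = b;
        b = t;
    }
    return a;
}

/* Least common multiple. Both arguments must be nonzero.
   Dividing before multiplying keeps the intermediate value small. */
static uint32_t lcm_u32(uint32_t a, uint32_t b)
{
    assert(a != 0 && b != 0);
    return (a / gcd_u32(a, b)) * b;
}

/* Effective alignment in bytes when all three requirements apply
   (the timer-driven HD Audio case discussed above). */
static uint32_t effective_alignment(uint32_t block_align,
                                    uint32_t page_size,
                                    uint32_t hd_audio_align)
{
    return lcm_u32(lcm_u32(block_align, page_size), hd_audio_align);
}

/* Duration of `bytes` bytes of audio, in whole microseconds (rounded down). */
static uint64_t bytes_to_microseconds(uint32_t bytes, uint32_t avg_bytes_per_sec)
{
    assert(avg_bytes_per_sec != 0);
    return ((uint64_t)bytes * 1000000u) / avg_bytes_per_sec;
}
```

For the 5.1 16-bit 48 kHz example, effective_alignment(12, 4096, 256) evaluates to 12288, and bytes_to_microseconds(12288, 576000) evaluates to 21333, which matches the 21.333 milliseconds computed by hand.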