What is DMA (Part 3) – DMA Translation & Map Registers

Previously in this sequence I talked some about what DMA is, and some of the common models for programming DMA on a device.

Like most code, your driver usually deals with virtual addresses for data buffers.  Your DMA engine (be it slave or bus-mastering) is on the other side of the MMU and so can’t
understand virtual addresses very well.  You might think you should grab the physical
address of your buffer and program that onto the device instead, but that’s also
going to cause problems.  The simplest example is a 32-bit PCI card on a
64-bit system – this controller cannot handle a physical address above 4GB but
nothing stops an app from giving you buffers in this range.  Clearly you
need to be ready to do some translation [1].

WDM provides a mechanism for doing this
translation – the DMA_ADAPTER object.  To get one of these for your device,
you would call IoGetDmaAdapter.  This takes a description of the DMA
capabilities of your device and information about your maximum transfer size, and
returns to you a pointer to the DMA_ADAPTER, which in turn contains pointers to
the other DMA functions you can call.  The maximum transfer size is
expressed in terms of the number of "Map Registers" that you want to allocate.

Map Registers

Map registers are an abstraction the DMA API
uses to track the system resources needed to make one page of memory accessible
by your device for a DMA transfer.  They may represent a bounce buffer – a
single page of memory which the device can access that the DMA will use to
double-buffer part of your transfer.  They could (in the world of the
future) represent entries in a page map that maps pages in the physical address
space into the device’s logical address space (another DDK term).  Or in
the case of a 32-bit adapter on a 32-bit system where there’s no need for
translation, they might represent absolutely nothing at all.  However since
you probably want to write a driver that makes your device work on any Windows
system, you should ignore this last case and focus on the ones where translation
is needed.
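
To make the bounce-buffer case a little more concrete, here is a toy user-mode model in C.  It is not how the HAL actually implements map registers; every name in it (toy_map_register, toy_map_page, toy_flush_page, DEVICE_ADDRESS_LIMIT) is made up for illustration.  It assumes a device that can only reach the low 4GB and treats physical addresses as plain integers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define TOY_PAGE_SIZE 4096u
/* Assumption: the device can only generate 32-bit addresses. */
#define DEVICE_ADDRESS_LIMIT 0x100000000ull

/* One hypothetical "map register": the state needed to expose one page. */
typedef struct {
    uint64_t bounce_pa;                   /* pretend address of the bounce page (below 4GB) */
    uint8_t  bounce_page[TOY_PAGE_SIZE];  /* device-reachable copy of the data */
    bool     in_use;                      /* true when this page is double-buffered */
} toy_map_register;

/* Returns the address the device should be programmed with for this page.
   For a write to the device, the data is copied into the bounce page first. */
uint64_t toy_map_page(toy_map_register *reg, uint64_t page_pa,
                      const uint8_t *page_data, bool write_to_device)
{
    assert((page_pa % TOY_PAGE_SIZE) == 0);
    if (page_pa + TOY_PAGE_SIZE <= DEVICE_ADDRESS_LIMIT) {
        reg->in_use = false;          /* device can reach the page directly */
        return page_pa;
    }
    reg->in_use = true;               /* out of reach: double-buffer it */
    if (write_to_device) {
        memcpy(reg->bounce_page, page_data, TOY_PAGE_SIZE);
    }
    return reg->bounce_pa;
}

/* After a read from the device, copy bounced data back to the real page. */
void toy_flush_page(const toy_map_register *reg, uint8_t *page_data,
                    bool write_to_device)
{
    if (reg->in_use && !write_to_device) {
        memcpy(page_data, reg->bounce_page, TOY_PAGE_SIZE);
    }
}
```

The point of the model is that the caller never needs to know which branch was taken: it programs the device with whatever address comes back and flushes when the transfer completes.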

You’ll want to allocate enough map registers to
handle your maximum transfer size.  This limit might be exposed by your
hardware, or as a tunable parameter in the registry, or just by common sense
(you probably don’t need to transfer 1GB in a single shot now do you?). 
However since map registers can be a limited resource, you may not always get
the number you asked for (it’s an in/out parameter to IoGetDmaAdapter).  In
that case you’ll need to cut down your maximum transfer size – either rejecting
larger transfers or breaking them up into smaller pieces and staging them.
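
If you go the staging route, the arithmetic might look something like the sketch below.  This is standalone C rather than driver code, it assumes 4KB pages, and the names (toy_next_chunk, toy_count_stages) are hypothetical.  Each stage takes as many bytes as will fit in the map registers you were actually granted, starting from wherever the previous stage left off.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_PAGE_SIZE 4096u   /* assumption: 4KB pages */

/* Largest number of bytes, starting at va, that touches at most
   map_registers pages (capped at the bytes still remaining). */
size_t toy_next_chunk(uintptr_t va, size_t remaining, unsigned map_registers)
{
    assert(map_registers > 0);
    size_t offset_in_page = (size_t)(va % TOY_PAGE_SIZE);
    size_t reachable = (size_t)map_registers * TOY_PAGE_SIZE - offset_in_page;
    return remaining < reachable ? remaining : reachable;
}

/* Number of stages needed to move length bytes starting at va. */
unsigned toy_count_stages(uintptr_t va, size_t length, unsigned map_registers)
{
    unsigned stages = 0;
    while (length > 0) {
        size_t chunk = toy_next_chunk(va, length, map_registers);
        va += chunk;          /* the next stage starts where this one ended */
        length -= chunk;
        stages++;
    }
    return stages;
}
```

With only 4 map registers, a page-aligned 64KB transfer takes 4 stages, while the same transfer starting 16 bytes into a page takes 5, because the first stage loses those 16 bytes of reach.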

So let’s say your device can handle a transfer
up to 64KB.  You ask for 16 map registers, right?  Not necessarily –
it depends on what alignment you need for the DMA.  If you can handle
buffers with byte alignment then 16 won’t quite cut it – with 4KB pages, a 64KB
transfer that’s not page aligned will span 17 pages instead of 16.  In that case
you should ask for 17 map registers.  This will ensure you
can map the entire transfer.
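
The page-count arithmetic is easy to check with a few lines of standalone C.  In a real driver the WDK macro ADDRESS_AND_SIZE_TO_SPAN_PAGES does this calculation for you; the functions below are a hypothetical stand-in that assumes 4KB pages.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_PAGE_SIZE 4096u   /* assumption: 4KB pages */

/* Number of pages touched by length bytes starting at address va. */
size_t toy_span_pages(uintptr_t va, size_t length)
{
    assert(length > 0);
    size_t offset_in_page = (size_t)(va % TOY_PAGE_SIZE);
    return (offset_in_page + length + TOY_PAGE_SIZE - 1) / TOY_PAGE_SIZE;
}

/* Map registers needed for a transfer of max_length bytes when the
   buffer may start at any byte offset: the worst case is a buffer
   that begins on the last byte of a page. */
size_t toy_worst_case_map_registers(size_t max_length)
{
    return toy_span_pages(TOY_PAGE_SIZE - 1, max_length);
}
```

For a 64KB transfer, toy_span_pages gives 16 for a page-aligned buffer and 17 for one that starts a single byte into a page, and toy_worst_case_map_registers(64 * 1024) gives 17.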

The DMA API keeps track of how many of your map
registers you’re using at any given time.  The functions you call to
allocate enough map registers for a DMA translation (AllocateAdapterChannel,
GetScatterGatherList and BuildScatterGatherList) keep track of how many are in use and
call you back when there are sufficient resources available for the operation. 
In the ideal case (where no translation is needed), you’ll be called back
immediately.  In the degenerate case where everything requires translation
you may only be processing one request at a time.  However the nice part is
that your driver can behave the same regardless of which situation you’re in.
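
Here is a toy model of that "call you back when there’s room" behavior, again as standalone C and not as a description of what the HAL really does internally.  The types and functions (toy_adapter, toy_allocate, toy_free, toy_count_callback) are hypothetical, and the first-in, first-out waiter queue is my assumption for the sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_MAX_WAITERS 8

typedef void (*toy_callback)(void *context);

typedef struct {
    unsigned     needed;      /* map registers this request is waiting for */
    toy_callback callback;    /* invoked once the registers are reserved */
    void        *context;
} toy_waiter;

typedef struct {
    unsigned   free_registers;
    toy_waiter waiters[TOY_MAX_WAITERS];  /* circular FIFO of pending requests */
    size_t     head, count;
} toy_adapter;

/* Example callback: counts how many transfers have been started. */
void toy_count_callback(void *context) { ++*(int *)context; }

/* Ask for map registers.  The callback runs right away if enough are
   free and nobody is queued ahead; otherwise the request is queued.
   Returns false only if the waiter queue is full. */
bool toy_allocate(toy_adapter *a, unsigned needed, toy_callback cb, void *ctx)
{
    if (a->count == 0 && needed <= a->free_registers) {
        a->free_registers -= needed;
        cb(ctx);
        return true;
    }
    if (a->count == TOY_MAX_WAITERS) {
        return false;
    }
    size_t tail = (a->head + a->count) % TOY_MAX_WAITERS;
    a->waiters[tail] = (toy_waiter){ needed, cb, ctx };
    a->count++;
    return true;
}

/* Give registers back and start any queued requests that now fit. */
void toy_free(toy_adapter *a, unsigned released)
{
    a->free_registers += released;
    while (a->count > 0 && a->waiters[a->head].needed <= a->free_registers) {
        toy_waiter w = a->waiters[a->head];
        a->head = (a->head + 1) % TOY_MAX_WAITERS;
        a->count--;
        a->free_registers -= w.needed;
        w.callback(w.context);
    }
}
```

From the caller’s side the code path is identical whether the callback fires inside toy_allocate or later from toy_free, which is the property that lets a driver behave the same in both situations.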

[1] There are other conditions that can cause
this.  If you have a controller which uses DAC (dual address cycle) to get at all 64 bits of
memory hooked onto a bridge that doesn’t properly support DAC (these do exist)
your card might be in 32-bit mode anyway.  Some day we may be able to block
devices from transferring to or from main memory unless the OS has granted them
access, and that will probably require some translation as well.  There are
a few exceptions to this, but for the most part just accept that you’ll have to
do some translation.

Comments (3)

  1. Darax says:

    You failed to put this in the right category.

  2. PeterWieland says:

    Thanks – I didn’t notice that.  It’s fixed now.

  3. Hi Peter,

    You’ve got great DMA stuff in here. I appreciate your taking the time to write and present it. It augments MSDN nicely…

    My question concerns safely translating an arbitrary kernel buffer (address and length) to an MDL suitable for passing to GetScatterGatherList()

    Take MS-USBD for instance: a data buffer can be submitted to USBD by MDL or transfer buffer. (See MSDN:


    I’m supporting the same "MDL or Buffer" methods for my kernel API with a PLX DMAC under the hood.

    I’ve got both MDL and Buffer methods transferring with PLX DMA, but I lack confidence for properly preparing an MDL when the caller passes a Buffer (instead of an MDL) to my API.

    I translate the caller’s Buffer to an MDL with IoAllocateMdl(). Then, if I have ‘inside knowledge’ that the caller’s buffer is from non-paged pool I call MmBuildMdlForNonPagedPool() and it works great. But there are other buffer types, and I’m not sure of my responsibilities for ensuring correct and safe MDL operations for all of them.

    My solution must safely translate *any* kernel buffer to an MDL ready to pass to GetScatterGatherList().

    Do you know of boilerplate, bulletproof code for creating a "necessary and sufficient" MDL from any kernel buffer?