What is DMA (Part 2) – DMA to a Driver


Yesterday i talked a little about “what DMA is”.  Today i want to talk a little bit about how devices use DMA.


DMA to a Driver



From the driver’s point of view there are two aspects to DMA. The first is how you prepare your data for DMA transfers. The second is how you program the device to initiate the transfers & how you notice that a transfer is done. Let’s talk about the second part first.


There are an infinite number of models for programming your device to start a DMA. Each introduces its own limitations. I’ll go over a few of the common ones i’ve seen:



  1. The device takes a single physical address base and a length for an operation. This is very simple to program, but requires the transfer to be physically contiguous, which is unlikely for anything other than the smallest transfers (physical memory is often very fragmented, so the chance of two adjoining virtual pages using adjoining physical pages is pretty small). The device will usually interrupt when the DMA transfer is complete.
  2. The device takes a single physical address base & a length for each fragment of an operation. It interrupts when it’s done transferring each fragment, allowing your driver to program in the next one. This is going to be slow because of the latency between each fragment, but is still easy to implement.
  3. The device takes a sequence of (physical-address, length) pairs which describe all the fragments of the transfer. This sequence is called a “scatter-gather list” (SG List). The device can then transfer each fragment on its own without the need to interrupt the CPU until all sections are done. In the simplest version of this, the driver programs the SG list to the controller through its registers/ports – writing each element into the device’s internal memory. The device will only have a limited space for the SG list, so you may only be able to handle 16 fragments in a given transfer.
  4. In the more complex version of 3, the SG list itself is stored in DMA accessible system memory and the device is programmed with the physical address and length of the scatter-gather list itself. The device can then use DMA to transfer the SG list entries into its own internal buffers. This can reduce the limitations on the length of the SG list, but requires more complex logic in the DMA controller to handle it. It also requires the memory holding the SG list to be physically contiguous.
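
To make models 1 and 3 a bit more concrete, here is a minimal sketch of building a scatter-gather list from the physical pages behind a buffer. Everything in it is an assumption made for illustration: the `sg_element` layout, the `build_sg_list` helper, the 4 KB page size, and the idea that the caller already has the physical address of each page. It is not any real device's or operating system's format. Adjacent pages that happen to be physically contiguous are merged into one element, and the function fails if the transfer needs more elements than the (hypothetical) device can hold.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SIZE 4096u  /* assumed page size, for illustration only */

/* One scatter-gather element: a physically contiguous fragment.
 * Hypothetical layout; a real device defines its own format. */
typedef struct {
    uint64_t phys_addr;  /* physical address where the fragment starts */
    uint32_t length;     /* fragment length in bytes */
} sg_element;

/*
 * Build an SG list for a buffer that begins `offset` bytes into its first
 * page (offset < PAGE_SIZE) and is `length` bytes long. page_phys[i] is the
 * page-aligned physical address of the i-th page backing the buffer.
 *
 * Returns the number of elements written to `out`, or -1 if the transfer
 * needs more than `max_elems` elements or the pages do not cover `length`.
 */
static int build_sg_list(const uint64_t *page_phys, size_t page_count,
                         uint32_t offset, uint32_t length,
                         sg_element *out, int max_elems)
{
    int count = 0;

    for (size_t i = 0; i < page_count && length > 0; i++) {
        uint32_t in_page = PAGE_SIZE - offset;
        uint32_t chunk = length < in_page ? length : in_page;
        uint64_t addr = page_phys[i] + offset;

        if (count > 0 &&
            out[count - 1].phys_addr + out[count - 1].length == addr) {
            /* Physically adjacent to the previous fragment: extend it. */
            out[count - 1].length += chunk;
        } else {
            if (count == max_elems)
                return -1;  /* device cannot hold another element */
            out[count].phys_addr = addr;
            out[count].length = chunk;
            count++;
        }

        length -= chunk;
        offset = 0;  /* only the first page can start mid-page */
    }

    return length == 0 ? count : -1;
}
```

If the result is a single element, the transfer is physically contiguous and even a model 1 device could handle it. A result of -1 with `max_elems` set to 16 corresponds to the "only 16 fragments" limit described in model 3, where the driver would have to split the request into several smaller transfers.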

All of these models have the same basic characteristics.  You tell the controller one or more physical address ranges from/to which to transfer data & you tell it to start transferring data.  Some time in the future the transfer finishes and your driver finds out about it somehow.  Hopefully this “somehow” is through an interrupt but it might also involve polling.  The problem with polling is that you are, once again, wasting a very expensive CPU doing something mundane – in this case spinning and waiting on a bit in a register.
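
The two ways of finding out that a transfer finished can be sketched roughly as follows. This is a simulation against a made-up register block, not real hardware access: `fake_dma_regs`, `STATUS_DONE`, `poll_for_done`, `dma_isr`, and the "clear the bit to acknowledge" behavior are all assumptions for illustration, and a real device's completion and acknowledge protocol will differ.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define STATUS_DONE 0x1u  /* hypothetical "transfer complete" bit */

/* Stand-in for a device's memory-mapped registers (hypothetical). */
typedef struct {
    volatile uint32_t status;
} fake_dma_regs;

/* Per-device driver state (hypothetical). */
typedef struct {
    fake_dma_regs *regs;
    bool transfer_done;
} driver_ctx;

/*
 * Polling: spin on the status bit, giving up after max_spins reads.
 * The CPU does nothing useful while it waits, which is the cost the
 * post describes.
 */
static bool poll_for_done(fake_dma_regs *regs, unsigned max_spins)
{
    for (unsigned i = 0; i < max_spins; i++) {
        if (regs->status & STATUS_DONE)
            return true;
    }
    return false;
}

/*
 * Interrupt style: the handler runs only when the device signals.
 * It checks that the interrupt is really for this device, acknowledges
 * it (assumed here to mean clearing the bit), and records completion.
 * Returns true if the interrupt was handled.
 */
static bool dma_isr(driver_ctx *ctx)
{
    if (!(ctx->regs->status & STATUS_DONE))
        return false;  /* not our interrupt */

    ctx->regs->status &= ~STATUS_DONE;  /* acknowledge */
    ctx->transfer_done = true;
    return true;
}
```

In the interrupt version the CPU is free to do other work between starting the transfer and the handler running, while the polling version burns cycles for as long as `poll_for_done` spins.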


Next time i’ll talk some about how you get those physical address ranges in the first place.


-p

Comments (6)

  1. mattd says:

    Peter,

    Great blog so far! I have been a user mode dev for quite a while, any advice or resources you would recommend to get into driver development? Is getting the OSR usb device a good start? Thoughts?

    Thanks.

  2. PeterWieland says:

    Thanks Matt, i appreciate the feedback.

    http://www.microsoft.com/whdc is probably where i’d point people to get started.  I believe they’ve done some work to collect resources that can help get you started.

    I would strongly suggest that you start learning driver development by working with the Windows Driver Foundation – either the kernel-mode or user-mode version.  Click the "Windows Driver Foundation" under Driver Stuff in the sidebar to get to the WHDC page for those.

    The OSR USB device is a fun little USB device to start working with.  I’m pretty impressed with what they put together.

    Also, depending on how familiar you are already with Windows, i’d suggest reading Microsoft Windows Internals.  It’s the first book i order for anyone who starts working for me.

    Good Luck.

  3. John S. says:

    Fascinating…..  I’m looking into a problem right now regarding #3 above, where the bus mastering DMA device can spool multiple address/count pairs.  Everything is great, until you get 4-gig of memory installed under Windows 2003 and the PAE kicks in.  Those "map registers" get to be a pretty precious resource at that point.  

    It’s funny you should mention "able to handle 16 fragments in a given transfer" because that’s the number of map registers we get in this configuration.  A tad limiting, considering our DMA will spool 3000+ address/count pairs.  

    Do you have any sources of information for developers using this memory configuration (64-bit addresses) with gather (bunches of map registers in play)?  I’m searching and not finding much.  

    Thanks, john

  4. PeterWieland says:

    John,

    Yeah, the HAL can be pretty stingy with map registers in those cases.  I take it your device only does 32-bit addressing?

    -p

  5. Neeraj Kushwaha says:

    Hi…

    I am implementing one device driver using sglist mechanism. I would like to know how to implement the 4th point u stated in urs article.

    As my sglist is limited to 8 entries, in the best case i can send 8 * PAGE_SIZE = 32 KB of data by pinning down the memory. This is a serious bottleneck. How can i increase the sglist count so that i can transfer bigger data sizes?

    plz suggest

    thanks

    PS. If possible plz send the reply to kushneeraj@yahoo.com

  6. PeterWieland says:

    Neeraj,

    These are all limited by how the device works.  If your device isn’t capable of taking the address of the SG list then you won’t be able to use this technique.  From your question i’m assuming your device is more like #3 above.

    Is changing your hardware to support #4 an option?

    -p