What Is DMA (Part 6) – Scatter Gather DMA the “old” way
To be honest, it has been a long, long time since I’ve needed to support slave-mode DMA or packet-based non-scatter-gather DMA. To talk about those I’d probably have to do some (gulp) research. Also I’m not sure how much they apply to modern hardware. It seems pretty cheap these days to buy a DMA controller that can handle scatter-gather for your device. So I’ll start there.
Way back when, in Windows NT 3.51 and 4.0 (I’ve never had to support anything earlier), there was one set of DDIs to do all your DMA functions. The sequence of operations went something like this:
- Push your I/O through a device queue or the start-IO queue:
The old DMA DDIs can only handle processing one request at a time for any given adapter object. The resource used to keep track of the map register allocation is stored in the adapter object and there’s only one of them.
The window for serialization is between your call to AllocateAdapterChannel and your call to FreeMapRegisters. Once you’ve called FreeMapRegisters you can invoke AllocateAdapterChannel again (which you may do indirectly by calling IoStartNextPacket which runs your StartIO routine, which calls AllocateAdapterChannel again).
- Call KeFlushIoBuffers to flush any cached data:
This may not seem necessary on a platform which is cache coherent with respect to DMA, but there are still good reasons to call it. First, how would your driver know that it’s on such a platform (we’ll assume that the system does something special with uncached common-buffer)? Second, pushing data out of the processor caches will help avoid stalls in your DMA operations. Finally – if it’s not really necessary then it’s very likely a no-op, so just call it.
- Call AllocateAdapterChannel to request map registers:
This method can be found on your DMA_ADAPTER object. You give it the number of map registers that you require, plus an ExecutionRoutine and a Context. When the map registers you requested are available, the DMA API will call your ExecutionRoutine and provide the specified context. Your execution routine will prepare the buffers and start the DMA transfer. Generally you would pass in your IRP as the context, but it could be any data structure that can tell your execution routine what to do.
How many map registers should you ask for? Remember that each map register allows you to transfer one page of data at a time. You can determine how many pages you need with some annoying math, or you can simply use the ADDRESS_AND_SIZE_TO_SPAN_PAGES macro. This takes an address and a length and determines how many physical pages that buffer spans (a two-byte buffer at address 0x8000ffff would span two pages). If you were working with a chained MDL, you would need to call this macro for each MDL in the chain and add the page counts together.
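The "annoying math" is small enough to write out. The sketch below is plain user-mode C, not driver code: it reproduces the arithmetic of ADDRESS_AND_SIZE_TO_SPAN_PAGES under the assumption of 4 KB pages, and `sketch_mdl`, `span_pages`, and `map_registers_needed` are hypothetical names made up for illustration (a real driver would walk the MDL chain and use the macro itself).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: 4 KB pages. */
#define SKETCH_PAGE_SHIFT 12u
#define SKETCH_PAGE_SIZE  ((uint64_t)1 << SKETCH_PAGE_SHIFT)

/* Hypothetical stand-in for one MDL in a chain: start address and byte count. */
struct sketch_mdl {
    uint64_t start_va;
    uint64_t byte_count;
    const struct sketch_mdl *next;
};

/* Offset within the first page plus the length, rounded up to whole pages.
 * This mirrors what ADDRESS_AND_SIZE_TO_SPAN_PAGES computes. */
uint64_t span_pages(uint64_t va, uint64_t size)
{
    uint64_t offset_in_page = va & (SKETCH_PAGE_SIZE - 1);
    return (offset_in_page + size + SKETCH_PAGE_SIZE - 1) >> SKETCH_PAGE_SHIFT;
}

/* For a chained MDL: one page count per MDL, summed over the chain. */
uint64_t map_registers_needed(const struct sketch_mdl *mdl)
{
    uint64_t total = 0;
    for (; mdl != NULL; mdl = mdl->next)
        total += span_pages(mdl->start_va, mdl->byte_count);
    return total;
}
```

With these definitions, `span_pages(0x8000ffff, 2)` works out to 2, which matches the two-byte example above.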
You might think you could map less than the full transfer if you’re only transferring, say, one page at a time. But when I asked the DMA folks about this, it turned out that keeping track of the map registers becomes very, very complicated. So just map the whole thing at once and get it over with. You’ll have a simpler driver and you’ll be happier.
- Your ExecutionRoutine sets up the transfer:
When your ExecutionRoutine is invoked its job is to get logical addresses for your buffer and to start the DMA transfer on your device. You might be able to do this all at once, or you might need to “stage” the transfer and do it in page-sized chunks. Either way the steps are more or less the same.
You’ll need to save the map register base in your device extension, since you’ll need it when the transfer is done. If you are going to transfer in chunks then you will also want to save the list of logical addresses & lengths that you get from mapping the buffer (we’ll call this your scatter gather list), an index or pointer into that list so you know where you left off, and the number of bytes you have left to transfer. For your scatter gather list you’ll need one entry (PHYSICAL_ADDRESS and length) for each map register you were granted.
Next you loop through the buffer, calling MapTransfer to turn each physical fragment into a logical address & length. Since you set up the DMA_ADAPTER with ScatterGather set to TRUE, MapTransfer will return logical address ranges in small chunks. You’ll save these in the scatter gather list.
As you iterate through the buffer you need to track your “CurrentVa”. The CurrentVa is your position in the buffer and should be (offset + MmGetMdlVirtualAddress(Mdl)). It’s not a straight offset and this can be a royal pain. In the past I’ve stored an “offset” in my device extension and then done this math each time to compute the CurrentVa.
You do not need to worry about the map register base – the DMA DDI keeps track of which map registers you’ve used as you map.
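The shape of that mapping loop is easier to see in code. What follows is a user-mode model, not the real DDI: `fake_map_transfer` is a hypothetical stand-in for MapTransfer that always stops at the next page boundary (assuming 4 KB pages) and invents a logical address, whereas the real routine takes the adapter, MDL, and map register base as well and may return longer runs. `sg_entry` and `build_sg_list` are likewise made-up names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096u   /* assumption: 4 KB pages */

/* One scatter gather entry: logical address plus length. */
struct sg_entry {
    uint64_t logical_address;
    uint32_t length;
};

/* Hypothetical stand-in for MapTransfer. On input *length is how much the
 * caller would like mapped; on output it is how much was actually mapped
 * (here: never past the end of the current page). */
uint64_t fake_map_transfer(uint64_t current_va, uint32_t *length)
{
    uint32_t room = SKETCH_PAGE_SIZE -
                    (uint32_t)(current_va & (SKETCH_PAGE_SIZE - 1));
    if (*length > room)
        *length = room;
    return 0x100000000ull + current_va;   /* made-up logical address */
}

/* Walk the buffer, computing CurrentVa = MDL start VA + offset each time. */
size_t build_sg_list(uint64_t mdl_start_va, uint32_t total_length,
                     struct sg_entry *list, size_t capacity)
{
    uint32_t offset = 0;   /* the "offset" kept in the device extension */
    size_t count = 0;

    while (offset < total_length) {
        uint64_t current_va = mdl_start_va + offset;
        uint32_t length = total_length - offset;

        assert(count < capacity);   /* one entry per map register granted */
        list[count].logical_address = fake_map_transfer(current_va, &length);
        list[count].length = length;
        count++;
        offset += length;           /* advance by what was really mapped */
    }
    return count;
}
```

The point of the model is the bookkeeping: the loop advances by the length the mapping call hands back, not by the length it asked for.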
Once you’ve mapped enough you can program your device to start the DMA transfer. Use the logical addresses you received to set up the transfer, start the device running, and return from your ExecutionRoutine (I’ll explain the possible return values below). When your device is done you’ll notice somehow (probably an interrupt) and can schedule a DPC to either start the next stage, or to free resources and start the next request.
You do not block in your execution routine. The device should be able to run the DMA transfer on its own and notify your driver when it’s complete.
When your ExecutionRoutine is ready to return you have three options for return status. KeepObject is only for slave-mode DMA where you need to keep the actual DMA controller allocated for you. For a bus-master you can either return DeallocateObject or DeallocateObjectKeepRegisters. Since you can’t block in your ExecutionRoutine you would only return the first value if you were aborting your DMA transfer and no longer needed the map registers. Otherwise keep them until you’re done with the transfer.
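As a compact restatement of that rule for the bus-master case, here is a small user-mode sketch. The enum is a local copy of the three IO_ALLOCATION_ACTION names, prefixed with `Sketch` so it is not mistaken for the WDK definition, and both helper functions are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Local mirror of the three return values discussed above. */
enum sketch_allocation_action {
    SketchKeepObject = 1,                 /* slave-mode DMA only */
    SketchDeallocateObject,               /* give everything back now */
    SketchDeallocateObjectKeepRegisters   /* keep map registers for the transfer */
};

/* What a bus-master ExecutionRoutine would return: drop everything only
 * when the transfer is being aborted, otherwise hold on to the registers. */
enum sketch_allocation_action bus_master_return(bool aborting)
{
    return aborting ? SketchDeallocateObject
                    : SketchDeallocateObjectKeepRegisters;
}

/* The map registers need an explicit FreeMapRegisters later only if the
 * execution routine chose to keep them. */
bool must_free_map_registers(enum sketch_allocation_action returned)
{
    return returned == SketchDeallocateObjectKeepRegisters;
}
```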
- Handle the next stage (optional and may repeat):
If you are staging the transfer then you’ll want to start the next stage when the current one completes. Use the values you saved in the execution routine (scatter gather list, offset into the scatter-gather list and number of bytes left to transfer) to start the next segment.
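To make the saved state concrete, here is a user-mode sketch of the bookkeeping for a staged transfer. Everything in it is hypothetical: `transfer_state` stands in for the fields you would keep in your device extension, and `start_next_stage` only selects the entries for the next stage; programming the device with them is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct sg_entry {
    uint64_t logical_address;
    uint32_t length;
};

/* Hypothetical per-transfer state saved by the execution routine. */
struct transfer_state {
    const struct sg_entry *list;   /* scatter gather list from mapping */
    size_t entry_count;            /* number of entries in the list */
    size_t next_entry;             /* where the previous stage left off */
    uint32_t bytes_left;           /* bytes not yet handed to the device */
};

/* Select up to max_entries entries for the next stage. Returns how many
 * entries were selected (0 means the transfer is finished) and points
 * *stage_out at the first of them. */
size_t start_next_stage(struct transfer_state *st, size_t max_entries,
                        const struct sg_entry **stage_out)
{
    size_t n = 0;

    *stage_out = &st->list[st->next_entry];
    while (n < max_entries && st->next_entry < st->entry_count) {
        uint32_t len = st->list[st->next_entry].length;

        assert(len <= st->bytes_left);
        st->bytes_left -= len;
        st->next_entry++;
        n++;
    }
    return n;
}
```

A DPC in this model would call `start_next_stage` each time the device reports completion, and move on to the cleanup steps once it returns 0.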
Eventually you hit the last stage (if you did everything in one shot this is also the first stage 🙂 ), or you decide the transfer has failed. Either way that takes us to the last steps.
- Undo the DMA mappings:
When you’re done with your transfer you need to undo the mappings you created above. To do this you call FlushAdapterBuffers, providing it with the map register base, the MDL, the CurrentVa to start at and the number of bytes transferred.
You should only call FlushAdapterBuffers once at the end of the transfer. If you’ve read the list of operations above, you might realize that you could call MapTransfer before each stage rather than doing it all at once. That’s true, but you should still flush once at the end. Otherwise the DMA DDI can get confused.
- Free the map registers:
Now that you’re completely done, you call FreeMapRegisters to release the map registers that were allocated. Note that you only need to do this if your ExecutionRoutine returned DeallocateObjectKeepRegisters.
- Start the next request:
And now you’re free to start the next operation. This might be IoStartNextPacket, or it may be more complicated.
Simple, huh? Okay – it’s a big pain in the butt. I hate the old DMA DDIs and I try never to use them.
GetScatterGatherList and PutScatterGatherList do most of this work for you, and allow you to run more than one command at a time, so it’s a much better starting point. I’ll probably talk about them next time.
Let me say that again – DON’T USE THESE DDIS UNLESS YOU CAN’T AVOID IT. They’re much too complicated for general use, and the last thing you want when you’re writing a device driver is more complexity.