Normally, the CPU and GPU run in parallel. Frame time = max(CPU time, GPU time).
If your code causes a pipeline stall, however, the processors must take turns to run while the other one sits idle. Yikes! Now frame time = CPU time + GPU time. In other words, programs that stall can be both CPU and GPU bound at the same time.
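To put numbers on those two formulas, here is a minimal sketch in Python. The function names and the millisecond figures are made up for illustration, and the stalled case assumes the worst: no overlap at all between the two processors.

```python
def frame_time_parallel(cpu_ms, gpu_ms):
    # CPU and GPU overlap fully, so the slower of the two sets the pace.
    return max(cpu_ms, gpu_ms)


def frame_time_stalled(cpu_ms, gpu_ms):
    # Worst-case stall: the processors take turns, so their costs add up.
    return cpu_ms + gpu_ms


def frames_per_second(frame_ms):
    # Framerate is the inverse of the time taken per frame.
    return 1000.0 / frame_ms


cpu_ms, gpu_ms = 10.0, 12.0  # illustrative costs, not measurements
parallel_ms = frame_time_parallel(cpu_ms, gpu_ms)
stalled_ms = frame_time_stalled(cpu_ms, gpu_ms)
print(parallel_ms, frames_per_second(parallel_ms))
print(stalled_ms, frames_per_second(stalled_ms))
```

With these made-up costs, the stalled frame takes 22 ms instead of 12 ms, so speeding up either processor would help. That is what it means to be CPU and GPU bound at the same time.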
The easiest way to cause a stall is to draw some graphics into a rendertarget, then GetData on the rendertarget texture to read the results back to the CPU. Think about what happens if you do this:
- Charles (the CPU) is processing your drawing calls.
- He has filled a piece of paper with instructions for his brother George (the GPU).
- Charles reaches an instruction that says “copy data from George back into this array”.
- But the drawing instructions haven’t actually been processed by George yet!
- Charles cannot just note down the GetData call on his piece of paper. The next instruction might use values from the array, so he needs that data right away.
- Charles has no option but to hand the incomplete list of drawing instructions over to George immediately. He then waits around, twiddling his thumbs in boredom, until George has finished drawing everything. Only then can Charles resume processing the GetData instruction, while George sits idle.
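The Charles and George story can be sketched as a toy timeline model. This is Python rather than real graphics code, and ToyPipeline with all of its methods is hypothetical; it only tracks when each processor would next be free, under the assumption that a readback flushes everything queued so far and blocks the CPU until the GPU is done.

```python
class ToyPipeline:
    """Hypothetical model: tracks when Charles (CPU) and George (GPU) are free."""

    def __init__(self):
        self.cpu_clock = 0.0  # time at which the CPU finishes its current work
        self.gpu_clock = 0.0  # time at which the GPU finishes all submitted work
        self.pending = []     # GPU costs written on the list but not yet handed over

    def cpu_work(self, ms):
        self.cpu_clock += ms

    def draw(self, gpu_ms):
        # Drawing just adds a line to the list; nothing runs on the GPU yet.
        self.pending.append(gpu_ms)

    def flush(self):
        # The GPU can start once it is free and the list has been handed over.
        start = max(self.gpu_clock, self.cpu_clock)
        self.gpu_clock = start + sum(self.pending)
        self.pending.clear()

    def get_data(self):
        # A readback needs results now: hand over the list, then wait for the GPU.
        self.flush()
        stall = max(0.0, self.gpu_clock - self.cpu_clock)
        self.cpu_clock = max(self.cpu_clock, self.gpu_clock)
        return stall


pipeline = ToyPipeline()
pipeline.cpu_work(2.0)       # Charles spends 2 ms writing instructions
pipeline.draw(5.0)           # ...that will cost George 5 ms to draw
stall_ms = pipeline.get_data()
print(stall_ms, pipeline.cpu_clock)
```

In this model the CPU sits idle for the full 5 ms of GPU work, and the GPU then has nothing to do until the CPU queues more instructions.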
One of the great successes of the Direct3D API is how it hides the asynchronous nature of GPU hardware. Many graphics programmers are writing parallel code without even realizing it! But as soon as you try to read data back from GPU to CPU, all this parallelism is lost (one reason it is hard to accelerate things like physics or AI on the GPU).
A similar problem occurs with occlusion queries. To avoid a stall, the query returns immediately, but with the IsComplete property set to false. The query completes at whatever later time George gets around to processing the relevant drawing instructions. Games must deal with this data not being available straight away. For instance, our Lens Flare sample falls back on occlusion data from the previous frame if the latest information is not yet available.
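The fallback idea might look something like the following sketch. It is Python rather than the sample's actual code, and ToyOcclusionQuery and FlareVisibility are hypothetical stand-ins: the only thing borrowed from the real API is the idea of a completion flag that the game polls instead of waiting on.

```python
from dataclasses import dataclass


@dataclass
class ToyOcclusionQuery:
    # Hypothetical stand-in for a GPU query: a completion flag plus a result.
    is_complete: bool = False
    pixel_count: int = 0


class FlareVisibility:
    """Remembers the last finished result so the game never has to wait."""

    def __init__(self):
        self.last_pixel_count = 0

    def update(self, query):
        # Take the new result only if the GPU has finished the query;
        # otherwise keep using whatever an earlier frame reported.
        if query.is_complete:
            self.last_pixel_count = query.pixel_count
        return self.last_pixel_count


visibility = FlareVisibility()
first = visibility.update(ToyOcclusionQuery(is_complete=True, pixel_count=40))
second = visibility.update(ToyOcclusionQuery(is_complete=False))
print(first, second)
```

The second call returns the stale value from the first rather than blocking. The trade-off is that the visibility result can lag a frame or so behind what is on screen.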
There is one situation where you can cause pipeline stalls purely by writing data to the GPU, rather than reading back from it. Can anyone figure out what that is?