You must flush GDI operations when switching between direct access and GDI access, and direct access includes other parts of GDI

A customer was running into problems when accessing the pixels of a DIB section. They used the HANDLE parameter to Create­DIB­Section and created two bitmaps from the same underlying memory. Those two bitmaps were then selected into corresponding DCs, and the customer found that changes to the pixels performed by writing via one DC were not visible when read from the other DC.

The customer pointed out this clause in MSDN:

You need to guarantee that the GDI subsystem has completed any drawing to a bitmap created by Create­DIB­Section before you draw to the bitmap yourself. Access to the bitmap must be synchronized. Do this by calling the Gdi­Flush function. This applies to any use of the pointer to the bitmap bit values, including passing the pointer in calls to functions such as Set­DIBits.

The customer said, "The description says that it applies to cases where you modify the bits yourself through the direct memory pointer. But all of our access is performed through HDCs; I would think GDI is smart enough to handle that, but we've found that we still need to call Gdi­Flush to get the two DCs back in sync."

What you ask GDI to do, you have done yourself. That's why the documentation says "any use of the pointer." It's sort of like in law, where in many cases you can be punished for "doing X or causing X to be done." If you induce somebody else to do X, you're in violation as much as if you had done X yourself.

I doubt that every call to GDI ends with a big loop that checks whether the bits it just modified also belong to some other GDI bitmap in the system.

GDIFinishAPI(HDC hdc) {
 DIBSECTION ds;
 if (IsDIBSection(GetCurrentObject(hdc, OBJ_BITMAP), &ds)) {
  EnumGdiObjects(FlushIfOverlap, &ds);
 }
}

FlushIfOverlap(HGDIOBJ h, DIBSECTION *pds) {
 DIBSECTION ds;
 if (IsDIBSection(h, &ds) &&
     DIBSectionsReferToSameUnderlyingBits(pds, &ds)) {
  GdiFlush(); // flush so the other bitmap sees the change
 }
}

That would seriously slow down all DIB section operations to cover a rare scenario that most people don't realize is even possible to create. Not the best engineering tradeoff.

The point of the documentation is that if you ask GDI to mess with the bits in the bitmap via the HDC, you must call Gdi­Flush before anybody else tries to access those bits, even if that "somebody else" is another part of GDI. The example of Set­DIBits is an attempt to capture the sense of this requirement.

Translating into this specific scenario: You must flush the pending changes whenever you switch between "GDI accesses bits through the DIB section created by this handle" and "the bits are accessed by anybody else." And "anybody else" could be "GDI accesses bits through the DIB section created by a different handle."

Bonus chatter: What's the deal with Gdi­Flush anyway?

As a performance optimization, GDI performs "batching" of operations. When you ask GDI to perform an operation, it doesn't always do it right away. Instead, it may choose to store the action in a buffer, and when the buffer gets full, it "flushes the batch" and sends the commands that it had been saving up into kernel mode for execution. (This idea of buffering up operations and submitting them as a batch is hardly new to GDI. The C stdio library does it, and in networking, a variation of it goes by the name Nagle's Algorithm.)

GDI also flushes the batch when necessary in order to preserve semantics; for example, if you call Gradient­Fill and follow it with a call to Get­Pixel, GDI needs to flush out the Gradient­Fill before issuing the Get­Pixel so that the pixels that get read match the pixels that were written. (A much more common case of just-in-time flushing is where you Bit­Blt the results out of the bitmap into another device context.)

This behind-the-scenes optimization works great with one exception: DIB sections. Since the memory for DIB sections is directly visible, GDI doesn't get a chance to sneak a call to Gdi­Flush before you issue your "mov eax, [esi]" instruction. Hence the clause in MSDN explaining that when you switch between GDI access and direct access, you need to call Gdi­Flush to get all pending operations out of the buffer so that the bits in memory match the operations you performed.
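The interaction between batching and directly visible memory can be illustrated with a toy model in portable C. This is not how GDI is implemented; the ToySurface type, the toy_ functions, and the batch size of four are all invented for the sketch. The point is only that writes sit in a queue until something flushes them, so a read that goes through the API sees them while a read that goes straight to memory does not.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { PIXELS = 8, BATCH_CAPACITY = 4 };

typedef struct { size_t index; uint32_t color; } PendingWrite;

typedef struct {
    uint32_t pixels[PIXELS];             /* directly visible memory, like DIB section bits */
    PendingWrite batch[BATCH_CAPACITY];  /* operations queued but not yet executed */
    size_t pending;                      /* number of queued operations */
} ToySurface;

/* Execute every queued write, then empty the batch (the role GdiFlush plays). */
void toy_flush(ToySurface *s)
{
    for (size_t i = 0; i < s->pending; i++)
        s->pixels[s->batch[i].index] = s->batch[i].color;
    s->pending = 0;
}

/* Queue a write. The batch is flushed automatically only when it is full. */
void toy_set_pixel(ToySurface *s, size_t index, uint32_t color)
{
    assert(index < PIXELS);
    if (s->pending == BATCH_CAPACITY)
        toy_flush(s);
    s->batch[s->pending++] = (PendingWrite){ index, color };
}

/* Read through the API: the batch is flushed first to preserve semantics. */
uint32_t toy_get_pixel(ToySurface *s, size_t index)
{
    assert(index < PIXELS);
    toy_flush(s);
    return s->pixels[index];
}
```

In this model, code that calls toy_set_pixel and then reads s.pixels directly has bypassed the just-in-time flush in toy_get_pixel, so it has to call toy_flush itself before looking at the memory.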

Many years ago, we saw another case where we had to compensate for GDI batching.

Comments (8)
  1. Christopher Walken says:

    Excellent post Raymond. We are always grateful when you explain in detail the mysterious shenanigans that go on in the background of GDI. Thanks!

  2. steven says:

    Very insightful post, as usual. I think this explains why I had trouble with a bit of a GDI-drawn logo being missing when saving the DIB section to a custom file format. I wish I'd found the GdiFlush function back then rather than the embarrassing workaround I ended up using.

  3. Ivo says:

    The docs for GdiFlush say that "Calling any GDI function that does not return a Boolean value" will flush GDI. Does "GDI function" mean "a function from Gdi32.lib"? Will SelectObject or GetCurrentObject cause a flush, and if so – why?

  4. Bob says:

    You also have to call GdiFlush if you pass GDI objects between threads, even if you don't access the bitmap directly (the batching is per thread, so if thread A uses a GDI object and then passes it to thread B w/o calling GdiFlush, thread B doesn't know that there are still pending operations on it).

  5. 640k says:

    Then when will the graphic subsystem in windows be thread safe?

  6. Simon Buchan says:

    @640k: A bit after that ever becomes a good idea for immediate mode rasterizers.

  7. Neil says:

    The last time I had to use GdiFlush was to port Mozilla 1.7 to run on Windows NT 3.51, whose batching was more aggressive than that in later versions of Windows.

  8. Worf says:

    @Neil – makes sense as graphics drivers in the NT 3.5 series were user mode, so GDI operations had to undergo kernel task switches. Doing this for every GDI call would just bog things down to the point of unusability with all the IPC going on.

    NT 4 onwards moved the display driver into the kernel, so things were much faster and thus, less batching is needed as each call is faster.

