Redirecting GDI, DirectX, and WPF applications


As mentioned in earlier posts, by far the most important aspect of the DWM is the fact that application windows are redirected to render offscreen, and then the DWM is responsible for compositing those windows to the screen.  So, how exactly does that happen?  That’s what this post is all about.  Redirection is a fairly complex topic, but is completely central to the composited desktop.  Thanks to Jevan Saks and Greg Swedberg for reviewing this post and answering some of my own questions here as well.


Before diving into this, I should clarify something that hasn’t been brought up in earlier posts: the DWM only redirects top-level HWNDs.  Thus, a Multiple Document Interface (MDI) application (Microsoft Management Console, mmc.exe, is a good example of this) will have its overall top-level HWND, with its internal child HWNDs, composited as a single entity.  The application process draws the child HWNDs, and their non-client areas, as it always has.


For the purposes of a discussion on redirection, there are really three types of windows that are of interest: GDI-rendered windows, DirectX-rendered windows, and windows rendered by a mix of DirectX and GDI.  Let’s discuss these in turn.


GDI-rendered windows


Today and for the near future, most applications use and will continue to use GDI to render their content.  Traditionally, GDI applications were notified when a part of their window became unoccluded, and were asked to repaint that portion of the window.  Under the DWM, that window is redirected, and the following happens:



  • A system memory surface the size of the window is allocated and associated with that window.

  • A video memory surface, in the target DirectX pixel format, is allocated, also the size of the window.

  • When an application retrieves the GDI DC of an HWND, it no longer is the DC of the primary video buffer, as it is in the non-composited, pre-DWM desktop.  Instead, the DC is a DC onto the allocated system memory surface.

  • GDI operations on that DC then populate the system memory surface.

  • The system, based on a number of variables, decides to update the video memory surface from the system memory surface at the “right times”.

  • The video memory surface is now up-to-date with the application, and the compositor comes around and uses the video memory surface to composite the desktop from.
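To make the sequence above concrete, here is a minimal sketch of the two-buffer flow as a toy Python model. It is not how the DWM is implemented: the class name `RedirectedGdiWindow`, the methods `gdi_fill_rect` and `flush_to_video_memory`, and the `composite` function are all hypothetical names invented for illustration, and the "surfaces" are just nested lists.

```python
class RedirectedGdiWindow:
    """Toy model of a redirected GDI window: two buffers of the same size."""

    def __init__(self, width, height):
        # Hypothetical stand-ins for the two surfaces described above.
        self.sysmem = [[0] * width for _ in range(height)]  # target of the window DC
        self.vidmem = [[0] * width for _ in range(height)]  # what the compositor reads
        self.dirty = False

    def gdi_fill_rect(self, x0, y0, x1, y1, color):
        """Stand-in for a GDI call on the window DC; it touches system memory only."""
        for y in range(y0, y1):
            for x in range(x0, x1):
                self.sysmem[y][x] = color
        self.dirty = True

    def flush_to_video_memory(self):
        """Stand-in for the system copying sysmem to vidmem at the "right time"."""
        if self.dirty:
            self.vidmem = [row[:] for row in self.sysmem]
            self.dirty = False


def composite(window):
    """The compositor in this model only ever samples the video memory surface."""
    return [row[:] for row in window.vidmem]


win = RedirectedGdiWindow(width=4, height=2)
win.gdi_fill_rect(0, 0, 2, 2, color=7)
before_flush = composite(win)  # compositor still sees the old contents
win.flush_to_video_memory()
after_flush = composite(win)   # now the GDI drawing is visible to the compositor
```

The point of the sketch is only the ordering: GDI drawing lands in system memory first, and the compositor sees it only after the copy to the video memory surface.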

There are a few implications of the above that are worth calling out.



  • Dual buffers per window – yes, it’s true that GDI windows have both a system memory and a video memory representation.  There is without doubt a memory cost to doing this.  One obvious alternative is to simply have a video memory representation and have the GDI redirection mechanism render to that format.  There are two primary problems with this.  The first is that the formats are not the same, and GDI doesn’t support rendering into the DirectX format.  Even if that were resolved, the more fundamental issue remains.  Many GDI operations (XORs, alpha blending, and text are examples) are read-modify-write operations.  To do that to a native video memory surface would involve reading back from video memory into the CPU (and thus into system memory), performing the operation, and then writing back.  This is typically a horribly slow and pipeline-stalling operation.
     

  • Minimized windows present a special issue.  Typically, when an application is minimized, the surface that it’s asked to paint is a nominal size, like 130×30, just enough for shades of the non-client area.  If the application updates the system memory surface at this point, and we continue our copying to the video memory surface, then any surface we may have had available to us for Flip3D or for thumbnail rendering is suddenly gone.  Instead of doing this, we maintain the video memory surface in its last known state, and thus those “secondary window representations” are far more useful when windows are minimized.
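The minimized-window behavior can be sketched the same way. This is a toy model under stated assumptions: `MinimizableWindow`, `paint`, `sync`, and `thumbnail_source_size` are hypothetical names, and the only thing being shown is that the copy to the video memory surface is skipped while the window is minimized, so the last full-size image survives.

```python
class MinimizableWindow:
    """Toy model: the video memory copy is frozen while the window is minimized."""

    def __init__(self, width, height):
        self.minimized = False
        self.sysmem = self._blank(width, height)
        self.vidmem = self._blank(width, height)

    @staticmethod
    def _blank(width, height, color=0):
        return [[color] * width for _ in range(height)]

    def paint(self, width, height, color):
        """The application repaints into a system memory surface of the given size."""
        self.sysmem = self._blank(width, height, color)

    def sync(self):
        """Copy sysmem to vidmem, except while minimized (assumption of this model)."""
        if not self.minimized:
            self.vidmem = [row[:] for row in self.sysmem]

    def thumbnail_source_size(self):
        """Size of the surface a thumbnail or Flip3D view would be drawn from."""
        return len(self.vidmem[0]), len(self.vidmem)


win = MinimizableWindow(800, 600)
win.paint(800, 600, color=5)
win.sync()

win.minimized = True
win.paint(130, 30, color=1)  # the nominal-size paint that arrives on minimize
win.sync()                   # skipped, so the full-size image is preserved

size_while_minimized = win.thumbnail_source_size()
```

In this model the system memory surface shrinks to 130×30, but the surface used for secondary representations keeps its last known 800×600 contents.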

DirectX-rendered windows


Unlike GDI applications, DirectX applications of course can natively render into the DirectX pixel format that the DWM expects.  They also have a very clear indication of when they’re done rendering due to the requirement that they call Present().  As such, DirectX applications only need a single window buffer to manage their redirection.  DirectX window redirection is handled by having the DirectX system, when it’s determining what surface to provide the app with to render to, make calls to the DWM in order to share a surface between the DirectX client application process and the DWM process.  This “shared surface” support is unique to DirectX atop the WDDM, and is another key reason why WDDM is an absolute requirement for running the DWM.


When a Present() happens to such a surface, the DWM is notified that there are dirtied surfaces that need to be composited to form the desktop, and that serves as an indication to perform a composition.  (It’s actually a fair bit more complicated than that, but this description certainly provides the gist of it.)
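As a rough illustration of the gist described above (and only the gist, since the real scheduling is more complicated), here is a toy Python model in which presenting to a shared surface marks it dirty and the compositor composes only when something is dirty. `ToyCompositor`, `SharedSurface`, `notify_dirty`, and `compose_if_needed` are hypothetical names for this sketch, not DWM or DirectX APIs.

```python
class ToyCompositor:
    """Toy stand-in for the compositor: it tracks which surfaces were presented to."""

    def __init__(self):
        self.dirty = set()
        self.frames_composed = 0

    def notify_dirty(self, surface_id):
        self.dirty.add(surface_id)

    def compose_if_needed(self):
        """Compose one frame if anything is dirty; return the surfaces picked up."""
        if not self.dirty:
            return []
        updated = sorted(self.dirty)
        self.dirty.clear()
        self.frames_composed += 1
        return updated


class SharedSurface:
    """Toy stand-in for a surface shared between an application and the compositor."""

    def __init__(self, surface_id, compositor):
        self.surface_id = surface_id
        self.compositor = compositor
        self.pixels = None

    def present(self, pixels):
        """Stand-in for Present(): publish the contents and flag the surface dirty."""
        self.pixels = pixels
        self.compositor.notify_dirty(self.surface_id)


dwm = ToyCompositor()
game = SharedSurface("game", dwm)
video = SharedSurface("video", dwm)

game.present("frame 1")
game.present("frame 2")   # two presents before a composition still cost one frame
video.present("frame 1")

first_pass = dwm.compose_if_needed()
second_pass = dwm.compose_if_needed()  # nothing dirty, so nothing is composed
```

Note that there is a single buffer per window in this model, with no system memory copy, which matches the single-buffer claim for DirectX windows.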


Certain DirectX-based applications have much more stringent scheduling requirements (for instance, video applications), and there are public APIs provided that allow the application to get a lot more information, and more control, over when they should render based upon the rendering schedule of the desktop compositor.  That will be covered more in a future topic.


Finally, WPF (Avalon) applications are DirectX applications, so they render just as the DX applications described above render.


Mixed DirectX and GDI Windows


The other reasonably common way of rendering to a top-level window involves mixing DirectX and GDI.  There are two forms of “mixing” here: one is perfectly fine, and the other is problematic.


The form of mixing that is fine is when there is a window tree of the top-level HWND and child HWNDs (and further children, etc.), where each individual HWND is either rendered by DirectX or by GDI.  In this situation, the redirection component of the DWM forms its own “composition tree” where each node in the tree represents a node or a set of “homogeneously rendered nodes” in the “window tree” rooted at the top-level HWND.  Rendering occurs by having each of these render to their own surface, and then compositing this tree of surfaces to the desktop.  Thus, mixed DirectX and GDI rendering works well, so long as the boundary between them is at least at the child HWND level.
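One way to picture the composition tree is the sketch below. The grouping rule it uses is an assumption made for illustration: a child HWND that uses the same renderer as its parent is folded into the parent's composition node, and a child that uses a different renderer starts a new node with its own surface. The actual rule the DWM applies is not spelled out here, and this simplification ignores z-order between siblings. `Hwnd`, `CompositionNode`, `build_composition_tree`, and `describe` are hypothetical names.

```python
class Hwnd:
    """A node in the window tree; renderer is "gdi" or "dx" in this toy model."""

    def __init__(self, name, renderer, children=()):
        self.name = name
        self.renderer = renderer
        self.children = list(children)


class CompositionNode:
    """A set of homogeneously rendered HWNDs that share one surface."""

    def __init__(self, renderer):
        self.renderer = renderer
        self.hwnds = []
        self.children = []


def build_composition_tree(hwnd, parent=None):
    # Assumed rule: same renderer as the enclosing node means share its surface,
    # otherwise start a new composition node underneath it.
    if parent is not None and parent.renderer == hwnd.renderer:
        node = parent
    else:
        node = CompositionNode(hwnd.renderer)
        if parent is not None:
            parent.children.append(node)
    node.hwnds.append(hwnd.name)
    for child in hwnd.children:
        build_composition_tree(child, node)
    return node


def describe(node):
    """Flatten a composition tree into nested tuples for easy inspection."""
    return (node.renderer, node.hwnds, [describe(c) for c in node.children])


top = Hwnd("top", "gdi", [
    Hwnd("toolbar", "gdi"),
    Hwnd("viewport", "dx", [Hwnd("overlay", "dx")]),
    Hwnd("status", "gdi"),
])
tree = describe(build_composition_tree(top))
```

Here the five HWNDs collapse into two composition nodes: one GDI surface for the frame, toolbar, and status bar, and one DirectX surface for the viewport and its overlay.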


The form of mixing that doesn’t work well is when an application uses DirectX and GDI to target the same HWND.  This has never been a supported scenario with DirectX, but there have been scenarios where it has happened to work.  Under the DWM, this is much more problematic, because there can be no guarantee of ordering between the DirectX and the GDI rendering.  This is most troublesome when GDI and DirectX are not only rendering to the same HWND, but to overlapping areas of the same HWND.  As such, this usage pattern is not supported.  Note that there is an alternative that can often work for an application — DirectX is capable of handing back a DC to a DirectX surface, and applications can perform GDI rendering to that DC.  From the DWM’s perspective, that DirectX surface remains purely rendered by DirectX, and all is well.


Drawing To and Reading From the Screen — Baaaad!


Lastly, since we’re on the redirection topic, one particularly dangerous practice is writing to the screen, either through the use of GetDC(NULL) and writing to that, or attempting to do XOR rubber-band lines, etc.  There are two big reasons that writing to the screen is bad:



  1. It’s expensive… writing to the screen itself isn’t expensive, but it is almost always accompanied by reading from the screen because one typically does read-modify-write operations like XOR when writing to the screen.  Reading from the video memory surface is very expensive, requires synchronization with the DWM, and stalls the entire GPU pipe, as well as the DWM application pipe.
     

  2. It’s unpredictable… if you somehow manage to get to the actual primary and write to it, there can be no predictability as to how long what you wrote to the primary will remain on screen.  Since the UCE doesn’t know about it, it may get cleared in the next frame refresh, or it may persist for a very long time, depending on what else needs to be updated on the screen.  (We really don’t allow direct writing to the primary anyhow, for that very reason… if you try to access the DirectDraw primary, for instance, the DWM will turn off until the accessing application exits)
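To see why XOR-style drawing implies reading, consider this small sketch of a rubber-band line over a plain list-of-lists "framebuffer". Nothing here touches a real screen, and `xor_line` is a hypothetical helper. It simply counts how many pixel reads a draw requires and shows that applying the same XOR twice restores the original pixels, which is exactly why the technique needs the current screen contents.

```python
def xor_line(framebuffer, row, x0, x1, mask=0xFFFFFF):
    """XOR a horizontal run of pixels in place and return the number of reads."""
    reads = 0
    for x in range(x0, x1):
        current = framebuffer[row][x]         # read
        reads += 1
        framebuffer[row][x] = current ^ mask  # modify, then write
    return reads


screen = [[0x102030] * 8 for _ in range(2)]
original = [row[:] for row in screen]

reads_to_draw = xor_line(screen, 0, 2, 6)
changed = screen != original    # the band is visible

reads_to_erase = xor_line(screen, 0, 2, 6)
restored = screen == original   # a second XOR erases it again
```

Every pixel written costs one read of the existing contents. On a composited desktop that read is the expensive part, for the reasons given in item 1 above.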

Comments (32)

  1. Princess says:

    "DirectX applications only need a single window buffer"

    So what happens if the application is halfway through rendering its window and simultaneously a translucent window above it notifies the DWM to begin composition?  The contents of the underlying window are required but are not ready.

  2. Adrian says:

    How would someone write a screen-shot grabber in the new model?  With traditional GDI, you’d just grab the handle to the screen DC (GetDC(NULL)) and blit from it.

  3. Dflare says:

    How would OpenGL ( or others GAPIS ) work in the new model?? It seems that they try to get exclusive access to the video card??, Would it work on cooperative mode ??

  4. Kor Nielsen says:

    If I understand your post correctly, this will result in significantly increased CPU usage for GDI applications, especially those performing operations with device-dependent bitmaps.

    According to my (limited) understanding of GDI as used in today’s operating systems, many of the common GDI methods are hardware accelerated. For example, I can create a device-dependent bitmap which is stored in video memory, I can then blit that bitmap to my window’s GC, and the entire operation will be performed by the video card.

    Suppose I wrote a GDI application which used the subset of GDI commands which can be easily hardware accelerated (mainly DDB blits, solid rectangles, and text output). Why should this application be crippled on Vista? Why can’t Microsoft implement GDI on top of DirectX, perhaps falling back to the setup described in your article for apps which use the nasty operations (raster ops and vectors)?

    I love how fast GDI is today in comparison to Quartz and X11. Today, I can write a realtime audio processing application, with constantly updating GUI controls, and be confident that the graphics card will take care of almost all the drawing, leaving the CPU available for intensive realtime DSP.

    With Vista, it appears that this will all change. All the drawing will be done by the CPU into a system-memory bitmap, similar to Quartz 2D(ugh).

    It sounds as though I will have to write two versions of my application for maximum speed: one using DirectX for Vista, and another using GDI for older versions of Windows.

  5. Anonymous says:

    Kor Nielsen, nah, you don’t need two versions. DirectX, if used correctly, is pretty much always WAY WAY WAY faster than GDI even on older Windows versions, so switch to that and your app will run smooooth on any computer with a decent graphics card.

  6. GRiNSER says:

    Kor Nielsen, why don’t you use WPF? 😛

  7. Kor Nielsen says:

    Anonymous/GRiNSER:

    I agree that on a modern PC with a $100 graphics card, DirectX (or OpenGL for that matter) will be significantly faster than GDI, with more eye-candy to boot.

    Unfortunately, not everyone has a decent graphics card. Many of the machines being sold today have terribly sub-par graphics, and there are plenty of 4-year-old machines with no 3D acceleration that are still in regular use. However, almost all of the computers in use today are capable of rendering fast graphics through GDI.

    WPF might be a solution when it is released, but so far I have not been impressed with its speed or stability. Plus, the above-average memory use of the CLR could be unacceptable by potential customers using the 4-year-old machines mentioned above.

    The fact that Microsoft themselves are not using WPF for the Windows Shell makes me suspect that GDI will be the primary graphics API on windows for some time to come. Why shouldn’t it run fast?

  8. Phaeron says:

    Simply moving everything to DirectX is not practical for many apps. DirectX has few to no fallbacks. You must check caps bits for exactly the features you use and write your own fallbacks for everything that might be absent. And using Direct3D without a GDI or DirectDraw fallback is a non-starter, because if your application is running under Remote Desktop you have NO Direct3D support guaranteed whatsoever.

    What I’m wondering is how the DWM handles DirectDraw. Yes, Direct3D has windowed-mode Present(), but DirectDraw doesn’t. I don’t have to use a clipper, nor do I have to pass the window to SetCooperativeLevel()… and in fact I *can’t*, because SCL() requires a top-level window. So how does the DWM know which child windows are GDI-based and which ones are DirectDraw-based? Does it check for a clipper? Or does any use of DirectDraw simply disable the DWM?

  9. Eli says:

    Phaeron,

    It’s probably when you flip surfaces. You can’t lock the primary surface in DirectDraw and co-exist with the DWM. It will disable Glass for your application.

  10. Phaeron says:

    You can’t do an emulated flip in windowed mode in DirectDraw like you can in Direct3D. You can only Flip() the primary surface or an overlay, neither of which I would expect to be supported by the DWM. What you can do, however, is a Blt() from an offscreen surface to the primary surface through a clipper attached to a window. If this isn’t DWM compatible, then the only alternative for fast 2D graphics with Aero Glass — or even just *accelerated* 2D graphics — is to go 3D. Very few 2D graphics apps have a 3D rendering path, and at this point it’s still more work and more compatibility hassle than either GDI or DirectDraw.

    Regardless, all of the notes on this page about what does and doesn’t break the DWM is valuable information that should go directly into the Platform SDK for Vista.

  11. David Hopwood says:

    Being able to write directly to the screen via GetDC(NULL) defeats the point of the UIPI (User Interface Privilege Isolation) stuff, if I’m not mistaken (imagine a virus writing over the "OK" and "Cancel" buttons of a security dialog so the user will click OK when they meant Cancel). So why not disable it?

    Yes, I know this would break some applications. However, as a temporary workaround for apps that write to GetDC(NULL), what you could do is to have it return a DC for a dummy surface, that is composited underneath the layer containing UIPI dialogs. This will also work for apps that only rely on reading stuff from GetDC(NULL) that they have written.

    As for screen grabbers, that’s a clear security bug. As long as ‘Shift-PrtSc’ and supported screen readers still work, let any other screen grabbers break.

  12. Adrian says:

    David Hopwood wrote:  "As for screen grabbers, that’s a clear security bug."

    That’s not so clear to me.  Can you elaborate?

    The GDI-based system’s built-in print screen has limitations that can be solved by writing a better one.  Why should these custom tools be forced to stop working?

  13. David Hopwood says:

    It’s a security bug that unprivileged applications can read the contents of other applications’ windows, as they currently can.

  14. Adrian says:

    "It’s a security bug that unprivileged applications can read the contents of other applications’ windows, as they currently can."

    But the solution to that is not to prevent all applications from accessing the screen, just the lower privilege ones.  There are lots of valid reasons for screen captures (documenting, automated UI testing, etc.), and it seems this approach is going to break existing tools for these applications.

    Will the CAPTUREBLT flag on BitBlt ROP codes be obsolete on Vista?  Will it let me capture windows at or below my privilege level?

  15. igor1960 says:

    David Hopwood wrote:

    >>"It’s a security bug that unprivileged applications can read the contents of other applications’ windows, as they currently can."

    <<

    While it may be viewed as a security breach, I don’t see how the DWM prevents such a breach. It may complicate it, but in fact it may even simplify it. I would assume Microsoft will provide an API to enumerate through DWM window buffers: in fact, such an API surely already exists, as Windows Task Manager, the Application Switcher, etc. already utilize such an API (maybe not public for now) to provide images of current task windows…

    =========================

    Anonymous wrote:

    << DirectX, if used correctly, is pretty much always WAY WAY WAY faster than GDI >>

    GRiNSER:

    << why don’t you use WPF? >>

    It is not true that "DirectX is faster than GDI". It all depends on what operations are involved. If we are talking about just bitmap blitting, with no scaling and/or N (x/y, where N is integer) stretching, then GDI is usually much faster (especially on current PCIenhanced graphics hardware with 1gbs and higher).

    Almost on all modern computer systems, even with the best graphics cards (like Nvidia SLI)

    DrawDibProfileDisplay that is based on DirectX, returns in favor of StretchDIBits…

    Check this:

    http://msdn.microsoft.com/library/default.asp?url=/library/en-us/multimed/htm/_win32_drawdibprofiledisplay.asp

  16. David Hopwood says:

    "The solution to that is not to prevent all applications from accessing the screen, just the lower privilege ones."

    I was concentrating on the design changes that are needed before it’s possible to prevent any apps from reading the screen. Almost all applications should be low-privilege.

    "I don’t see how DWM prevents [such a] breach. … I would assume, Microsoft will provide an API to enumerate through DWM window buffers."

    Probably, but since that would be a *new* (or private) API, it can apply whatever security restrictions are appropriate with no backward compatibility issues. The problem with GetDC(NULL) is that existing unprivileged apps expect to be able to use it.

  17. Gabe says:

    Most apps run with standard privilege. IE would run with lower privilege by default, and services with UI would run with higher privilege.

    Privilege escalation windows run in a different desktop (which appears to be drawn on top of your regular desktop), so it’s impossible for any app on your regular desktop to draw on it with GetDC(NULL).

    A low-rights window like IE would only be able to access other low-rights windows. Otherwise, I don’t see why it would be a problem for any standard privilege window not to be able to read/write any other window. If this were a classified system (in the military sense), you could not have a Secret window reading a Top Secret window, but this isn’t a problem for regular apps.

  18. Mike says:

    Just to say it explicitly, reading from GetDC(NULL) is still a supported way to do screenshots.  That’s basically what happens when you hit the printscrn key.  Yes, it’s a lot slower than it is without the DWM, but screenshots aren’t typically a critical path perf scenario.

    Another thing that should be said explicitly is that although everybody calls it some variant of "the GetDC(NULL) issue", it is not the call to GetDC(NULL) itself that is expensive.  If all you want to do is get some device caps from the DC then go for it.  It’s only actual reading and writing that’s slow.
