WPF Graphics Performance Q & A - Some explanations about WPF graphics architecture & overhead

06/16/2006

I’ve been getting a lot of the same performance questions over the last few months regarding the WPF graphics model, so I thought I’d post some responses for everyone to see. This should shed some light on what WPF does and doesn’t offer in terms of fast vector graphics.

It's likely that I'm going to be porting this to an MSDN article soon, but wanted to let you guys see it first, and get your feedback before it goes live. Have any more questions you'd like to see answered about the graphics model? Let me know, and they may make it into an MSDN whitepaper!

WPF is a retained scene, with a separate rendering thread, which has its own set of data. My scenario can’t afford all of this overhead. Why not expose an immediate-mode API?

That’s a great point. The way I like to think about this is to consider what existed before WPF. There was GDI & GDI+, which are software-based immediate-mode APIs, and Direct3D, the immediate-mode API for hardware rendering. Today, nothing’s changed there. All of these APIs still exist, and are still tools in your development toolbox. For example, if you’re going to write the newest 3D first-person fast-action shooter, you’re going to want to pull out that Direct3D sledge hammer. Using GDI+ or WPF to write a 3D game might start to feel like breaking concrete with a screwdriver.

What we’ve done with WPF is provide an entirely new set of possibilities around retained-mode graphics programming, which simply haven’t existed before. This is a brand new class of tool, one that frees developers from thinking about pixels, painting, and vector processing. You describe your scene using high-level constructs (or heck, even a designer tool like Microsoft Expression Interactive Designer), and we’ll worry about the rest. On top of that, just having a retained scene allows us to expose cutting-edge features that persist beyond an immediate-mode call, like animation & DVD-like ‘just-hit-play’ video controls. Do I hear any fans of MediaElement out there?

Finally, one thing our team has learned about scene processing is that whether or not it’s ‘affordable’ tends to be scenario dependent. For scenarios which aren’t affordable, my team’s entire job is to continually drive those costs down. Understanding how WPF performs in your scenario is key. I heartily recommend prototyping the operations you’re concerned about using WPF, GDI+, and/or Direct3D, and then profiling, to see where the cards lie. Rico Mariani, the CLR performance architect, calls this process budgeting, and it’s a great practice to get into.
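To make the budgeting idea a bit more concrete, here is a minimal sketch of the "prototype, measure, compare against a budget" loop. It is plain Python rather than WPF, GDI+, or Direct3D code, and everything in it is an assumption for illustration: the function names, the stand-in workload, and the 16.7 ms budget (roughly one frame at 60 Hz).

```python
import time

def measure_ms(operation, repeats=50):
    """Return the best observed wall-clock time of operation() in milliseconds."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        operation()
        elapsed_ms = (time.perf_counter() - start) * 1000.0
        best = min(best, elapsed_ms)
    return best

def within_budget(measured_ms, budget_ms):
    """True when the measured cost fits the budget set aside for it."""
    return measured_ms <= budget_ms

def prototype_scene_update():
    # Hypothetical stand-in for the operation you are worried about,
    # e.g. rebuilding the points for 10,000 line segments.
    return [(x, x * 2) for x in range(10_000)]

if __name__ == "__main__":
    cost = measure_ms(prototype_scene_update)
    print(f"scene update: {cost:.3f} ms, "
          f"fits a 16.7 ms budget: {within_budget(cost, 16.7)}")
```

In practice you would swap the stand-in for the same operation prototyped against each API you are considering, and compare the numbers against the budget you set up front.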

So why are we shipping a retained-mode API in V1 instead of an immediate-mode API?

WPF is a platform for developing Windows applications, and if you look at what 99% of Windows apps need, it’s a retained scene. In fact, all applications which use immediate-mode APIs (even games) still have some sort of retained scene for generating those immediate-mode calls. It’s just more specific to their application. For most folks, you don’t want to spend your time worrying about scene processing, rendering loops, and bilinear interpolation. There are higher-level fish to fry.

What do the remaining 1% of apps look like, who needs an immediate-mode API anyway, and how do I know if I’m in that bucket?

The short answer is, applications that spend nearly all of their time rendering, such that it makes sense for them to invest in a super-performant graphics pipeline. Graphics-intensive apps like games, CAD, or complex game-like 3D visualizations come to mind. For these folks, Direct3D is the way to go.

But I’m a GDI/GDI+/WPF programmer working on SuperAmazingCAD 4000, and Direct3D doesn’t provide me with the 2D painting model I’m used to.

For this class of folks, software-based GDI+ or hardware-based Direct3D is all you used to be able to choose from. If GDI+ software rendering wasn’t fast enough, then your only other choice was Direct3D. In reality, to be successful, you’re probably going to have to break out a couple of geometry processing books and use Direct3D. But that’s the way it’s always been.

With that said, WPF is a brand new option, and with its introduction, there is a whole new set of scenarios that WPF supports without going directly to Direct3D. For the remaining scenarios that can’t afford GDI+ or WPF, what you really need is something Microsoft has never done before: a hardware-accelerated 2D immediate-mode API. This is a great feature request, and if this is your scenario, please give me the details! As usual, we can’t make any promises, so don’t bet the bank on it happening soon. But when it comes to new features, there are few things more motivating than scenarios you provide us with, especially if there’s performance data clearly backing them up.

WPF allows objects to be used multiple times, creating graphs which are notoriously problematic when it comes to performance (e.g., many other graphics platforms disallow them). What’s the deal?

Simply put, we found that people like to re-use objects. This is why the Freezable pattern exists. We tried other patterns that copied objects more prevalently instead of reusing them, but quickly realized that they were extremely difficult to use (anyone remember the Changeable pattern from early builds?).

One simple & prevalent example of the need for object re-use is SolidColorBrush. If a scene is using 200 black brushes, it just doesn’t make sense for 200 instances of the brush to exist, especially since reusing a SolidColorBrush has no performance impact other than using less memory.

To give you more context as to why re-use can be expensive, consider that all rendering WPF does eventually goes through Direct3D. If you look at the Direct3D API, you’ll quickly realize that text, paths, rectangles, and gradients don’t exist. All you really have are points (x,y,z) specified in vertex buffers, and either solid colors or bitmaps (aka textures) that are ‘stretched’ across those points. Higher-level primitives such as geometry and brushes are a huge value-add that GDI+ and WPF bring to the table.

To get from the rich API WPF exposes to the lingua franca of Direct3D, intermediate representations are often created. For example, it’s no mystery that WPF renders text into bitmaps, which are then handed off to Direct3D. If these representations are specific to a certain scale factor, or pixel position on the screen, then re-using an object in different places means different intermediate representations are created. Very often only a single representation is cached (for memory & scalability purposes), so re-using can mean these intermediate representations are re-created more often than they would be in single-use scenarios.
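Here is a toy model of that effect, assuming a single-slot cache keyed on scale factor. It is not how WPF is implemented internally; the class and function names are invented for this sketch, and the "bitmap" is just a string standing in for an expensive intermediate representation.

```python
class CachedGlyphRun:
    """Toy object whose rendering needs a scale-specific representation."""

    def __init__(self, text):
        self.text = text
        self._cached_scale = None   # only ONE representation is kept
        self._cached_bitmap = None
        self.rasterize_count = 0    # how many times the expensive step ran

    def _rasterize(self, scale):
        # Hypothetical expensive step: build a representation tied to `scale`.
        self.rasterize_count += 1
        return f"bitmap({self.text!r} @ {scale}x)"

    def render(self, scale):
        # Cache hit only if the last use had the same scale factor.
        if self._cached_scale != scale:
            self._cached_bitmap = self._rasterize(scale)
            self._cached_scale = scale
        return self._cached_bitmap


def render_frames(uses, frames):
    """Render every (object, scale) use once per frame."""
    for _ in range(frames):
        for obj, scale in uses:
            obj.render(scale)


# Multi-use: one object shown at two different scales.
shared = CachedGlyphRun("Hello")
render_frames([(shared, 1.0), (shared, 2.0)], frames=10)

# Single-use: one object per scale.
a, b = CachedGlyphRun("Hello"), CachedGlyphRun("Hello")
render_frames([(a, 1.0), (b, 2.0)], frames=10)

print(shared.rasterize_count)                 # 20: the cache is evicted on every use
print(a.rasterize_count + b.rasterize_count)  # 2: each cache is filled once
```

The shared object uses less memory, but in this model its one cache slot is overwritten on every use, which is the trade-off described above.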

To summarize, the primary cost involved with multi-use is the possibility of invalidating cached intermediate representations, if the particular object happens to use them (logically, APIs which look more like Direct3D APIs will use intermediate representations less heavily). I used text as an example because this is a multi-use performance issue we recently found in a profile. The good news is we’ve been working on text improvements since the last CTP, so look for better performance there in future releases. Optimizing for multi-use scenarios is a pattern that I see continuing over time.

With the Freezable pattern, we’ve given you the option of whether or not to re-use an object. You can even opt out of the overhead involved in tracking Changed notifications by freezing it. You control the scene, so if using an object multiple times becomes a problem (as evidenced by profiles, or multi-use vs. single-use comparisons), create multiple copies.
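For readers who haven’t used the pattern, the following is a language-neutral sketch of the idea in Python: an object that is mutable and raises change notifications until it is frozen, and that can be cloned when you want an independent, modifiable copy. The class and member names are invented for this sketch and are not WPF’s API; details such as raising an error when subscribing to a frozen object are choices made for the toy model.

```python
class ToyFreezable:
    """Toy model: mutable with change notifications until frozen."""

    def __init__(self, color):
        self._color = color
        self._frozen = False
        self._handlers = []

    @property
    def is_frozen(self):
        return self._frozen

    @property
    def color(self):
        return self._color

    @color.setter
    def color(self, value):
        if self._frozen:
            raise RuntimeError("cannot modify a frozen object")
        self._color = value
        for handler in self._handlers:
            handler(self)           # notify everyone tracking this object

    def subscribe(self, handler):
        # Toy-model choice: a frozen object never changes, so refuse listeners.
        if self._frozen:
            raise RuntimeError("a frozen object never changes")
        self._handlers.append(handler)

    def freeze(self):
        # After this, no change tracking is needed at all.
        self._frozen = True
        self._handlers.clear()

    def clone(self):
        # The copy is independent and starts out unfrozen.
        return ToyFreezable(self._color)


# Re-use: 200 shapes share one frozen brush-like object.
black = ToyFreezable("black")
black.freeze()
shapes = [{"fill": black} for _ in range(200)]

# Opting out of re-use: take a separate, modifiable copy.
editable = black.clone()
editable.color = "red"
```

Freezing suits the shared, never-changing case; cloning is the escape hatch when sharing one instance turns out to be the expensive choice.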

Adding seemingly simple tweaks (e.g., clipping, bitmap effects) to our scene causes us to fall back to software, and software rendering in WPF is slower than GDI+ software rendering.

First, the WPF software rendering code is derived from the GDI+ codebase. Second, there are certain limits to what can be accomplished in hardware, and we have to work around what the hardware vendors give us. As graphics hardware evolves, those limits are likely to ease over time. If at least some portion of your scene is rendered in hardware, rendering is already going to be faster than it was in GDI+. Finally, we shipped a tool at the PDC called ‘Perforator’ to help identify where software rendering occurs.

Some WPF features, such as gradients, must be rendered in software and then copied into texture memory on the video card. With many Windows applications running, doesn’t this create a new class of bottlenecks around copying textures to the video card?

Ever since our first public preview, fewer and fewer features are being rendered in software. That’s a trend I expect to continue. For example, only a very small portion of a linear gradient is rendered by the CPU. Most of the pixels contained within a linear gradient are calculated by the GPU (for tier-2 cards, radial gradients are also done on the GPU). In addition, compact WPF vectors are a new option for content which has been traditionally rendered using bitmaps. The result will be applications which use fewer bitmaps, and thus less video memory, in favor of scalable vector graphics.

Finally, with Vista, the WDDM gives us virtualized video memory. This helps make sure the most important textures are kept in video memory via an efficient memory eviction mechanism -- something we’ve never had before.

For more on when to use Direct3D vs. GDI+ vs. WPF, see Pablo Fernicola’s March 28 blog article.

For more on the WDDM, see Greg Schechter’s April 2 blog article.
