What’s New for Performance in WPF in .NET 4

Today (4/12/2010) we are excited to make the final .NET Framework 4 RTM build available. It can be downloaded from here, and the .NET Framework 4 Client Profile is available from here.
You can also check out Soma’s blog officially announcing Visual Studio 2010, which is built on top of WPF 4.
In previous related blog posts I discussed the performance improvements we implemented in WPF in .NET 3.5 SP1 and .NET 3.5/3.0 SP1 (see here, and here).
There are many improvements and new features in the WPF 4 release that we are really excited about (read more about those here and here), but in this post I want to focus on the specific performance improvements we implemented in WPF 4 and provide more details on them.

Graphics Improvements

1. New “Cached Composition” API to significantly improve rendering perf of complex visual trees
This API gives apps the ability to cache a live UIElement and its sub-tree as a bitmap, and then render the UIElement as quickly as a bitmap (without full re-rasterization) as long as there are no structural changes to the cached subtree.

Transforms, opacities, etc. applied above the cached UIElement do not force the cache to be regenerated. The UIElement remains fully interactive, including mouse input, while it is cached.

Motivation:
Despite hardware acceleration, WPF’s rendering throughput is often limited by all the per-primitive work that must be done when rendering complex scenes. Without this caching, simply animating an otherwise static element across the screen forces that element to be completely re-tessellated and re-rasterized on every frame, and both operations can be expensive. This often leaves WPF’s rendering pipeline bottlenecked on CPU-bound per-primitive setup cost.

The API enables breaking that bottleneck and allows primitives to be rendered as fast as the video card can draw a quad, moving the bottleneck from CPU primitive setup to GPU fill-rate, which is usually dramatically faster.

In doing this, there can be some loss of visual quality for the sake of performance. Many scenarios would gladly make this tradeoff. Scenarios that could benefit from this API:

  • Scaling and rotation of live controls. This includes not only images, but potentially complex controls with text.
  • The ability to create a smooth scroll experience that scrolls cached images (tiled or otherwise) of live content that doesn’t have to be re-rendered every frame.
  • Fast scaling and translation of vector content (e.g. PowerPoint content).

See UIElement.CacheMode and BitmapCacheBrush for more details.
Example:
Setting CacheMode through C# (myElement stands for any UIElement in your app):

 myElement.CacheMode = new BitmapCache();

Setting CacheMode through XAML, either with the default settings:

 <Rectangle CacheMode="BitmapCache" />

or with explicit BitmapCache properties:

 <Rectangle>
     <Rectangle.CacheMode>
         <BitmapCache EnableClearType="true" RenderAtScale="4"/>
     </Rectangle.CacheMode>
 </Rectangle>
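
For completeness, here is a rough code-behind sketch of the same idea: building a BitmapCache with explicit settings and reusing the cached bitmap elsewhere through a BitmapCacheBrush. The class and method names (CachedCompositionSketch, CreateCache, CacheAndReuse) are made up for illustration, and the scale value is only an assumption; tune it for your own content.

```csharp
using System.Windows;
using System.Windows.Media;
using System.Windows.Shapes;

// Hypothetical helper, not part of WPF.
public static class CachedCompositionSketch
{
    // Builds a cache that rasterizes at renderAtScale times the element's
    // normal size, which helps when the cached element is later scaled up.
    public static BitmapCache CreateCache(double renderAtScale)
    {
        return new BitmapCache
        {
            RenderAtScale = renderAtScale,
            EnableClearType = true,
            SnapsToDevicePixels = true
        };
    }

    // Caches complexElement and paints the cached bitmap into another shape.
    public static void CacheAndReuse(UIElement complexElement, Rectangle copy)
    {
        complexElement.CacheMode = CreateCache(2.0);
        copy.Fill = new BitmapCacheBrush(complexElement);
    }
}
```

A higher RenderAtScale costs more video memory, so it is worth measuring before settling on a value.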

2. New API to allow WPF apps to force SW rendering per process
In .NET 3.5 SP1 we added a new API that lets developers force software rendering per application window instead of using the GPU (see my Performance improvements in WPF in .NET 3.5 / 3.0 SP1 blog). In .NET 4 you can now do so for the entire process.

As reported (see here), depending on the machine configuration and the application, software-based rendering is sometimes faster than hardware.

This could improve rendering performance for certain scenarios and machine configurations, but in most cases hardware rendering should perform better. Please use it carefully and verify the results with your app and machine configuration.

In certain cases apps may want to use Software rendering for reliability reasons, for example on machines (typically older) that do not have reliable drivers.

This API should give developers a much better alternative than setting the global ‘Disable HW Acceleration’ registry key (see here).

VS 2010, for example, uses this feature to force itself into software rendering on VMs, since some VM graphics emulation drivers were found to be unreliable. Here is an example of how to use this API:

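 using System.Windows;
 using System.Windows.Interop;   // RenderMode
 using System.Windows.Media;     // RenderOptions

 public partial class App : Application
 {
     protected override void OnStartup(StartupEventArgs e)
     {
         // WeThinkWeShouldRenderInSoftware() is a placeholder for your own check.
         if (WeThinkWeShouldRenderInSoftware())
             RenderOptions.ProcessRenderMode = RenderMode.SoftwareOnly;

         base.OnStartup(e);
     }
 }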

Notes:
I) The precedence order for software rendering is:

  1. DisableHWAcceleration reg key
  2. RenderOptions.ProcessRenderMode (per process)
  3. HwndTarget.RenderMode (per-target / window)

II) The app can force software rendering at any time. However, there is no way to turn hardware rendering back on once the process has been set to SoftwareOnly.
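
To make the per-process and per-window options concrete, here is a minimal sketch that shows both side by side. The SoftwareRenderingPolicy class and the "/softwareRender" command-line switch are invented for this example; how you decide to fall back to software is entirely up to your app.

```csharp
using System;
using System.Windows;
using System.Windows.Interop;
using System.Windows.Media;

// Hypothetical helper, not part of WPF.
public static class SoftwareRenderingPolicy
{
    // Assumed opt-in switch; any other signal (config file, VM detection) works too.
    public const string SoftwareSwitch = "/softwareRender";

    public static bool ShouldRenderInSoftware(string[] args)
    {
        return args != null && Array.Exists(
            args,
            a => string.Equals(a, SoftwareSwitch, StringComparison.OrdinalIgnoreCase));
    }

    // New in .NET 4: affects every window in the process.
    public static void ApplyToProcess()
    {
        RenderOptions.ProcessRenderMode = RenderMode.SoftwareOnly;
    }

    // Available since .NET 3.5 SP1: affects only the given window.
    public static void ApplyToWindow(Window window)
    {
        var source = PresentationSource.FromVisual(window) as HwndSource;
        if (source != null && source.CompositionTarget != null)
            source.CompositionTarget.RenderMode = RenderMode.SoftwareOnly;
    }
}
```

In OnStartup you could call ShouldRenderInSoftware(e.Args) and then ApplyToProcess(). ApplyToWindow only has an effect once the window has a presentation source, for example from a SourceInitialized or Loaded handler, because PresentationSource.FromVisual returns null before that.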

3. Added new VisualScrollableAreaClip API
This allows line-scroll scenarios (e.g. line scrolling in an editor) to update less screen area and therefore be significantly more efficient over Remote Desktop (RDP) and Terminal Services.

The VS 2010 editor, which is WPF-based, is one example of an app that takes advantage of this API.

(You can read more about Optimizing WPF for Remote Desktop here)
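
As a rough illustration, VisualScrollableAreaClip is a protected nullable Rect property on Visual, so it has to be set from inside a derived class. The LineScrollingSurface class and its method names below are hypothetical, and treating the element's whole render area as the scrollable region is only an assumption for the sketch.

```csharp
using System.Windows;

// Hypothetical element that hosts line-scrolled content such as editor text.
public class LineScrollingSurface : FrameworkElement
{
    // Declares the whole render area of this element as the scrollable region.
    public void EnableScrollableAreaClip()
    {
        VisualScrollableAreaClip = new Rect(RenderSize);
    }

    // Setting the property back to null removes the clip again.
    public void DisableScrollableAreaClip()
    {
        VisualScrollableAreaClip = null;
    }
}
```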

4. The default RenderOptions.BitmapScalingMode is now Linear instead of Fant
This should provide some perf improvement if you scale images, but it produces lower-quality output, so be aware of the change. If you still want Fant, you can re-enable it.
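
If a particular image needs the old quality back, one way to opt in is the RenderOptions attached property. This is a small sketch; the ScalingQuality helper name is made up for the example.

```csharp
using System.Windows.Controls;
using System.Windows.Media;

// Hypothetical helper, not part of WPF.
public static class ScalingQuality
{
    // Restores the pre-.NET 4 default (Fant) for a single Image element.
    public static void UseFant(Image image)
    {
        RenderOptions.SetBitmapScalingMode(image, BitmapScalingMode.Fant);
    }
}
```

The same attached property can be set in XAML as RenderOptions.BitmapScalingMode="Fant" on the element.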

4. Minor 3D performance improvements
We reduced the number of DrawPrimitive() calls for large indexed meshes and slightly improved CPU usage for large Model3D counts.

5. The BitmapEffect classes are now no-ops
BitmapEffects used to render in software, which caused perf issues. The BitmapEffect classes are still there so your apps will compile, but they no longer do anything.
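
If your app relied on a BitmapEffect for something like a drop shadow, one possible replacement is the hardware-accelerated Effect classes that were introduced in .NET 3.5 SP1. The sketch below uses DropShadowEffect; the EffectMigration helper and the specific values are illustrative assumptions, not a recommended look.

```csharp
using System.Windows;
using System.Windows.Media;
using System.Windows.Media.Effects;

// Hypothetical helper, not part of WPF.
public static class EffectMigration
{
    // Builds a drop shadow using the Effect pipeline instead of BitmapEffect.
    public static DropShadowEffect CreateDropShadow()
    {
        return new DropShadowEffect
        {
            Color = Colors.Black,
            BlurRadius = 8,
            ShadowDepth = 3,
            Opacity = 0.5
        };
    }

    // Assigns the effect through UIElement.Effect (not UIElement.BitmapEffect).
    public static void ApplyDropShadow(UIElement element)
    {
        element.Effect = CreateDropShadow();
    }
}
```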

6. Some minor changes to Graphics Rendering Tier classification
Pixel Shader 2.0 is now required for hardware acceleration.

If your card was Tier 1 but did not have PS 2.0, it is now considered Tier 0, causing your app to render in software.
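
To check which tier a machine ends up in, you can query RenderCapability at run time. The sketch below assumes you just want a diagnostic string; RenderTierInfo, DecodeTier and Describe are names invented for the example.

```csharp
using System.Windows.Media;

// Hypothetical helper, not part of WPF.
public static class RenderTierInfo
{
    // RenderCapability.Tier keeps the tier number in the high-order word,
    // so shift right by 16 bits to get 0, 1 or 2.
    public static int DecodeTier(int rawTier)
    {
        return rawTier >> 16;
    }

    // Reports the current tier and whether Pixel Shader 2.0 is available.
    public static string Describe()
    {
        int tier = DecodeTier(RenderCapability.Tier);
        bool hasPs20 = RenderCapability.IsPixelShaderVersionSupported(2, 0);
        return string.Format("Tier {0}, PS 2.0 supported: {1}", tier, hasPs20);
    }
}
```

Logging the result of Describe() at startup is a cheap way to spot machines that were reclassified to Tier 0.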

UI Automation Improvements

1. Significantly improved UI Automation (UIA) performance.
Two major improvements went into this effort:

A) Added UI Automation virtualization support.

This allows WPF apps that target .NET 4 and have virtualized elements (such as ListView, TreeView, etc.) to benefit from significantly improved performance on tablets and touch-enabled machines, as well as on non-tablet machines that have Accessibility clients running (for example, a screen reader) or an external input device attached, such as a pen or Wacom tablet.

To support UIA virtualization, WPF 4 takes advantage of the Windows Automation API 3.0 (aka "UI Automation API 3.0" or “UIA 3.0”). UIA 3.0 is included by default on Windows 7 and Windows Server 2008 R2.

The gotcha is that the Windows Automation API 3.0 is not included on down-level OSes (such as XP and Vista) and must be installed separately in order to get the full perf benefits.

See more in this blog.

B) Optimized event handling

We addressed the UI Automation issues mentioned in this blog.

In .NET 3.5 SP1 and earlier, performance problems (such as high CPU consumption and general sluggishness) were especially noticeable when scrolling within an application that contained many visual elements while UIA client applications were running. In some cases WPF had to traverse every element in the application tree to check whether it needed to fire an automation event. Depending on machine speed and how many elements were in the application’s visual tree, this had a significant performance impact.

This was typical on Tablet and other touch-enabled machines because the Accessibility client TabTip.exe (the "Tablet PC Input Panel") runs by default.

It was also possible on non-tablet machines, since any machine can run a UI Automation client app (for example, UI Spy, Narrator, Magnifier, etc.) or have devices connected that also use UI Automation (for example, a Wacom touch and pen input device).

In WPF 4 we fixed these performance issues.

Text Improvements

1. WPF 4 now uses DirectWrite for much improved text clarity
This is not really perf per se, but it is worth mentioning here.

You can read more in the Additional WPF Text Clarity Improvements and Direct2D and DirectWrite posts.

2. Improved text speed
WPF 4 English text is somewhat faster (~10%) compared to WPF 3.5 SP1.

XAML Improvements

  • WPF designer template parsing is now about 2x faster
  • Performance of loading loose XAML significantly improved.

General Improvements

  • We now use a cached copy of DispatcherSynchronizationContext instead of creating a new one each time. We found this to provide a ~15% gain in editor scroll scenarios.

  • Fixed various memory leaks in WPF 4.

    We plan to update the Finding Memory Leaks in WPF-based applications blog soon, so stay tuned.

Setup & Client Profile improvements

1. Much improved NET4 Full & Client Profile size and deployment performance.
(Graph comparing NETFX sizes)

2. The .NET 4 Client Profile is now a “first class citizen”
Unlike the .NET 3.5 SP1 Client Profile, the .NET 4 Client Profile:

  • Is supported on every OS that the Full framework supports
  • Supports both x86 and x64
  • Is *the* framework that will be available on Windows Update for desktops
  • Is supported in all aspects of VS (e.g. targeting, deployment projects, etc.)
  • Is the default target in almost all VS10 client project templates (WinForms, WPF, VSTO, etc.)

Read more about the .NET 4 Client Profile in this blog.