.NET Compact Framework Advanced Memory Management

This is a work in progress on a paper that I am writing to help folks understand the internals of Windows CE and .NET Compact Framework memory management. The final paper will be published on MSDN.

Introduction

One of the things that developers struggle with when moving from C++ to Visual C# or Visual Basic is developing a usable cost model around speed and size issues. Frankly, C++ memory management is so difficult that you can’t even write correct code without a solid understanding of heap internals. Type-safe, garbage collected languages can substantially improve developer productivity by eliminating the possibility of double frees and buffer overruns and by automating memory management. There are two drawbacks to this from the perspective of writing rich, high performance applications that run on resource constrained devices. First, the common language runtime (CLR) itself is silently doing work at runtime on your behalf, and it’s not simple to account for the processor time or memory being used. Second, portions of the Microsoft® .NET Compact Framework class libraries are both general and high level enough that they are not suitable for highly performance intensive applications on devices.

This paper describes the internals of both Windows CE and the .NET Compact Framework memory management systems, with the goal of helping developers understand the performance and size impact of application design decisions. An environment where applications can suddenly terminate due to hard out of memory (OOM) errors has been long forgotten by Windows PC developers, but is an everyday fact of life for device developers. This paper should help answer both “when should I use managed code?” and “how much memory will my application need?”

Native Code Quality

Before we drill into size issues, let’s examine .NET Compact Framework native code quality, since it can usually be eliminated as the primary cause of performance problems. Native code quality refers to the density and organization of the machine instructions generated by a compiler relative to the corresponding high level language source lines.

Tests show that the .NET Compact Framework 2.0 generates code that is roughly 50% as good as the full Microsoft® .NET Framework, which is itself somewhat worse than the best optimizing C++ compilers. If your application moves so much data that you would expect a well written C++ routine to use 50% or more of the available processor time, it is not a suitable candidate for the .NET Compact Framework. An example of such an app would be video or audio playback. Fortunately, most applications spend their time running routines that exist for the convenience of the developer and can usually be optimized considerably while staying within a Visual C# or Visual Basic environment.

There’s a fundamental trade-off between the time needed to generate native code, at compile time or at runtime (just-in-time (JIT) compilation), and the density and execution speed of that code. Compiling and optimizing C++ files is in many ways optimal from a performance perspective, since complex algorithms can be performed over a large code scope. The full .NET Framework has the ability to generate the native code at the time the application is installed using the Native Image Generator tool (nGen). We can’t do this on devices, since we can’t afford the extra space that nGen requires to store the generated code; native code is approximately 3 times the size of Microsoft intermediate language (MSIL) code.

On devices, there’s little free processor time and battery to burn at JIT-time generating highly optimized code, and you often have to re-JIT code in low memory circumstances, so we have opted for a relatively simple (and fast) JIT-compiler and placed the bulk of our performance optimization on the memory management routines. Both the JIT-compiler and memory management will continue to get better in subsequent releases of the .NET Compact Framework.

One area where the code quality of the .NET Compact Framework may cause performance issues is the method call path. Method and property calls in Visual C# and Visual Basic are 2-5 times slower than an optimized C++ call. An application that makes heavy use of recursion, or that is architected to make frequent and deep calls across small classes, may run more slowly than you want. Such an application can be difficult to optimize with a profiler, as the total app time is spread across a large code base with few hot spots.

Handling Large Amounts of Data with SQL-Mobile

The other area where code quality can impact perceived performance is in code paths that are responsible for user interaction, and where large numbers of relatively small messages are passed around (such as in a WndProc).

The .NET Compact Framework implementation of Windows Forms (and the upcoming DirectX support in version 2.0) is highly optimized to keep code paths short, in part by implementing some of the class libraries in native C++. In practice, you can achieve nearly the same Windows Forms performance as you would expect from C++ calls to User/GDI, and with little additional pressure on the garbage collector. If you write a fully managed control, it is worth the effort to keep code paths and call stacks short to avoid performance problems. Version 2 of the .NET Compact Framework will enable the creation of hybrid controls, where the heavy lifting is done entirely in C++ but the control can be exposed to Visual C# and Visual Basic through Windows Forms. This is suitable for highly processor intensive controls like media playback.

It is possible to write large, high performance applications by deferring the typical performance critical paths to SQL-Mobile and Windows Forms and still maintain the productivity benefits of Visual C# and Visual Basic. You can use this same technique in your own application, deferring to native code when necessary, but this decision must be made early in the development cycle, and the speed benefits must make up for the extra time it takes to move data across the P/Invoke boundary. This is an advanced programming technique and, if misused, can lead to needless development complexity.

Managing Memory

A common question we get asked is “what will the working set of my application be?” Working set isn’t a particularly helpful concept in this environment for several reasons:

· It doesn’t really apply to Windows CE, which does no demand paging of read/write (R/W) pages.

· It doesn’t make sense in a garbage collection (GC) based environment like the .NET Compact Framework. GCs seek to maximize the RAM they use for collection efficiency and, when necessary, are capable of even better compaction than a typical malloc()/free() system.

It’s also not that helpful to measure the amount of operating system memory that your process appears to be using since this number will change radically during the lifetime of your program.

It is more helpful to try and answer the following questions:

· How much free system RAM is needed to prevent my application from exiting with a hard OOM error? Will I run out of processes or virtual address space and exit with a hard OOM error?

· How much free RAM is needed to make my application run with suitable speed?

· What is the formula relating my application’s RAM usage to its application data? How does this scale as the size of the database changes? What are the upper limits on the amount of data that I can effectively sort, search and display?

Windows CE Memory Management "Basics"

Windows CE can create up to 32 application virtual address spaces. The operating system loader creates an association between executable code in a native DLL or EXE and the process’s virtual space, and then uses the demand paging system to back the virtual address space with physical memory as needed to actually execute the instructions.

Windows CE, by default, stores files in a compressed format, and expands them as they are faulted in. Physical memory needed to execute code can be released by Windows CE when the system is low on memory. If the same DLL or EXE is used by multiple applications, the physical memory is shared across all application processes. For each DLL or EXE, a range of virtual space is reserved across all processes for as long as any one process has a mapping to that DLL or EXE.

The 32MB application virtual address space can come under a lot of pressure from a large application and, if it becomes sufficiently fragmented, can lead to OOM errors even when there is free physical memory. The MSDN article, Windows CE .NET Advanced Memory Management, by Douglas Boling provides more information about avoiding virtual address space fragmentation.

In order to reduce pressure on the virtual address space of each application process, Windows CE creates a single 32MB system code virtual address space for native DLLs and EXEs that are considered to be part of the operating system itself. This address space is like an application address space but supports only executable system code. System code can be either executed in place (XIP), or decompressed and run out of physical memory, depending on the hardware and system configuration. In the case of compressed files, the physical memory backing this address space is demand paged and fully shared across all processes.

Windows CE creates one additional virtual address space, 1GB in size, for large allocations. Applications can choose to allocate blocks of memory out of this space, and the operating system always uses it for memory mapped files. All memory mapped files are backed with demand paged and shared physical memory.

.NET Compact Framework Fixed Costs

When a .NET Compact Framework application starts, the following sequence occurs:

1. If they are not already loaded, the .NET Compact Framework CLR native EXEs and DLLs are mapped into the 32MB system code address space. For version 2.0 of the .NET Compact Framework, this takes about 650KB of virtual space and, in the worst case, 650KB of physical space. The Windows CE operating system can demand page the physical memory in low memory situations.

Subsequent .NET Compact Framework applications do not require any additional virtual or physical address space to run the CLR itself.

In the event that the .NET Compact Framework has been installed into “user store” and is not considered to be part of the system, the CLR native EXEs and DLLs are loaded into the application address space, but the physical pages are still demand paged and shared across application processes. This scenario does place additional pressure on the application virtual address space. Upcoming releases of Windows CE will support an “in place” install and upgrade of the .NET Compact Framework such that it will always be considered part of the system.

2. The .NET Compact Framework class library assemblies (“managed DLLs”) are memory mapped into the 1GB large allocation address space. The physical memory backing these files is always demand paged and shared across processes.

At this point, the worst case fixed cost for using the .NET Compact Framework is 3.8MB of virtual space out of the 1GB space, and 650KB of virtual space out of a 32MB space, and whatever physical working set is needed to run without thrashing. We estimate this to be in the range of 1MB – 2MB for a typical application. Subsequent .NET Compact Framework-based applications do not incur any additional fixed cost.

3. The managed application assemblies (the main EXE and class library DLLs) are mapped into the 1GB address space as memory mapped files. The physical memory backing these files is demand paged.

To estimate the memory pressure that your application will place on the system, take the virtual size to be the size of your files after they grow by 50% when decompressed from the Windows CE file system (or use the file size reported by an uncompressed Windows PC volume). This is allocated from the 1GB space, so it places effectively no pressure on the system’s virtual limits. Assume that the physical set required to back this without thrashing is 50% of the virtual size.
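The rule of thumb for the application's own assemblies can be turned into a quick calculation. The following is only a sketch of that arithmetic: the function name, parameter names, and the sample file sizes are made up for illustration, and the 50% growth and 50% hot figures are the planning assumptions from the text, not measurements.

```python
def estimate_app_file_costs(compressed_sizes_kb, growth=1.5, hot_fraction=0.5):
    """Estimate memory (in KB) for an application's managed EXE and DLLs.

    compressed_sizes_kb: file sizes as stored on the Windows CE file system.
    growth: assumed expansion when decompressed (the text's 50% rule of thumb).
    hot_fraction: assumed share of the virtual size that must stay resident.
    Returns (virtual_kb, physical_kb); the virtual size comes out of the 1GB space.
    """
    virtual_kb = sum(compressed_sizes_kb) * growth
    physical_kb = virtual_kb * hot_fraction
    return virtual_kb, physical_kb


# Hypothetical application: one 400KB EXE and two 200KB class library DLLs.
virtual_kb, physical_kb = estimate_app_file_costs([400, 200, 200])
print(f"virtual: {virtual_kb:.0f}KB, physical: {physical_kb:.0f}KB")
```

If you already have the uncompressed sizes from a Windows PC volume, pass them in with growth=1.0.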

.NET Compact Framework Startup Memory Consumption (First Application Only)

| Component | Foreground App - Virtual | Foreground App - Physical | Background App - Virtual | Background App - Physical | How to Measure |
|---|---|---|---|---|---|
| .NET Compact Framework native code | 650KB out of system 32MB space | 475KB (assume 75% hot) | 650KB out of system 32MB space | 475KB (assume 75% hot) | file sizes |
| .NET Compact Framework Memory Mapped Files | 3.8MB out of system 1GB space (worst case) | 1MB (assume 25% hot) | 3.8MB out of system 1GB space (worst case) | 1MB (assume 25% hot) | file sizes |

.NET Compact Framework Dynamic Memory Costs

At this point, the .NET Compact Framework uses virtual and physical memory in your application address space, which is not paged or shared, for three things:

· The CLR data structures that are created “just-in-time”, as objects are instantiated from classes. Each data structure is created only once per instantiated class (per application). The size of these data structures varies based on the size of the class, but a large application will generally not use more than 250KB for this purpose. These data structures exist for the life of the application process.

· Native machine code that is generated by the CLR JIT-compiler. The size of the JIT’d code depends entirely on the number and size of the classes that are called, but is typically in the 250KB-500KB range. Applications that generate more than 2MB of JIT’d code will run, but may suffer from performance issues caused by long code paths. This typically occurs when an application that has been written for the Windows PC environment is ported to a device. Large applications generally require device specific architectures to run well. Using Visual C# or Visual Basic still provides substantial developer productivity gains even when two code bases are required.

In response to low memory conditions, the .NET Compact Framework will release all of the JIT’d code and return this memory to the operating system. For planning purposes, assume that the entire JIT’d code space is backed by physical memory while the application is in the foreground. When the program moves into the background, the .NET Compact Framework releases all the JIT’d code and returns the physical memory to the operating system.

· Storage for the “instance variables” of objects that are managed by the garbage collector. Instance variables are non-static variables that are declared (in Visual C#) inside a class, but not inside a method. The sum of the storage needed for all instance variables is roughly the amount of space that each instantiated object will require in the GC heap. Variables declared inside a method are allocated on the thread stack, exactly as they are in C++.

If you take a snapshot of the GC heap at any point in time, you will find two sets of objects, “reachable” and “unreachable”. As the system runs, the GC periodically locates and releases the storage for all unreachable objects. Since the task of finding the unreachable objects itself takes processor time, and freezes all application threads during the search, it is in the application’s best interest to delay this action as long as possible. If there were only one process and GC heap inside the entire system, it would make sense to allow the heap to grow until the system was out of memory, and then run a collection. In a system with multiple processes running, allowing any one of them to do this would starve the others, so a balance must be found between deferring GC as long as possible and keeping the load on the system memory under control.

The .NET Compact Framework uses several techniques to accomplish this. When an application process’s GC heap grows to 1MB, the GC runs a “simple collection”, which releases all unreachable object storage back to the GC heap, so that new allocations in the same application can reuse the space. A simple collection is relatively fast to run, but does not return virtual or physical memory to the operating system itself.

There is a set of triggers that will cause the .NET Compact Framework GC to execute a “full GC”, which first runs a simple GC, and then returns the reclaimed virtual and physical memory to the operating system. For version 2, the triggers are:

o WM_HIBERNATE is received from the OS (the system is low on memory)

o The app is moved into the background

o PALHeap_Alloc* is OOM. It retries once after performing a global GC

o CreateSolidBrushGC is OOM

o CreatePenGC is OOM

o CreateCompatibleDCGC is OOM

o CreateFontIndirectGC is OOM

o CreateRectRgnGC is OOM

o CreateCompatibleBitmapGC is OOM

o CreateDIBSectionGC is OOM

o ImageList_CreateGC is OOM

o ImageList_AddGC is OOM

o ImageList_AddIconGC is OOM

It’s important to note that calling GC.Collect() is almost never the right thing to do. When an application moves into the background, all JIT’d code and all unreachable objects are released, and the memory is returned to the operating system. Calling GC.Collect() won’t keep your application from running out of memory, since the .NET Compact Framework CLR already runs a GC before returning any out of memory exception to your app. Since GC.Collect() has to first locate all reachable objects, it is expensive to run even when it can find little or nothing to collect.

It’s also important to note that a C++ application that does not make proper use of the Windows CE VirtualAlloc() and VirtualFree() calls can easily end up with fragmented heaps that do not return memory to the operating system, and can be more resource hungry than a .NET Compact Framework application.

From a capacity planning perspective, the important number to understand is the size of the reachable object set. This is the size of the memory that cannot be released after a GC and is backed by physical memory.

While the execution time of the garbage collector itself is not usually a performance problem, it may become one if your application generates an exceptional number of short lifetime objects.
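To get a feel for when short lifetime objects start to matter, you can estimate how often the 1MB simple collection trigger fires. This is a deliberately crude model, not a description of the actual collector: it assumes the reachable set stays constant and below 1MB, that everything else is garbage by the time the heap reaches the trigger, and that the heap starts out holding only the reachable set. The function and constant names are invented for the sketch.

```python
GC_TRIGGER_BYTES = 1024 * 1024  # heap size at which a simple collection runs (from the text)


def simple_collections(total_allocated_bytes, reachable_bytes):
    """Roughly how many simple collections a run of allocations triggers.

    Assumption: each collection shrinks the heap back to the reachable set,
    so only (trigger - reachable) bytes can be allocated between collections.
    """
    if not 0 <= reachable_bytes < GC_TRIGGER_BYTES:
        raise ValueError("model only covers reachable sets below the 1MB trigger")
    headroom = GC_TRIGGER_BYTES - reachable_bytes  # bytes refilled between collections
    return total_allocated_bytes // headroom


# Hypothetical workload: 100MB of temporary objects over the life of an operation.
churn = 100 * 1024 * 1024
print(simple_collections(churn, 256 * 1024))  # small reachable set
print(simple_collections(churn, 768 * 1024))  # larger reachable set, less headroom
```

Under these assumptions, the same amount of temporary allocation costs more collections as the reachable set grows toward 1MB, and each collection has more reachable objects to trace.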

Per Application Memory Consumption

| Component | Foreground App - Virtual | Foreground App - Physical | Background App - Virtual | Background App - Physical | How to Measure |
|---|---|---|---|---|---|
| Application Memory Mapped Files | Total of app uncompressed file sizes out of system 1GB space | 50% of total (assume 50% hot) | Total of app uncompressed file sizes out of system 1GB space | 50% of total (assume 50% hot) | file sizes |
| .NET Compact Framework Internal Data Structures | 250KB out of application 32MB space (typical) | 250KB (no R/W paging) | 250KB out of application 32MB space (typical) | 250KB (no R/W paging) | mscoree.stat |
| JIT’d Native Code | 250-750KB out of application 32MB space (typical) | 250-750KB (no R/W paging) | 250-750KB out of application 32MB space (typical) | Zero | mscoree.stat |
| GC Heap | The larger of 1MB or the size of the reachable object set, out of application 32MB space | The same (no R/W paging) | The reachable object set out of application 32MB space | The reachable object set out of application 32MB space | mscoree.stat and GC.GetTotalMemory |
| Thread Stacks | 64KB / thread out of the application 32MB space (typical) | The same (no R/W paging) | The same (no R/W paging) | The same (no R/W paging) | count threads |
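The per-application rows above can be added up mechanically. The sketch below does that for one application; the function and parameter names are illustrative, the 500KB JIT default is simply a point picked inside the 250-750KB typical range, and the other defaults are the typical values quoted above. Treat the output as a planning estimate and check it against mscoree.stat and GC.GetTotalMemory on a real device.

```python
def per_app_memory_kb(app_files_kb, reachable_kb, threads, foreground=True,
                      clr_data_kb=250, jit_kb=500, stack_kb=64):
    """Sum the per-application memory rows for one managed application (all in KB).

    app_files_kb: total uncompressed size of the app's EXE and DLLs.
    reachable_kb: size of the reachable object set in the GC heap.
    """
    if foreground:
        gc_heap_kb = max(1024, reachable_kb)  # larger of 1MB or the reachable set
        jit_physical_kb = jit_kb              # JIT'd code fully backed in the foreground
    else:
        gc_heap_kb = reachable_kb             # a full GC runs when the app is backgrounded
        jit_physical_kb = 0                   # JIT'd code is released in the background
    stacks_kb = stack_kb * threads
    return {
        "virtual_1gb": app_files_kb,  # memory mapped files live in the 1GB space
        "virtual_32mb": clr_data_kb + jit_kb + gc_heap_kb + stacks_kb,
        "physical": app_files_kb * 0.5 + clr_data_kb + jit_physical_kb
                    + gc_heap_kb + stacks_kb,  # files assumed 50% hot
    }


# Hypothetical app: 1200KB of assemblies, 600KB reachable, 3 threads.
print(per_app_memory_kb(1200, 600, 3, foreground=True))
print(per_app_memory_kb(1200, 600, 3, foreground=False))
```

Keeping the 32MB and 1GB virtual totals separate matters because only the 32MB application space is under real pressure.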

Looking Forward

The future areas of focus for the .NET Compact Framework team will be around improving the quality of JIT’d code and enabling JIT’d code and CLR data structures to be completely shared across application processes. We are also working on tools that will help application developers gain visibility into the performance characteristics of running applications.

Mike.

This posting is provided "AS IS" with no warranties, and confers no rights.