I thought it would be useful to provide a
primer on the NGen tool and pre-jitting your code for performance reasons.
In particular, there are some gotchas you must be aware of when authoring your
product. In this entry, I’m going to cover some background material on
paging (which you can skip if you are an expert
already). Then we’ll cover the workings of the
NGen tool, some
servicing implications, and finally some future directions.
Before we get started, let me keep up a Microsoft tradition
and include the key takeaways right here. If
you get nothing else out of this topic or can’t read the whole thing, make sure
you absorb the following:
Windows uses a virtual address space on your machine, so on a 32-bit system you get
from 0 to 4GB of addressable memory for each process. Windows code is typically compiled into a Portable
Executable file (PE file), which contains sections of code and data marked with
page attributes like read, write, and execute. When the OS loads such a
file into a process, it maps the memory from your file into physical pages that
can be addressed by the process. So far so good?
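To make that mapping a little more concrete, here is a toy model in Python. It assumes x86's 4 KB pages, so a 32-bit address splits into a 20-bit page number and a 12-bit offset. The page tables and the frame number 0x1A2B are made up for illustration. Real page tables are managed by the OS and the CPU, not by a dictionary.

```python
PAGE_SIZE = 0x1000  # x86 pages are 4 KB, so the low 12 bits are the offset

def split(address):
    """Split a 32-bit virtual address into (page number, offset in page)."""
    return address // PAGE_SIZE, address % PAGE_SIZE

def translate(page_table, address):
    """Translate a virtual address using a toy per-process page table.

    page_table maps virtual page number -> physical frame number; a
    KeyError plays the role of a fault on an unmapped page.
    """
    page, offset = split(address)
    frame = page_table[page]
    return frame * PAGE_SIZE + offset

# Hypothetical page tables for two processes that both map virtual page
# 0x70124 onto the same physical frame 0x1A2B.
process_1 = {0x70124: 0x1A2B}
process_2 = {0x70124: 0x1A2B}

address = 0x70124800
print([hex(part) for part in split(address)])   # ['0x70124', '0x800']
print(hex(translate(process_1, address)))       # 0x1a2b800
print(translate(process_1, address) == translate(process_2, address))  # True
```

Two processes whose page tables point the same virtual page at the same frame are reading one physical copy, which is the page sharing discussed next.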
On the x86, calls to methods are typically in
the form of "call address", where address is an absolute
value from 0 to 4GB, and tells the CPU the precise location it should transfer
to. This poses a problem for the compiler, because it needs to know precisely
where all of the methods it will call inside the user’s file will live once that file is loaded
(not just relative to the start of the file, but
the absolute address in the entire process). There are two things that
kick in here to aid you:
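Base address: This is the address you would like your file to be loaded at.
Relocations: Just in case your file can’t be loaded at its base address, it also records where its absolute addresses live, so they can be fixed up at load time.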
Besides allowing the compiler to stitch
together your program, a base address gives you a predictable location for your
file to get loaded every time it is executed. This is important, because
if you have sections of the file (say all of your executable code) that are read
only, then we’d like to be as efficient as possible on the machine and share those
pages between processes. The OS accomplishes this if your pages are marked
read-only and sharable. So if you have the code for strcpy from msvcrt.dll at
location 0x70124800, then the one page of physical memory where that code lives can be
viewed in all
of the processes on the machine that also need it, provided those processes have
loaded msvcrt.dll at the same address.
[Diagram: User Process 1, Kernel Mapped Pages, User …]
See the advantage? Overall system memory
pressure goes down with shared pages because only one physical page is used no matter how many
times you load it. Also, speed of loading code goes up, because chances
are the system already has that file loaded in some other process on the
machine. This is typically referred to as a "warm startup", because the OS
has already loaded many of the pages you need, and doesn’t have to
go out to disk to get them. So bottom line, sharing of pages between
processes is a GOOD THING.
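The memory arithmetic behind that is simple. As a rough sketch with hypothetical numbers (a DLL whose code occupies 200 pages, loaded by 10 processes), compare one shared copy against one private copy per process:

```python
PAGE_SIZE = 4096  # bytes per x86 page

def physical_bytes(code_pages, processes, shared):
    """Physical memory used by one file's read-only code pages.

    If the pages are shared, every process maps the same physical copy;
    otherwise each process pays for its own private copy.
    """
    copies = 1 if shared else processes
    return code_pages * copies * PAGE_SIZE

# Hypothetical: 200 pages of code, loaded by 10 processes.
shared = physical_bytes(200, 10, shared=True)
private = physical_bytes(200, 10, shared=False)
print(shared // 1024, "KB vs", private // 1024, "KB")  # 800 KB vs 8000 KB
```

The private case grows linearly with the number of processes, while the shared case stays flat.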
Having to relocate a file away from its base address is a BAD THING
for your shareable pages, and this loss of sharing is the reason.
If you cannot load at your preferred base address, then the absolute addresses in those
otherwise sharable pages are now wrong. So the OS has to make a copy of
the page for your process, mark it writable, and then fix up all of the invalid
values. This is bad because it takes both more time (slower load
times) and more space (for the extra unshared pages).
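Here is a simplified model of that fix-up, assuming 32-bit absolute addresses that the loader adjusts by adding the difference between the actual and preferred base. Real PE files group their relocation entries by page in a .reloc section; this sketch flattens that into a plain list of offsets, and the image layout is hypothetical. The function reports which pages it had to write, since each of those becomes a private copy.

```python
import struct

PAGE_SIZE = 0x1000

def apply_relocations(image, fixups, preferred_base, actual_base):
    """Simplified base relocation for 32-bit absolute addresses.

    image:  bytearray holding the mapped file
    fixups: offsets of 32-bit absolute addresses inside the image
    Returns the set of page indexes that had to be written, i.e. the
    pages that can no longer be shared with other processes.
    """
    delta = (actual_base - preferred_base) & 0xFFFFFFFF
    if delta == 0:
        return set()  # loaded at the preferred base: nothing to fix up
    dirty = set()
    for off in fixups:
        (value,) = struct.unpack_from("<I", image, off)
        struct.pack_into("<I", image, off, (value + delta) & 0xFFFFFFFF)
        dirty.add(off // PAGE_SIZE)
    return dirty

# Hypothetical 3-page image with preferred base 0x10000000: two absolute
# addresses live in page 0 and one in page 2, all pointing into page 1.
image = bytearray(3 * PAGE_SIZE)
fixups = [0x10, 0x20, 0x2004]
for off in fixups:
    struct.pack_into("<I", image, off, 0x10001000)

dirty = apply_relocations(image, fixups, 0x10000000, 0x20000000)
print(sorted(dirty))                                  # [0, 2]
print(hex(struct.unpack_from("<I", image, 0x10)[0]))  # 0x20001000
```

Page 1 contains no absolute addresses, so it could still be shared; pages 0 and 2 could not.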
I should point out that some pages are, of course, intended
to be per-process. Your global data for example wouldn’t make much sense
if you were sharing it with another running instance of your application!
But in general we try very hard to keep the number of per-process pages in the system
to a minimum because of the high cost of the extra memory pressure.
Back to Managed
Ok, all of this background is interesting, but
what does this have to do with Managed code and the CLR? First, we also
use the PE file format for managed code, so your VB.Net application will be
stored in the same file format as kernel32.dll. This allows managed
executables to appear anywhere you would normally expect. For example, if you want to do a CoCreateInstance on your managed code, or do a LoadLibrary on it directly,
you can do so.
This file format choice means we have to follow the same rules for assigning base addresses. And
guess what? We made the metadata and IL your compiler generates read only +
sharable so we could use the same memory management benefits you get with unmanaged code.
Now think about what the JIT compiler does for
a minute. It just-in-time compiles your program one method at a time. That
means we allocate, on the fly, some memory and write the necessary native code
for your program out to
that location. When we need to call a method, we know where we put it in
the absolute address range, so we can do the same
"call address" you saw
for unmanaged code. The advantage of the JIT is that it can literally
stitch your program together as you go, and it only compiles the code that
you actually execute. But since this is happening on the fly, all of those pages where this code is allocated
are for that process only. We get none of the sharing advantages
you got with unmanaged code in read only + sharable pages, and it also takes
time to run that compiler. We did some experiments early on in the Runtime
as a proof of concept for our managed C++ compiler, which included recompiling Word as an
IL image. It worked great! But it was slow. Office is a big
application, and using the JIT for this case didn’t put our best foot forward.
[Diagram: User Process 1, Kernel Mapped Pages, User …]
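Here is a toy Python model of that behavior, under obvious simplifying assumptions: a "method" is just a Python function standing in for IL, and "compiling" it only counts and caches it. What it does show is the shape of the JIT: nothing is compiled until it is called, and each process keeps its own private cache, so a second process pays the compile cost all over again.

```python
class JitProcess:
    """Toy model of one process running under a JIT.

    Methods are "compiled" the first time they are called and the result
    is cached in a table that belongs to this process only.
    """

    def __init__(self, il_methods):
        self.il = il_methods   # stand-in for the IL stored in the assembly
        self.native = {}       # per-process code cache: never shared
        self.compiles = 0      # how many times the "JIT" ran

    def _compile(self, name):
        self.compiles += 1
        # A real JIT would emit machine code into freshly allocated memory;
        # here the "native code" is simply the Python function itself.
        return self.il[name]

    def call(self, name, arg):
        code = self.native.get(name)
        if code is None:        # first call: compile just this one method
            code = self._compile(name)
            self.native[name] = code
        return code(arg)


IL = {"double": lambda x: 2 * x, "never_called": lambda x: x}

p1 = JitProcess(IL)
p1.call("double", 3)
p1.call("double", 4)            # second call reuses the cached code
p2 = JitProcess(IL)
p2.call("double", 5)            # a second process compiles it all over again

print(p1.compiles, p2.compiles, "never_called" in p1.native)  # 1 1 False
```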
Wouldn’t it be great if you could get the same
page sharing advantage as unmanaged code, and not have to run the JIT every time
for a big application like Office? That’s the NGen tool, and we’ll drill
into that in the next section.
NGen stands for "Native Image Generator".
The tool allows us to run the JIT compiler on all of your IL in an assembly (a
PE file) at one sitting, and cache the results out to disk. Now when you
want to load and run that assembly, we can find it in the cache and load it just
like an unmanaged image. Because the code is read only + sharable, you get
the same benefits of page sharing.
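A loose analogy for the compile-once-and-cache idea can be written with Python's own `compile` and `marshal` modules (Python's .pyc files work along similar lines). This is only an analogy: it produces bytecode rather than native code and says nothing about page sharing. The "assembly" contents and the file name are hypothetical.

```python
import marshal
import os
import tempfile

# Hypothetical "assembly": method name -> source standing in for its IL.
ASSEMBLY = {"double": "x * 2", "square": "x * x", "negate": "-x"}

def generate_image(assembly, path):
    """Compile every method in one sitting and persist the results to disk."""
    compiled = {name: compile(src, name, "eval") for name, src in assembly.items()}
    with open(path, "wb") as f:
        marshal.dump(compiled, f)

def load_image(path):
    """Load the precompiled methods; no compiler runs on this path."""
    with open(path, "rb") as f:
        return marshal.load(f)

cache_path = os.path.join(tempfile.mkdtemp(), "assembly.image")
generate_image(ASSEMBLY, cache_path)    # once, e.g. at install time
image = load_image(cache_path)          # every later run
print(eval(image["square"], {"x": 7}))  # 49
```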
So what precisely is in that image that gets created?
Let’s look at the contents:
All PE files contain the standard set of headers.
The native code is obviously the key thing we are trying to get into the image, and it
makes up the bulk of the image size. The code persisted is
100% native at this point, so the JIT does not need to get involved
to execute it.
No Metadata or IL
The current NGen-produced image does not have a copy of the metadata or
the IL in it. This is significant, because it means that you will
need to have both the original IL assembly and the NGen
image loaded at once. In general we try to avoid touching the metadata and IL
at runtime, but you can’t always avoid it. Two examples are late-bound
programming (e.g., Reflection, which needs name information from the
metadata) and JIT’ing of non-NGen’d code (the IL is read to see whether a method
can be inlined).
The CLR requires more than just code to execute. It must have
access to key data structures which describe things like class and
method layouts, and these are only known at run time. Because we want to
reduce the overall number of writable pages in a process, NGen does not
patch the addresses of this data directly into the code. Instead, it stores
a table of pointers to the data, and the table is filled in at
run time once the data has been allocated. This allows NGen
to generate one version of the code that will work unmodified for all
processes, because there is a predictable location in the image where
you can find the pointer to the dynamically allocated data (essentially
a slot in this table). However, this technique has the downside
of (1) slowing down startup to fill out the table, and (2) generating
sub-optimal code which must use a pointer indirection to get the data it
needs. Finally, it also
means that we cannot simply persist the output of the JIT compiler
itself while it runs, because the actual code generated is different in
the two scenarios.
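The fix-up table idea can be sketched like this. It is only a model: the slot numbers, the slot names, and the dictionary standing in for a method table are all hypothetical, and real NGen code does this with machine-level pointer loads. The point is that the shared code only ever knows a slot number; each process fills in its own table at startup (downside 1 above), and every access goes through the slot (downside 2).

```python
# Slot numbers are fixed when the image is generated (hypothetical layout).
SLOT_METHOD_TABLE = 0
SLOT_EMPTY_STRING = 1
TABLE_SIZE = 2

def instance_size(fixups):
    """Stands in for a method's precompiled code in the shared image.

    It never embeds the address of a runtime data structure. It loads
    the pointer from a known slot (the extra indirection) and then uses it.
    """
    method_table = fixups[SLOT_METHOD_TABLE]
    return method_table["instance_size"]

def bind_image(method_table):
    """Per-process startup work: fill in the fix-up table with data that
    only exists once this process has allocated it."""
    fixups = [None] * TABLE_SIZE
    fixups[SLOT_METHOD_TABLE] = method_table
    fixups[SLOT_EMPTY_STRING] = ""
    return fixups

# Each process builds its own copy of the runtime data structure...
p1 = bind_image({"instance_size": 24})
p2 = bind_image({"instance_size": 24})

# ...but both run the one, unmodified piece of code.
print(instance_size(p1), instance_size(p2))            # 24 24
print(p1[SLOT_METHOD_TABLE] is p2[SLOT_METHOD_TABLE])  # False
```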
Even with some of the trade-offs mentioned here, we’ve seen some remarkable performance
wins from this technique (and it only gets better with each new release). There are, however, some
considerations you need to make before you jump on board the NGen bandwagon.
We’ll cover those now.
Measure, measure, measure. You
To Cache or Not to Cache
Generating an NGen image takes time.
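One way to reason about whether that time is worth it is a simple break-even calculation on numbers you have measured. The sketch below considers CPU time only and ignores the working-set benefits; the per-run image cost stands in for overheads such as filling the fix-up tables, and all of the figures in the example are hypothetical.

```python
import math

def break_even_runs(ngen_cost, jit_cost_per_run, image_cost_per_run=0.0):
    """Smallest number of runs at which a one-time NGen cost has paid for itself.

    Returns None if the precompiled image never comes out ahead.
    All costs share one unit (e.g. seconds) and should come from measurement.
    """
    saving_per_run = jit_cost_per_run - image_cost_per_run
    if saving_per_run <= 0:
        return None
    return math.ceil(ngen_cost / saving_per_run)

# Hypothetical measurements: 30 s to generate the image, 1.5 s of JIT time
# per launch, 0.5 s of extra per-launch work when using the image.
print(break_even_runs(30.0, 1.5, 0.5))  # 30
```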
The NGen To Do List
So you’ve decided to ngen your image. Now
what? This section contains some steps you should be taking:
Start with MSDN
Make sure you read all of the NGen documentation on
MSDN. I will only pull out highlights here.
Picking a Base Address
Pick a good set of base addresses for
When to NGen?
You need to pick when you want to
When you release bug fixes to
Remember to use ngen /delete to
As mentioned above, there are brittleness
issues with ngen in V1.0 and V1.1 (aka Everett). So you need to plan out
what you will do in the face of those things changing. As an example, we
will release a service pack of the CLR at some point, and your cached ngen
images will no longer load. Your code will still work, but it will run
under the jitter, which will be slower (you did measure to verify you needed ngen, right?).
Right now, fixing this is tricky. Expect
us to improve this situation in the future, but for
now, here are some ideas on how you can address this:
Make sure your setup and patching
You can periodically
run a scheduled task to check your images and re-ngen them as required.
If you already have some kind of nightly enterprise script running on
client machines, for example, this would be a fine time to do
maintenance. Note: if your images are already up to date,
the NGen tool will simply report that and exit instead of doing a lot of work.
Rocket Science: If you are really motivated, you could go find the
list of natively loaded PE files in your process (use the Win32 PSAPI
functions or walk the PEB) to see if your NGen’d image was actually loaded in
the process. If it wasn’t, it most likely means you need to fix it
up, and your app could do so for the next run. I might prototype
this at some point, but suffice it to say it isn’t a trivial thing to do.
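The underlying reason these hints are needed is that a cached image is tied to exactly what it was compiled against. The sketch below models only that idea, as the kind of comparison a maintenance script might make before deciding to re-ngen. The record fields, version strings, and the MyLib dependency are all hypothetical; the CLR's actual validation rules are not described here, and no ngen command-line flags are assumed.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ImageRecord:
    """What a cached native image was built against (hypothetical fields)."""
    assembly_version: str
    runtime_version: str
    dependencies: tuple  # ((name, version), ...)

def is_stale(image, assembly_version, runtime_version, installed):
    """True if anything the image was compiled against has changed since.

    installed: dict mapping dependency name -> currently installed version.
    """
    if image.assembly_version != assembly_version:
        return True              # you shipped a bug fix to your own code
    if image.runtime_version != runtime_version:
        return True              # e.g. a CLR service pack was installed
    return any(installed.get(name) != version
               for name, version in image.dependencies)

img = ImageRecord("2.0.0.0", "1.1.0.0", (("MyLib", "1.0.0.0"),))
print(is_stale(img, "2.0.0.0", "1.1.0.0", {"MyLib": "1.0.0.0"}))  # False
print(is_stale(img, "2.0.0.0", "1.1.0.1", {"MyLib": "1.0.0.0"}))  # True
```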
At this point you’ve probably looked through the list of
Servicing Hints and thought to yourself:
"Wow, that’s kinda ugly!" And you’re right. NGen for Version 1.0 and
1.1 was primarily designed and engineered for internal use by the CLR itself.
When we install service packs of our stuff, we force a re-ngen of all of the core
components, which keeps that part of your app running fast.
Going forward, NGen is still a
key foundation for our performance story. It gives you the working set wins (better page
sharing, quicker loading) that are required for starting your application
faster. It also allows for more aggressive optimizations in the compiler.
If we tried doing really aggressive optimizations every time you ran the JIT,
you’d actually run slower just waiting for the compiler to finish.
Expect in the future that we will be addressing the
clumsiness and the servicing issues so your life is easier. Here are just a
few things we’re thinking about:
There are some cleaner ways we could expose the fact that your
application is out of date. It would make it simpler to write your
app if it could query this state, or force it to correct automatically.
We are considering these designs now.
As mentioned above, the current CLR loads both the IL image and your NGen
image in the process. This double loading is inefficient
because it makes the OS loader do more work (slower startup time).
Look for us to try to avoid this in the future.
As mentioned above, NGen images still contain a lot of fix-up tables for
dynamic data structures. This causes the startup to be slower
(while those tables are fixed up) and generates sub-optimal code which
must use the indirection of the table to get at the data. Look for
us to get more aggressive and avoid a lot of this.
And finally, in closing, make sure to
re-read those key takeaways.