Shared Bytes, Private Bytes and Fixups

This post is actually a re-post of one I wrote a little under a year ago, during PDC '05, after attending a talk by Rico Mariani and chatting with him afterwards.  The original post is here.

I thought to re-post it because I was talking to a couple of fellow Cider-tonions about some perf analysis work that was being done, and I figured it would be a useful thing to share.  Here it is:

Shared Bytes vs Private Bytes
A shared byte is one that can be shared across multiple processes. A private byte cannot. So what bytes qualify as shareable? Bytes in unaltered pages of a dll, where the backing store for those pages is the dll file itself rather than the page file.

Anytime a page of code or data from a dll needs to change for a particular process, it is marked dirty and becomes private. This can occur on pages that have references to objects on the heap, pages that have code offsets that get modified after load time, or on pages in the data segment that change.
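To make the "dirty page becomes private" idea concrete, here is a toy Python model of one process's view of a mapped dll. It is only a sketch: the MappedImage class, its methods, and the 4 KB page size are all made up for illustration and are not how the OS actually implements this.

```python
PAGE_SIZE = 4096  # assumed page size for this sketch


class MappedImage:
    """Hypothetical model of one process's view of a dll mapped from its file."""

    def __init__(self, file_bytes: bytes):
        self._file = file_bytes  # backing file contents, shareable by every process
        self._private = {}       # page index -> this process's private copy

    def page_count(self) -> int:
        return (len(self._file) + PAGE_SIZE - 1) // PAGE_SIZE

    def write(self, offset: int, data: bytes) -> None:
        """Write into the mapping; each touched page is copied on first write."""
        if offset < 0 or offset + len(data) > self.page_count() * PAGE_SIZE:
            raise ValueError("write falls outside the mapped image")
        for i, byte in enumerate(data):
            page, within = divmod(offset + i, PAGE_SIZE)
            if page not in self._private:
                start = page * PAGE_SIZE
                original = self._file[start:start + PAGE_SIZE].ljust(PAGE_SIZE, b"\0")
                self._private[page] = bytearray(original)
            self._private[page][within] = byte

    def private_bytes(self) -> int:
        return len(self._private) * PAGE_SIZE

    def shared_bytes(self) -> int:
        return (self.page_count() - len(self._private)) * PAGE_SIZE


image = MappedImage(bytes(4 * PAGE_SIZE))
image.write(PAGE_SIZE - 2, b"\x01\x02\x03\x04")  # 4 bytes straddling pages 0 and 1
print(image.private_bytes(), image.shared_bytes())
```

Note that a 4 byte write costs two whole pages here, because private bytes are counted per page and not per byte changed.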

A quick way to have almost all of your dll marked private is to have it rebased at load time, since every address that assumed the preferred base then has to be fixed up.

This concept applies mainly to NGEN'd images, because non-NGEN'd images consist mostly of IL that will be JIT compiled on the loader heap, so the resulting code will all be private. The IL is shared, but it is discarded after JIT anyhow. [addition: if a non-NGEN'd assembly gets rebased, the whole assembly will still be loaded and checked for fixups, which can be a performance hit]

It's important to realize that the benefit of shared bytes is just that: they can be shared across multiple processes, and their cost can be amortized across the number of processes using those bytes. For assemblies like the .Net Framework assemblies, having shared bytes is a clear benefit. That said, in non-sharing cases, optimizing for shared bytes may sub-optimize your code, so caveat emptor.
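The amortization argument is just arithmetic, so here is a small Python sketch of it. The function names and the sample sizes are mine, picked purely for illustration.

```python
def total_memory(private_bytes: int, shared_bytes: int, processes: int) -> int:
    """Memory used when `processes` processes load the same image.

    Private bytes are paid once per process; shared bytes are paid once overall.
    """
    if processes < 1:
        raise ValueError("need at least one process")
    return private_bytes * processes + shared_bytes


def memory_per_process(private_bytes: int, shared_bytes: int, processes: int) -> float:
    """Amortized cost per process: its own private bytes plus its share of the rest."""
    return total_memory(private_bytes, shared_bytes, processes) / processes


MB = 1024 * 1024
for n in (1, 4):
    print(n, memory_per_process(2 * MB, 8 * MB, n) / MB)
```

With 2 MB private and 8 MB shared, a single process pays for all 10 MB, while four processes pay 4 MB each. With only one process there is nothing to amortize, which is the non-sharing case mentioned above.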

Fixups
The term "fixups" refers to "fixing up" an address in the code to a new address. Since this modifies a page for a given process, any page that has a fixup also becomes private.
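As a rough illustration, the Python sketch below patches 32-bit addresses in an image buffer and reports which pages were touched. The apply_fixups function, the list of fixup offsets, and the base addresses are hypothetical; a real loader reads this information from the image's relocation data, which I am not modeling here.

```python
import struct

PAGE_SIZE = 4096  # assumed page size for this sketch


def apply_fixups(image: bytearray, fixup_offsets, preferred_base: int, actual_base: int) -> set:
    """Patch each 32-bit little-endian address and return the dirtied page indexes."""
    delta = actual_base - preferred_base
    dirty_pages = set()
    if delta == 0:
        return dirty_pages  # loaded at the preferred base: nothing to fix up
    for offset in fixup_offsets:
        (address,) = struct.unpack_from("<I", image, offset)
        struct.pack_into("<I", image, offset, (address + delta) & 0xFFFFFFFF)
        dirty_pages.add(offset // PAGE_SIZE)        # page holding the first byte
        dirty_pages.add((offset + 3) // PAGE_SIZE)  # page holding the last byte
    return dirty_pages


image = bytearray(3 * PAGE_SIZE)
struct.pack_into("<I", image, 16, 0x10001000)
struct.pack_into("<I", image, 2 * PAGE_SIZE + 8, 0x10002000)
offsets = [16, 2 * PAGE_SIZE + 8]

print(apply_fixups(image, offsets, 0x10000000, 0x10000000))  # no rebase
print(apply_fixups(image, offsets, 0x10000000, 0x20000000))  # rebased
```

When the image loads at its preferred base, no page is written and everything stays shareable. When it is rebased, every page that contains a fixup gets written and becomes private.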

Reducing Private Bytes
The following are some ways you can reduce the number of private bytes:

  • put code that will be private together such that the number of pages that need to be marked private is decreased
  • prevent rebasing
  • fix addresses so that they don't become a fixup
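The first bullet is easy to see with numbers. This Python sketch counts how many pages are touched by the same 64 four-byte fixups under two made-up layouts; the offsets and the pages_touched helper are invented for illustration.

```python
PAGE_SIZE = 4096  # assumed page size for this sketch


def pages_touched(offsets, size: int = 4) -> int:
    """Count the distinct pages written when `size` bytes are patched at each offset."""
    pages = set()
    for offset in offsets:
        pages.add(offset // PAGE_SIZE)
        pages.add((offset + size - 1) // PAGE_SIZE)
    return len(pages)


# 64 fixups, one on each of 64 different pages
scattered = [page * PAGE_SIZE + 128 for page in range(64)]
# the same 64 fixups packed back to back at the start of the image
grouped = [index * 4 for index in range(64)]

print(pages_touched(scattered) * PAGE_SIZE)  # private bytes, scattered layout
print(pages_touched(grouped) * PAGE_SIZE)    # private bytes, grouped layout
```

The scattered layout dirties 64 pages (256 KB) and the grouped layout dirties one page (4 KB), even though both patch exactly 256 bytes.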

String Freezing and Hard Binding
String Freezing and Hard Binding are both great examples of how the concepts above are being applied.

In .Net 2.0, you can mark an assembly such that it will freeze its string literals. What that means is that all of the strings in that assembly will be put in a separate segment (during NGEN) so that all of the references to string literals will not require a fixup.

Strings need fixups because the literals have to be wrapped in a string instance, and the code then points to those instances.

String freezing has two benefits: the string isn't duplicated between the literal and the string instance, and the number of private pages is reduced. Note as well that those string instances are interned (with opt in/opt out in Whidbey; see the article here) to avoid duplication.

The downside of string freezing is that such an assembly cannot be unloaded -- because the reference to that string now resides in a segment of that dll instead of on the heap and code in other assemblies may be depending on that reference.

Hard binding refers to another new NGEN feature where NGEN assumes that a reference from one assembly to another is always going to be there. That allows NGEN to hard code offsets from one assembly to another, reducing the need for load time fixups.

The downside is that any assemblies that are hard bound will be loaded at the same time as the referencing assembly. This is necessary to guarantee that the desired load addresses are obtained.