How are value types implemented in the 32-bit CLR? What has been done to improve their performance?

11/02/2007

By Fei Chen

How are value types implemented in the 32-bit CLR?

Value types are the closest thing in the common language runtime model to C++ structures. An instance of a value type is simply a blob of data in memory that contains all the fields in the instance. The main difference between an instance of a value type and an instance of a reference type is that the former does not contain the type ID in its blob (see the example below), because the type information for value types is only needed at compile time.

struct PointStruct { public int x; public int y; } // An instance of this value type needs 8 bytes: 4 bytes for each integer field.

class PointClass { public int x; public int y; } // An instance of this reference type needs 12 bytes: the first 4 bytes hold the type ID, followed by 4 bytes for each integer field.

Because a value type instance is a contiguous blob of memory, the CLR refers to it internally by the pointer to the beginning of that blob.

A value type instance can live in one of two different places – a value type local variable, or a value type field in a reference type[1]. When a value type local variable is declared, the prolog of the jitted code reserves a piece of stack memory[2] (large enough to hold the instance of this value type), and the pointer to this stack location is used in all the places in the jitted code where this local variable is referenced. In the case of a value type field embedded in a reference type, the memory on the heap for this object contains the memory needed for its value type field. See this example:

class PointWithColorClass { public int color; public PointStruct point; }

The size of the above object is 16 bytes. The first 4 bytes are the type ID, the next 4 bytes are the color field, and the last 8 bytes are the point value type field.

The pointer to the beginning of the point field is used in all the places in the jitted code where this field is referenced. (Note that this time the pointer points to a location on the heap instead of the stack.)

Common operations on value types

There are only a few common operations on value types. Here is how they are implemented internally.

· Field access

Given that a value type instance is referenced by the pointer to the beginning of its blob, accessing its fields is nothing more than adjusting this pointer by the corresponding field offset. In other words, if a value type local variable p of type PointStruct lives at [EBP-8], then a “mov eax, [EBP-8]” instruction reads the field x and a “mov [EBP-4], eax” instruction writes the field y.

· Initialization

Zero initialization of a value type instance is done by calling memset on this piece of memory with 0s.

· Assignment

Assignment from an instance of a value type to another is done by calling memcpy between these two pieces of memory.

· Calling the instance method

The CLR supports calling instance methods on value types. Internally, this is done by passing the pointer to the instance as the first parameter to the target method. This will sound familiar to anyone who knows how C++ instance methods are called, where the “this” pointer is passed as the first parameter.

Since the JIT owns code generation for both the caller and the callee, it knows how to generate correct code for the value type instance method: the callee expects its first parameter to be the pointer to the blob of the value type instance.

· Passing as an argument by-value

Passing a value type instance as a by-value argument requires making a stack copy and then passing the pointer to this copy to the target method. Consider what needs to be done at the call site of Foo() in the following example:

static void Foo(int i, string s, PointStruct pointArg) { … }

static void Main() { PointStruct point = new PointStruct(); Foo(1, "one", point); }

What happens at the call site can be described using the following pseudo code in C++ syntax:

PointStruct stackCopyOfPoint; // This is a stack local variable.

stackCopyOfPoint = point; // (or think of it this way) memcpy(&stackCopyOfPoint, &point, 8);

Foo(1, "one", &stackCopyOfPoint);

The stack copy preserves the by-value semantics: the callee sees only the copy, so it has no way to affect the original instance.

· Passing as an argument by-reference

Passing a value type instance as a by-reference argument is easy. Just pass the pointer. Now the callee and the caller see the same instance. So any change done inside the callee will affect the caller.

· Returning a value type

Returning a value type requires the caller to provide the storage. A pointer to the return storage buffer is passed to the callee as a hidden parameter, and the callee is responsible for filling in this buffer. Consider this example:

static PointStruct Bar() { … }

static void Main() { Bar(); }

What actually happens at the call site can be described using the following pseudo code:

PointStruct tempPoint; // A temporary stack local created to hold the return value.

Bar(&tempPoint);

When the return value is assigned to a value type field embedded in an object, consider this example:

static PointStruct Bar() { … }

static void Main() { PointWithColorClass obj = new PointWithColorClass(); obj.point = Bar(); }

What actually happens at the call site is:

Bar(&(obj.point));

Inefficiencies in code generation for value types in .NET 2.0

Code generation for value types in .NET 2.0 has several inefficiencies.

1) All value type local variables live entirely on the stack.

2) No assertion propagation optimization is ever performed on value type local variables.

3) Methods with value type arguments, local variables, or return values are never inlined.

While the original intent of supporting value types in the CLR was to provide a means for creating “lightweight” objects, the actual inefficiencies in the code generation make these “lightweight” objects not-so-light.

For bullet 1), the following code would mean 3 stack operations, one for each field access:

static void MyMethod1(int v) {

PointStruct point; // point will get a stack location.

point.x=v; point.y=v*2;

Console.WriteLine(point.x); // All 3 field accesses involve stack operations.

}

Wouldn’t it be nice if the jitted code stored both fields of point into registers and avoided allocating stack space for this value type local variable altogether?

For bullet 2), the following code would result in 19 useless memcpy calls:

static void MyMethod2() { PointStruct point1, point2, …, point20; point1.x = point1.y = 5;

point2 = point1; point3 = point2; … point20 = point19;

Console.WriteLine(point20.x + point20.y); }

Wouldn’t it be nice if the JIT could apply copy-propagation to these value type local variables and morph the above code to “Console.WriteLine(point1.x + point1.y)” instead?

For bullet 3), a simple field getter of a value type turns into an expensive method call:

struct PointStruct { public int x; public int y; public int XProp { get { return x; } } }

static void MyMethod3() { PointStruct point; point.x = point.y = 5;

Console.WriteLine(point.XProp); } // point.XProp is a method call which is never inlined.

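Currently the JIT does not perform any assertion propagation on local variables whose addresses have been taken. However, the common operations on value types described earlier (field access, initialization, assignment, and instance method calls) all work by taking the address of the instance. This is why bullet 2) applies to value type local variables in general.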

Improving value type code generation in CLR v.Next

Improving code generation for value types has consistently been a top customer request on MS Connect: https://connect.microsoft.com/VisualStudio/feedback/ViewFeedback.aspx?FeedbackID=93858.

Over the past year or so, the JIT team has been working on significant improvements to value type code generation, as well as the inlining algorithm. In summary, all of the above limitations are being eliminated.

The new inliner will allow inlining methods with value type arguments, local variables, or return values. This solves the issue in bullet 3).

An algorithm called “value type scalar replacement” has been implemented to address the issues in bullets 1) and 2). This algorithm is based on the observation that a value type local variable logically can be viewed as a group of independent local scalars, each representing a field in this value type local variable, if

a) there is no operation in the current method that causes any interaction between these fields;

and

b) the address of this value type local variable is never exposed outside of the current method.

When the above conditions are met, the MyMethod1() listed above can be safely transformed to

static void MyMethod1(int v) {

int x; int y; // Was “PointStruct point;”. Now replaced by x and y.

x=v; y=v*2; Console.WriteLine(x);

}

by replacing the value type local variable point with a group of independent integer local variables, namely x and y.

And the MyMethod2() listed above will be transformed to

static void MyMethod2() { int x1, y1, x2, y2, …, x20, y20; x1 = y1 = 5;

x2 = x1; y2 = y1; x3 = x2; y3 = y2; … x20 = x19; y20 = y19;

Console.WriteLine(x20 + y20); }

Furthermore, the assertion propagation and constant folding algorithms will be applied to these scalars, since none of them has its address taken. As a result, the code will be reduced to:

static void MyMethod1(int v) { Console.WriteLine(v); }

static void MyMethod2() { Console.WriteLine(10); }

In addition, the register allocator will home the argument v in a machine register, so no stack operations occur in MyMethod1().

Not all value type local variables can be replaced by scalars, however. Local variables with their address taken, and exposed outside of the current method, cannot be replaced. Consider this example where SomeBigMethod() is an instance method in PointStruct that is not inlined.

static void MyMethod4() { PointStruct point = new PointStruct(); point.SomeBigMethod(); }

The address of point is taken and passed as the “this” pointer to SomeBigMethod(). What SomeBigMethod() does with this pointer is totally out of the control of MyMethod4(). In this case, point is not replaced by scalars. Another way to expose the address of a value type local variable is to pass it as a by-reference argument to another method. Taking the address of a value type local variable and storing it in a static variable, or in an object, also exposes the address.

The JIT in CLR v.Next will be able to perform value type scalar replacement on value types that meet all of the following conditions, whenever it judges the optimization to be beneficial:

1) The value type contains no more than 4 fields.

2) The types of the fields in the value type are either primitive types or object references.

3) The value type uses [StructLayout(LayoutKind.Sequential)], which is the default.

Guidelines for using value types in the CLR

The decision whether to use value types should be based primarily on the semantics of the program. Value types should be used when pass-by-value semantics are the most natural and the most frequently used in the program.

Once you have decided to use a value type, think about the performance implications and about how to help the JIT generate the best possible code. Always keep in mind that the by-value nature of value types means that many copy operations might be happening under the covers. Also, if a value type is not replaced by scalars, nearly every operation on it will be a memory operation, on either the stack or the heap.

Developers should examine the jitted code of their hot methods under the debugger to make sure the value type stack local variables are indeed homed in registers.

Try not to create value types that contain more than 4 fields. Avoid calling non-inlineable value type instance methods in the hot path, because each such call exposes the address of the instance. When a temporary value type instance is needed, prefer value type local variables over value type fields embedded in an object, because the latter are never replaced with scalars.

[1] For simplicity, let us ignore the case where a value type field is embedded into another value type.

[2] This is not true with the value type scalar replacement optimization newly implemented in CLR v.Next, as described later in this document.

