Boxing (value types, not de la Hoya)

Grr. Discussions of pugilism aside, I'll try posting again. I had a beautifully written post, with just the right amount of comedy, sure to inform, delight, and be nominated for a Pulitzer, and then I had a slight power glitch, and it took my beautiful, awesome, cool post. Just read the one below, and imagine that it's full of wit, charm, and information.

A brief background. Previously, I've discussed value types (and then corrected myself), and I mentioned boxing, but only peripherally. Basically, all I said was that there's a hidden cost with boxing. I didn't bother explaining it, but I did promise I'd try to later. In Managed Extensions, boxing was explicit, and exposed through the pointer-modifying keyword __box. We physically warned you about the extra cost of boxing, by making you type that extra stuff to do it. But that really isn't in the style of C++. When the language designers strove to make the C++/CLI binding in Whidbey a first-class language, one of the things that changed was to make boxing implicit. It looks cleaner, but it hides a lot of stuff under the hood. As is always the case in C++, we give you the gun, we load it with bullets, and we show you where your foot is. You might have a valid reason for pointing a loaded firearm at your foot (other than shooting it off), so we give it to you.

Why do value types need to be boxed? Value types, by design, are intended to be lightweight stack-based or class member variables. Basically, value types are the type that the CLR uses internally to represent fundamental types (think ints), that it also exposes to us for similar implementation purposes. In Whidbey C++, we've expanded their usability a bit, but their real intended use is somewhat limited, albeit vital. There are cases, however, where a user will want to place a value type by itself onto the GC heap. The user could wrap their value type inside a ref type, and place this “proxy” ref type on the heap, but this activity is common enough that the CLR has implemented it for users. This is called boxing, and it adds cost to perform this wrapping.

How much cost could it really add? This is one of those cases where the trivial example won't really make it noticeable. But I can recall an example in recent history from a presentation I attended concerning Managed DirectX. They mentioned a case where they had serious performance issues with a graphics demo, and discovered the reason was that they were boxing and unboxing literally millions of value types per frame in the example. It was running, but incredibly slowly. One or two value types being boxed won't be noticeable, but increase that by a few orders of magnitude, and suddenly you could have a real problem on your hands.

Sheesh! I'm never boxing anything! Now, I am warning you quite a lot, but that doesn't mean there aren't justifiable uses for boxing value types. It is just important to be aware of the dangers associated with boxing, and to remember to use it judiciously. That said, it might be useful to know when you are going to box something (since it is implicit).



So, when does boxing occur?

Boxing occurs whenever a value type is placed on the GC heap. (Caveat: only while outside of another object. If it's inside a ref type, you're safe.) It might be useful to think of boxing in terms of the types that can cause boxing to occur. Basically, to hold a boxed type, you need a variable of the type “handle to a value type”. For example, an int^:

int^ sum(int^ a, int^ b, int^ c){
  return int(*a+*b+*c);
}

int main(){
  int^ x = sum(200, 300, 400);
  return *x;
}

How many boxings did you count? One? Two? How about four? It may not appear to be that way, but look closely at the types of the parameters of sum. Those parameters are of type int^, and I've passed in regular ints. So, the compiler has to go ahead and box those three values (200, 300, and 400) before passing them to sum. The fourth boxing is the return value: the int result of the addition has to be boxed to fit the int^ return type. It used to be that you were required to be explicit about doing this, and so you understood the cost, but it made for some really nasty code. Take a look at the same code, in the syntax of Managed Extensions:

#using <mscorlib.dll>

using namespace System;

int __box *sum(int __box *a, int __box *b, int __box *c){
  return __box(*a + *b + *c);
}

int main(){
  int __box *x = sum(__box(200), __box(300), __box(400));
  return *x;
}

Ouch. You can see why we made boxing implicit. I think I broke my finger typing that example in.

 

But that's not all! It doesn't stop there: unboxing isn't free, either. And there are four unboxings in my little example (every time I dereference an int^, I incur an unboxing cost). And there's one more dangerous wrench in the works, especially for those of us who like to use a literal zero to represent the null value for pointers:

int* x = 0; //sets x to null
int^ y = 0; //boxes the integer value zero and puts it inside y
int^ z = nullptr; //sets z to null

if(!x && !y && !z){
  //you won't get into this code branch!
}

Use nullptr. It's a new Whidbey keyword for the null value. Using a literal zero will produce unintended results, like the one shown above. You also get this undesired behavior when assigning zero to any System::Object^.

Whew. There's an introduction for the concept of boxing, and how we present it in Whidbey C++. It is powerful and often useful, but if you're not aware of the extra hidden costs, you can really cause yourself some undue pain.