Implicit Boxing

What's Different in the Revised
Language Definition?




Ok, so we reversed ourselves. In politics, that would likely lose us an election. In
language design, it means that we imposed a philosophical position in lieu of practical
experience with the feature and, in practice, it was a mistake. As an analogy, in
the original multiple inheritance language design, Stroustrup decided that a virtual
base class sub-object could not be initialized within a derived class constructor,
and therefore the language required that any class serving as a virtual base class
must define a default constructor. It is that default constructor that would be invoked
by any subsequent virtual derivation.


The problem of a virtual base class hierarchy is
that responsibility for the initialization of the shared virtual sub-object shifts
with each subsequent derivation. For example, if I define a base class for which initialization
requires the allocation of a buffer, the user-specified size of that buffer might
be passed as an argument to the constructor. If I then provide two subsequent virtual
derivations, call them inputb and outputb, each provides a particular value to the
base class constructor. Now, when I derive an in_out class from both inputb and outputb,
neither of those values can sensibly be allowed to initialize the shared virtual base
class sub-object.


Therefore, in the original language design, Stroustrup
disallowed the explicit initialization of a virtual base class within the member initialization
list of the derived class constructor. While this solved the problem, in practice
the inability to direct the initialization of the virtual base class proved impracticable.
Keith Gorlen of the National Institutes of Health, who had implemented a freeware version
of the Smalltalk collection library called nihcl, was a principal voice in convincing
Bjarne that he had to come up with a more flexible language design.


A principle of Object-Oriented hierarchical design
holds that a derived class should only concern itself with the non-private implementation
of its immediate base classes. In order to support a flexible initialization design
for virtual inheritance, Bjarne had to violate this principle. The most derived class
in a hierarchy assumes responsibility for all virtual sub-object initialization regardless
of how deep into the hierarchy it occurs. For example, inputb and outputb are both
responsible for explicitly initializing their immediate virtual base class. When in_out
derives from both inputb and outputb, in_out becomes responsible for the initialization
of the once removed virtual base class, and the initialization made explicit within
inputb and outputb is suppressed.
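
The rule can be sketched in standard C++ as follows. This is a minimal illustration, not code from the original design discussion: only the names inputb, outputb, and in_out come from the text, while buffer_base and the buffer sizes are hypothetical.

```cpp
#include <cstddef>

// Hypothetical shared base: its initialization needs a user-specified size.
class buffer_base {
public:
    explicit buffer_base(std::size_t size) : size_(size) {}
    std::size_t size() const { return size_; }
private:
    std::size_t size_;
};

class inputb : public virtual buffer_base {
public:
    // This initializer is used only when inputb is itself the most derived class.
    inputb() : buffer_base(512) {}
};

class outputb : public virtual buffer_base {
public:
    // Likewise used only when outputb is the most derived class.
    outputb() : buffer_base(1024) {}
};

class in_out : public inputb, public outputb {
public:
    // The most derived class initializes the shared virtual sub-object.
    // The initializers written in inputb and outputb are suppressed here.
    in_out() : buffer_base(2048), inputb(), outputb() {}
};
```

A standalone inputb reports a size of 512 and a standalone outputb 1024, but an in_out object reports 2048: the single shared sub-object is initialized once, by in_out.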


This provides the flexibility required by language
developers, but at the cost of a complicated semantics. This burden of complication
is stripped away if we restrict a virtual base class to be without state and simply
allow it to specify an interface. This is a recommended design idiom within C++. Within
C++/CLI, it is raised to policy with the Interface type.
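
As a rough sketch of that idiom in standard C++ (all names here are hypothetical), a virtual base class that carries no state needs no constructor arguments, so the question of who initializes it never comes up:

```cpp
#include <cstddef>

// Hypothetical stateless interface: no data members, only pure virtual functions.
class IBuffer {
public:
    virtual ~IBuffer() = default;
    virtual std::size_t capacity() const = 0;
};

// Both intermediate classes derive virtually, yet neither has anything to initialize.
class Reader : public virtual IBuffer {
public:
    bool can_read() const { return capacity() > 0; }
};

class Writer : public virtual IBuffer {
public:
    bool can_write() const { return capacity() > 0; }
};

// The state lives in the concrete class, which initializes only its own members.
class ReadWriter : public Reader, public Writer {
public:
    explicit ReadWriter(std::size_t n) : capacity_(n) {}
    std::size_t capacity() const override { return capacity_; }
private:
    std::size_t capacity_;
};
```

Here ReadWriter never mentions IBuffer in its member initialization list, which is the simplification the interface restriction buys.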


Here is a real code sample doing something very simple
– and in this case, the explicit boxing is mostly a lexical tax without representation.


original language requires explicit __box operation

int my1DIntArray __gc[] = { 1, 2, 3, 4, 5 };

myObjArray __gc[] = {


















As you can see, there is a whole lot of boxing going
on. Under T2, value type boxing is implicit [note that all T1 to T2 translations are
output of the mcfront tool]:


revised language makes boxing implicit

array<int>^ my1DIntArray = {1,2,3,4,5};

array<Object^>^ myObjArray = {26,27,28,29,30};


“{0}\t{1}\t{2}”, 0,

0 ),

0 ) );



Boxing is a peculiarity of the .NET unified type
system. Value types directly contain their state, while reference types are an implicit
duple: the named entity is a handle to an unnamed object allocated on the managed
heap. Any initialization or assignment of a value type to an Object, for example,
requires that the value type be placed within the managed heap (this is where the image
of boxing arises): first by allocating the associated memory, then by copying the
value type's state, and then by returning the address of this anonymous Value/Reference
hybrid. Thus, when one writes in C#


object o = 0; // C# implicit boxing


there is a great deal more going on than is made
apparent by the simplicity of the code. The design of C# hides the complexity not
only of what operations are taking place under the hood, but also of the abstraction
of boxing itself. T1, on the other hand, concerned that this would lead to a false
sense of efficiency, puts it in the user's face by requiring an explicit instruction,


Object *o = __box( 0 ); // T1 explicit boxing


as if in this case one had any choice, or that it
particularly matters when one is invoking Console::WriteLine. In my opinion, forcing
the user to make an explicit request in these cases is at best the equivalent of one's
mother repeatedly demanding as one is trying to leave the house, "now you will be
careful, won't you?" Or, if you like, the child in the back seat asking, five minutes
out from the house, "are we there yet?" In both cases, we are not questioning the sincerity
behind the intent. And that is why boxing is implicit under T2:


Object ^o = 0; // T2 implicit boxing


There are side-effects to implicit boxing, of course.
One is that the above initialization does not set the object to null,
but to address a boxed instance of the integer value zero. This requires the introduction
of some entity that can represent a tracking handle to no object. Everyone's original
choice, of course, is null, and lucky for C# they could start from scratch and introduce
just such a keyword. Adding a paradigm to an existing language presents a few more
constraints; think of the somewhat analogous problem of turning fins into legs, or
introducing lungs, as marine life moved onto land. In any case, my original choice
was refnull, which one can't champion
with any real enthusiasm, and that has evolved over a year and a half into nullptr,
as in:


Object ^o = nullptr; // T2 initialize tracking handle to refer to no object
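
The distinction can be modelled in standard C++. What follows is only an analogy under stated assumptions, not how the CLR or the C++/CLI compiler implements boxing: Object, BoxedInt, and Handle are hypothetical stand-ins, and a std::shared_ptr plays the part of the managed heap.

```cpp
#include <cstddef>
#include <memory>
#include <typeinfo>

// Hypothetical stand-in for the root reference type.
struct Object {
    virtual ~Object() = default;
};

// Hypothetical box: a heap object holding a copy of the value's state.
struct BoxedInt : Object {
    explicit BoxedInt(int v) : value(v) {}
    int value;
};

// Hypothetical stand-in for a tracking handle (Object^).
class Handle {
public:
    // A handle that refers to no object.
    Handle(std::nullptr_t) {}

    // Implicit "boxing": allocate, copy the value's state, keep the address.
    Handle(int v) : target_(std::make_shared<BoxedInt>(v)) {}

    bool is_null() const { return target_ == nullptr; }

    // "Unboxing": downcast to the boxed type and copy the value back out.
    int unbox() const {
        auto boxed = std::dynamic_pointer_cast<BoxedInt>(target_);
        if (!boxed) throw std::bad_cast();
        return boxed->value;
    }

private:
    std::shared_ptr<Object> target_;
};
```

In this model, `Handle o = 0;` selects the int constructor, so o refers to a boxed zero and o.is_null() is false, while `Handle n = nullptr;` refers to no object at all.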


[A T1 to T2 translation Heads Up] As I mentioned
in an earlier post, this presents something of a bother for those moving their code
from T1 to T2, since every comparison with 0, and every assignment or initialization to 0,
changes semantics because of implicit boxing. mcfront, the translation tool, attempts to automagically
do the right thing, but certain cases, such as calling an overloaded method, require
a great deal of type analysis that goes beyond the original scope of the tool which
consists of a parse engine, abstract syntax tree hierarchy (called an MCTree), and
a tree-walker (called an Ent) to generate the T2 source-level code. It's just a question
of having sufficient time to add the necessary type checking semantics, which are
not an aspect of the parse engine component. If you are doing the transition by hand,
it is something you need to watch out for. [End
of Heads Up]


Let me conclude by putting this in the context of
our two earlier metaphors of (1) Kansas and Oz as representing native and managed
Object Model behaviors, and (2) the two-faced Janus image as representing the design
face of C++/CLI.


  1. T1 did not provide implicit boxing. Why? Simply put,
    we were thinking Kansas, not Oz. We were looking through the wrong Janus pair of eyes. And this resulted
    in an inelegance and sense of complexity for our users. T2 addresses this imbalance.

  2. On the other hand, T1 did provide direct access to
    the boxed value on the managed heap, since the alternative is not acceptable performance-wise.
    The lesson here is that without some Kansas
    thinking, without using that set of Janus eyes, system programming is not practicable.

For example, if one were to write a simple word-counting
program that represents a map where the word is represented as a string key and the
count as an integral value, then, without such direct access, each increment of the count
requires a downcast unboxing of the existing value and subsequent reboxing of the new value
into a new heap object.
Languages that have no performance characteristics are quick to ridicule performance
concerns and usually stoop to the mockery of saying, "it hardly matters how fast a
program is if it simply returns an incorrect value faster," suggesting that concerns
with performance lead to bad programs and, even worse, bad programmers. Implicitly,
these people are condemning C and C++, and using that condemnation to promote the
sales of their languages. But here is an example of where performance and correctness
are not adversaries, but rather partners in providing a street-savvy systems
programming language for .NET.
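
The cost difference in the word-counting example can be sketched in standard C++. Again, this is an analogy rather than managed code: HeapObject, BoxedCount, Tally, and the three functions are hypothetical names, and std::make_shared stands in for an allocation on the managed heap.

```cpp
#include <map>
#include <memory>
#include <string>
#include <typeinfo>
#include <vector>

// Hypothetical stand-ins for a root reference type and a boxed integer.
struct HeapObject {
    virtual ~HeapObject() = default;
};

struct BoxedCount : HeapObject {
    explicit BoxedCount(int v) : value(v) {}
    int value;
};

struct Tally {
    std::map<std::string, std::shared_ptr<HeapObject>> counts;
    int allocations = 0;  // number of boxes created
};

// Without direct access: downcast and unbox the old value, then rebox the new one.
Tally count_by_reboxing(const std::vector<std::string>& words) {
    Tally t;
    for (const auto& w : words) {
        int old_value = 0;
        auto it = t.counts.find(w);
        if (it != t.counts.end())
            old_value = dynamic_cast<const BoxedCount&>(*it->second).value;
        t.counts[w] = std::make_shared<BoxedCount>(old_value + 1);  // new box each time
        ++t.allocations;
    }
    return t;
}

// With direct access: box once per distinct word, then increment in place.
Tally count_in_place(const std::vector<std::string>& words) {
    Tally t;
    for (const auto& w : words) {
        auto& slot = t.counts[w];
        if (!slot) {
            slot = std::make_shared<BoxedCount>(0);
            ++t.allocations;
        }
        ++dynamic_cast<BoxedCount&>(*slot).value;
    }
    return t;
}

// Reads a count back out of either tally; 0 if the word was never seen.
int count_of(const Tally& t, const std::string& word) {
    auto it = t.counts.find(word);
    if (it == t.counts.end()) return 0;
    return dynamic_cast<const BoxedCount&>(*it->second).value;
}
```

Both functions produce the same counts. The reboxing version allocates once per word occurrence, while the in-place version allocates once per distinct word, so the two differ in cost and not in correctness.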


disclaimer: This posting is
provided “AS IS” with no warranties, and confers no rights. The opinions expressed
are those of the author. 



Comments (14)

  1. igor f says:

    Interesting stuff, thanks for the insights.

    The discussion of "nullptr" and boxing "0" speaks to an issue that I’ve been continually wondering about as I’ve been reading this series of blogs: what is the planned backwards-compatibility story of the next version of Managed C++? Specifically, will the new compiler continue to support (perhaps optionally) the original MC++ syntax (__gc, __value et al), or will all MC++ code need to be re-written or translated to the new syntax? Initially the prospect of translation didn’t seem all that onerous, but if the new compiler does in fact support implicit boxing, the process will become more difficult and error-prone, as you point out. Unfortunately implicit boxing also would seem to complicate the backwards-compatibility scenario.

    I’d love to know how you anticipate this working out.

  2. Garrett Serack says:

    Igor ->

    It appears that they are introducing this new C++/CLI Spec in addition to the existing MC++ one, in order to go abit farther with the language.

    There is a compiler flag (/Z:oldsyntax or something) that will let you continue to use MC++ style language. You can mix and match in a project, but a single file needs to be either one or the other.


  3. Andreas Häber says:

    Just curious.. you wrote:
    "of the tool which consists of a parse engine, abstract syntax tree hierarchy (called an MCTree), and a tree-walker (called an Ent)"

    does the name for the tree-walker (Ent) come from Tolkien’s ents in Lord Of The Ring? Cool way to name it 🙂

    btw. I believe I read in a blog somewhere that the switch for using MC++ is /clr:oldsyntax. Using /clr you’ll get "the new way".

  4. Andreas Häber says:

    Regarding the switch for using MC++… Andy Rich wrote about that here:

  5. Srdjan says:

    Interesting thing is that ‘nullptr’ is choosed to be a keyword representing null _managed reference_ [sic!]
    In my book, ptr is a pointer…

  6. Gil says:

    A question on something I just saw in the code:

    array<int>^ my1DIntArray = {1,2,3,4,5};

    What is the type of the expression {1, 2, 3, 4, 5}. If it is int[] then somebody must be doing an implicit conversion here? Surely not the CLR, so is it the C++ compiler. If the expression is array<int>^, then that is a big compatibility problem (so I assume that it isn’t).

  7. AlisdairM says:

    nullptr is also a proposal to ANSI/ISO for the next C++ standard, C++0x, as the literal value for null pointers. On the assumption that goes through, I am more than happy to see the same reserved word used in C++/CLI for effectively the same purpose. After all, we call them managed references but syntactically they are much closer to C++ pointers than C++ references.

  8. stan lippman says:

    igor asks:

    Specifically, will the new compiler continue to support (perhaps optionally) the original MC++ syntax (__gc, __value et al), or will all MC++ code need to be re-written or translated to the new syntax? … I’d love to know how you anticipate this working out.

    1. the compiler will continue to support the old syntax, with a flag.
    2. i am currently developing a translation tool which parses the old syntax and in intention at least translates both the syntax and semantics to the new language, or else generates a warning when such a translation may not be implemented. for example, in the old language, one could pin a whole object, and then pass the address of one or more members into the native space. in the new language, pinning a whole object is not supported. the ideal translation would be to recognize each interior address and backpatch a pin declaration. at the moment, that is on the stack to do. the actual details of how the tool will be deployed are unclear at the moment.

    our goal is to provide a first class transition experience. if you have anything less, you should let us know loud and clear.

  9. stan lippman says:

    Interesting thing is that ‘nullptr’ is choosed to be a keyword representing null _managed reference_ [sic!] In my book, ptr is a pointer… Srdjan

    as alisdairM points out, nullptr is a proposal to the ANSI/ISO, and represents a joint authorship of Bjarne Stroustrup and Herb Sutter, who is leading the C++/CLI language effort. We originally did not call it nullptr for the C++/CLI, but for the unification, although we agree nullptr is not absolutely accurate, getting convergence with ISO C++ is worth the small misnomer.

  10. stan lippman says:

    A question on something I just saw in the code:

    array<int>^ my1DIntArray = {1,2,3,4,5};

    What is the type of the expression {1, 2, 3, 4, 5}.

    *** i’ll address the array revision in a subsequent blog, and address this then. it is a shorthand notation supported in the original language design.