Implicit Boxing

What's Different in the Revised Language Definition?


"urn:schemas-microsoft-com:office:office" />Ok,
so we reversed ourselves. In politics, that would likely loose us an election. In
language design, it means that we imposed a philosophical position in lieu of practical
experience with the feature and, in practice, it was a mistake. As an analogy, in
the original multiple inheritance language design, Stroustrup decided that a virtual
base class sub-object could not be initialized within a derived class constructor,
and therefore the language required that any class serving as a virtual base class
must define a default constructor. It is that default constructor that would be invoked
by any subsequent virtual derivation.

The problem with a virtual base class hierarchy is that responsibility for the initialization of the shared virtual sub-object shifts with each subsequent derivation. For example, if I define a base class whose initialization requires the allocation of a buffer, the user-specified size of that buffer might be passed as an argument to the constructor. If I then provide two subsequent virtual derivations, call them inputb and outputb, each provides a particular value to the base class constructor. Now, when I derive an in_out class from both inputb and outputb, neither of the values intended for the shared virtual base class sub-object can sensibly be allowed to take effect.
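In present-day standard C++ the effect of the original rule can still be observed: if the most derived class says nothing about a virtual base, that base's default constructor runs and the values supplied by the intermediate classes are silently discarded. The following is a minimal sketch in ISO C++ rather than C++/CLI; buffer_base, its default size of 256, the sizes 512 and 1024, and check_diamond() are hypothetical names and values chosen only for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical base class whose initialization allocates a buffer
// of a user-specified size (256 if none is given).
class buffer_base {
public:
    explicit buffer_base(std::size_t size = 256) : buf_(size) {}
    std::size_t size() const { return buf_.size(); }
private:
    std::vector<char> buf_;
};

// Each virtual derivation passes its own preferred size.
class inputb : public virtual buffer_base {
public:
    inputb() : buffer_base(512) {}
};

class outputb : public virtual buffer_base {
public:
    outputb() : buffer_base(1024) {}
};

// in_out shares ONE buffer_base sub-object. It does not name buffer_base,
// so the default constructor runs and neither 512 nor 1024 is used.
class in_out : public inputb, public outputb {};

void check_diamond() {
    assert(inputb().size() == 512);    // inputb is most derived: its value is used
    assert(outputb().size() == 1024);  // likewise for outputb
    assert(in_out().size() == 256);    // in the diamond, both values are ignored
}
```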

Therefore, in the original language design, Stroustrup disallowed the explicit initialization of a virtual base class within the member initialization list of the derived class constructor. While this solved the problem, in practice the inability to direct the initialization of the virtual base class proved impracticable. Keith Gorlen of the National Institutes of Health, who had implemented a freeware version of the Smalltalk collection library called nihcl, was a principal voice in convincing Bjarne that he had to come up with a more flexible language design.

A principle of object-oriented hierarchical design holds that a derived class should concern itself only with the non-private implementation of its immediate base classes. In order to support a flexible initialization design for virtual inheritance, Bjarne had to violate this principle. The most derived class in a hierarchy assumes responsibility for the initialization of all virtual sub-objects, regardless of how deep into the hierarchy they occur. For example, inputb and outputb are both responsible for explicitly initializing their immediate virtual base class. When in_out derives from both inputb and outputb, in_out becomes responsible for the initialization of the virtual base class, now once removed, and the initialization made explicit within inputb and outputb is suppressed.
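Here is the rule as it now stands, sketched in standard C++ with the same hypothetical hierarchy (buffer_base, the sizes, and check_most_derived() are invented for illustration). Because buffer_base has no default constructor in this version, in_out is required to name it in its member initialization list even though it is not an immediate base class, and the initializers written in inputb and outputb are ignored whenever in_out is the most derived class.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical base class with NO default constructor: every most derived
// class must say how big the buffer is.
class buffer_base {
public:
    explicit buffer_base(std::size_t size) : buf_(size) {}
    std::size_t size() const { return buf_.size(); }
private:
    std::vector<char> buf_;
};

class inputb : public virtual buffer_base {
public:
    inputb() : buffer_base(512) {}    // used only when inputb is most derived
};

class outputb : public virtual buffer_base {
public:
    outputb() : buffer_base(1024) {}  // used only when outputb is most derived
};

// in_out is the most derived class, so it initializes the virtual base
// directly. Virtual bases are constructed first, then inputb and outputb,
// whose own buffer_base initializers are suppressed.
class in_out : public inputb, public outputb {
public:
    explicit in_out(std::size_t size) : buffer_base(size), inputb(), outputb() {}
};

void check_most_derived() {
    in_out io(4096);
    assert(io.size() == 4096);       // in_out's value wins
    assert(inputb().size() == 512);  // inputb's value still applies on its own
}
```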

This provides the flexibility required by language developers, but at the cost of more complicated semantics. This burden of complication is stripped away if we restrict a virtual base class to be without state and simply allow it to specify an interface. This is a recommended design idiom within C++. Within C++/CLI, it is raised to policy with the Interface type.
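A sketch of the stateless idiom in standard C++ follows; the names printable, reader, writer, channel, and check_interface() are hypothetical, and a C++/CLI interface type would express the same intent through the language rather than by convention. Because the virtual base has no data members and no user-written constructor, no class in the hierarchy has anything to initialize, so the question of who initializes the shared sub-object never arises.

```cpp
#include <cassert>
#include <string>

// A stateless "interface": only pure virtual functions, no data members,
// so there is nothing for any derived class to initialize.
struct printable {
    virtual ~printable() = default;
    virtual std::string describe() const = 0;
};

struct reader : public virtual printable {
    virtual int read() = 0;
};

struct writer : public virtual printable {
    virtual void write(int value) = 0;
};

// channel still shares one printable sub-object, but since that sub-object
// carries no state, no constructor anywhere needs to mention it.
class channel : public reader, public writer {
public:
    int read() override { return last_; }
    void write(int value) override { last_ = value; }
    std::string describe() const override { return "channel"; }
private:
    int last_ = 0;
};

void check_interface() {
    channel c;
    c.write(42);
    printable& p = c;  // unambiguous: there is only one printable sub-object
    assert(c.read() == 42);
    assert(p.describe() == "channel");
}
```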

Here is a real code sample that does something very simple, and in this case the explicit boxing is mostly a lexical tax without representation:

  

    // original language requires explicit __box operation
    int my1DIntArray __gc[] = { 1, 2, 3, 4, 5 };
    Object* myObjArray __gc[] = { __box(26), __box(27), __box(28),
                                  __box(29), __box(30) };

    Console::WriteLine( "{0}\t{1}\t{2}", __box(0),
                        __box(my1DIntArray->GetLowerBound(0)),
                        __box(my1DIntArray->GetUpperBound(0)) );

As you can see, there is a whole lot of boxing going on. Under T2, value type boxing is implicit [note that all T1 to T2 translations are the output of the mcfront tool]:

    // revised language makes boxing implicit
    array<int>^ my1DIntArray = { 1, 2, 3, 4, 5 };
    array<Object^>^ myObjArray = { 26, 27, 28, 29, 30 };

    Console::WriteLine( "{0}\t{1}\t{2}", 0,
                        my1DIntArray->GetLowerBound( 0 ),
                        my1DIntArray->GetUpperBound( 0 ) );

Boxing is a peculiarity of the .NET unified type system. Value types directly contain their state, while reference types are an implicit duple: the named entity is a handle to an unnamed object allocated on the managed heap. Any initialization or assignment of a value type to an Object, for example, requires that the value type be placed within the managed heap. This is where the image of boxing it arises: first by allocating the associated memory, then by copying the value type's state, and then by returning the address of this anonymous Value/Reference hybrid. Thus, when one writes in C#

    object o = 0; // C# implicit boxing

there is a great deal more going on than is made apparent by the simplicity of the code. The design of C# hides the complexity not only of what operations are taking place under the hood, but also of the abstraction of boxing itself. T1, on the other hand, concerned that this would lead to a false sense of efficiency, puts it in the user's face by requiring an explicit instruction,

    Object *o = __box( 0 ); // T1 explicit boxing

as if in this case one had any choice, or as if it particularly matters when one is invoking Console::WriteLine. In my opinion, forcing the user to make an explicit request in these cases is at best the equivalent of one's mother repeatedly demanding, as one is trying to leave the house, "now you will be careful, won't you?" Or, if you like, the child in the back seat asking, five minutes out from the house, "are we there yet?" In both cases, we are not questioning the sincerity behind the intent. And that is why boxing is implicit under T2:

    Object ^o = 0; // T2 implicit boxing
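To make the three steps described above concrete, here is a rough model of boxing in standard C++. It is an analogy under stated assumptions, not how the CLR implements boxing: Object stands in for System::Object, Boxed<T>, box(), unbox(), and check_boxing() are hypothetical names, and std::shared_ptr stands in for a tracking handle to a garbage-collected object.

```cpp
#include <cassert>
#include <memory>
#include <typeinfo>

// Hypothetical stand-in for the unified root type (System::Object in .NET).
struct Object {
    virtual ~Object() = default;
};

// A hypothetical "box": a heap object that holds a copy of a value's state.
template <typename T>
struct Boxed : Object {
    explicit Boxed(const T& v) : value(v) {}
    T value;
};

// Boxing in three steps: allocate heap memory, copy the value's state into it,
// and return a handle (here a shared_ptr) to the anonymous heap object.
template <typename T>
std::shared_ptr<Object> box(const T& v) {
    return std::make_shared<Boxed<T>>(v);
}

// Unboxing: a checked downcast to the boxed type, then a copy of the value out.
template <typename T>
T unbox(const std::shared_ptr<Object>& o) {
    auto b = std::dynamic_pointer_cast<Boxed<T>>(o);
    if (!b) throw std::bad_cast();
    return b->value;
}

void check_boxing() {
    int i = 26;
    std::shared_ptr<Object> o = box(i);  // roughly what `object o = i;` does
    i = 27;                              // the box holds a copy, not a reference to i
    assert(unbox<int>(o) == 26);
}
```

Every call to box() is a heap allocation plus a copy, which is the hidden cost that both the C# and the T2 spellings make invisible.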

There are side-effects to implicit boxing, of course. One of them is that the above initialization does not set the handle to null; instead, it sets it to address a boxed instance of the integer value zero. This requires the introduction of some entity that can represent a tracking handle to no object. Everyone's original choice, of course, is null, and lucky for C#, they could start from scratch and introduce just such a keyword. Adding a paradigm to an existing language presents a few more constraints; think of the somewhat analogous problem of turning fins into legs, or introducing lungs, as marine life moved onto land. In any case, my original choice was refnull, which one can't champion with any real enthusiasm, and that has evolved over a year and a half into nullptr, as in:

    Object ^o = nullptr; // T2 initialize tracking handle to refer to no object
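The distinction can be modeled in standard C++ with std::shared_ptr<int> standing in for a tracking handle to a boxed int. This is only an analogy, and check_null_vs_boxed_zero() is a hypothetical name: a handle that refers to a heap object whose value happens to be zero is a different thing from a handle that refers to no object at all, which is why a separate spelling for "no object" is needed once 0 means a boxed zero.

```cpp
#include <cassert>
#include <memory>

void check_null_vs_boxed_zero() {
    // Analogous to `Object ^o = 0;` under T2: a real heap object holding 0.
    std::shared_ptr<int> boxed_zero = std::make_shared<int>(0);
    // Analogous to `Object ^o = nullptr;`: a handle that refers to no object.
    std::shared_ptr<int> no_object = nullptr;

    assert(boxed_zero != nullptr);  // the handle is not null...
    assert(*boxed_zero == 0);       // ...even though the value it refers to is 0
    assert(no_object == nullptr);
    assert(!no_object);             // converts to false only when it is null
}
```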

[A T1 to T2 Translation Heads-Up] As I mentioned in an earlier post, this presents something of a bother for those moving their code from T1 to T2, since all comparisons with 0, and all assignments and initializations of 0, change semantics because of implicit boxing. mcfront, the translation tool, attempts to automagically do the right thing, but certain cases, such as calling an overloaded method, require a great deal of type analysis that goes beyond the original scope of the tool, which consists of a parse engine, an abstract syntax tree hierarchy (called an MCTree), and a tree-walker (called an Ent) to generate the T2 source-level code. It's just a question of having sufficient time to add the necessary type-checking semantics, which are not an aspect of the parse engine component. If you are doing the transition by hand, it is something you need to watch out for. [End of Heads-Up]

Let me conclude by putting this in the context of our two earlier metaphors: (1) Kansas and Oz as representing native and managed Object Model behaviors, and (2) the two-faced Janus image as representing the design face of C++/CLI.

1. T1 did not provide implicit boxing. Why? Simply put, we were thinking Kansas, not Oz. We were looking through the wrong Janus pair of eyes, and this resulted in an inelegance and sense of complexity for our users. T2 addresses this imbalance.

2. On the other hand, T1 did provide direct access to the boxed value on the managed heap, since the alternative is not acceptable performance-wise. The lesson here is that without some Kansas thinking, that is, without using that set of Janus eyes, system programming is not practicable.

For example, consider a simple word-counting program that uses a map in which each word is represented as a string key and its count as an integral value. Without direct access to the boxed value, each increment of the count requires a downcast unboxing of the existing value and the subsequent reboxing of the new value into a new heap object. Proponents of languages that have no particular performance characteristics are quick to ridicule performance concerns, and usually stoop to the mockery of saying, "it hardly matters how fast a program is if it simply returns an incorrect value faster," suggesting that concerns with performance lead to bad programs and, even worse, bad programmers. Implicitly, these people are condemning C and C++, and using that condemnation to promote the sales of their languages. But here is an example of where performance and correctness are not adversaries, but rather partners in providing a street-savvy systems programming language for .NET.
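A rough illustration of that cost, again in standard C++ rather than C++/CLI: BoxedInt, Handle, count_reboxing(), count_in_place(), and check_word_count() are hypothetical names, and std::shared_ptr stands in for a tracking handle. The first strategy unboxes and then allocates a new box on every increment; the second keeps one box per distinct word and updates its value in place, which is the kind of direct access to the boxed value described in point 2. A static counter records how many boxes each strategy creates.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <memory>
#include <sstream>
#include <string>

// Hypothetical instrumented box: counts how many heap boxes are created.
struct BoxedInt {
    explicit BoxedInt(int v) : value(v) { ++allocations; }
    int value;
    static inline std::size_t allocations = 0;
};

using Handle = std::shared_ptr<BoxedInt>;

// Strategy 1: every increment unboxes the old count and reboxes the new
// count into a fresh heap object.
std::map<std::string, Handle> count_reboxing(const std::string& text) {
    std::map<std::string, Handle> counts;
    std::istringstream in(text);
    for (std::string word; in >> word;) {
        auto it = counts.find(word);
        int old = (it == counts.end()) ? 0 : it->second->value;  // unbox
        counts[word] = std::make_shared<BoxedInt>(old + 1);       // rebox
    }
    return counts;
}

// Strategy 2: direct access to the boxed value, updated in place.
std::map<std::string, Handle> count_in_place(const std::string& text) {
    std::map<std::string, Handle> counts;
    std::istringstream in(text);
    for (std::string word; in >> word;) {
        Handle& h = counts[word];                   // null handle for a new word
        if (!h) h = std::make_shared<BoxedInt>(0);  // one box per distinct word
        ++h->value;                                 // no reboxing
    }
    return counts;
}

void check_word_count() {
    const std::string text = "the cat and the hat the end";  // 7 words, 5 distinct
    BoxedInt::allocations = 0;
    auto a = count_reboxing(text);
    assert(BoxedInt::allocations == 7);  // one box per word occurrence
    BoxedInt::allocations = 0;
    auto b = count_in_place(text);
    assert(BoxedInt::allocations == 5);  // one box per distinct word
    assert(a.at("the")->value == 3 && b.at("the")->value == 3);
}
```

Both strategies produce the same counts; the difference is that the reboxing version pays for a heap allocation on every occurrence, and the gap grows with the number of repeated words.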

disclaimer: This posting is
provided "AS IS" with no warranties, and confers no rights. The opinions expressed
are those of the author.