Finishing the Hat

12/02/2003


Finishing the Hat (^)

Once it became clear that support for .NET within
C++ represented a distinct programming paradigm, it followed that the language needed
to be extended to provide both a first class coding experience for the user, and an
elegant design integration with the ISO C++ standard in order to respect the sensibility
of the larger C++ community and engage their commitment and assistance. It also followed
that the diminutive name of the original language design, The Managed Extensions for C++, had
to be replaced as well.

The flagship feature of .NET is
the reference type, and its integration within the existing C++ language represented
a proof of concept. What were the general criteria? We needed a way to represent
the .NET reference type that both set it apart and yet felt analogous to the existing
type system. This would allow people to recognize the general category of form as
familiar while also noting its unique features. The analogy is the introduction of
the reference type by Stroustrup in the original invention of C++. So the general
form becomes

Type TypeModToken Id [ = init ];

where TypeModToken would
be one of the recognized tokens of the language reused in a new context (again, similar
to the introduction of the reference).

This was surprisingly controversial
at first, and still remains a sore point with some users – which, recall, is the motivation
for this initial set of blog entries. The two most common initial responses I recall
are (a) I can handle that with a typedef, wink, wink, and (b) it’s really not so bad.
[The latter reminds me of my response to the use of the left and right shift operators
for input and output in the iostream library.]

The necessary behavioral characteristics
are that it exhibit object semantics when operators are applied to it, something the
original syntax was unable to support. I liked to call it a flexible reference, thinking
in terms of its differences with the existing C++ reference [yes, the double use of
the reference here – one referring to the .NET reference type and the other referring
to the “it’s not a pointer, wink, wink” native C++ type – is unfortunate, much like
the reuse of template in the Gang of Four Patterns book for one of my favorite design
strategies.]:

1. It would have to be able to refer to no object. The native reference, of course, cannot do that directly although people are always showing me a reference being initialized to a reinterpret-cast of a 0. [The conventional way to have a reference refer to no-object is to provide an explicit singleton representing by convention a null object which often serves as a default argument to a function parameter.]

2. It would not require an initial value, but could begin life as referring to no object.

3. It would be able to be reassigned to refer to another object.

4. The assignment or initialization of one instance with another would exhibit shallow copy by default.
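
For readers who want to see the native side of this comparison concretely, here is a minimal ISO C++ sketch (assuming C++11 or later) of the first three qualities. It contrasts a native reference with a plain pointer and shows the null-object convention mentioned in item 1. The names Widget, null_widget, is_null, describe, and reseat_demo are hypothetical and exist only for this illustration. Nothing here is C++/CLI.

```cpp
#include <cassert>
#include <string>

// Hypothetical type used only for illustration.
struct Widget {
    std::string name;
};

// Null-object convention: one well-known instance stands for "no object",
// so a native reference parameter can default to it.
inline const Widget& null_widget() {
    static const Widget none{"<none>"};
    return none;
}

inline bool is_null(const Widget& w) { return &w == &null_widget(); }

// A reference parameter that refers to "no object" by default.
inline std::string describe(const Widget& w = null_widget()) {
    return is_null(w) ? "no widget" : w.name;
}

// Contrast a pointer, which has qualities 1 to 3, with a native reference,
// which has none of them.
inline std::string reseat_demo() {
    Widget a{"a"}, b{"b"};

    Widget* p = nullptr;  // qualities 1 and 2: starts life referring to no object
    assert(p == nullptr);
    p = &a;               // now refers to a
    p = &b;               // quality 3: reassigned to refer to b
    assert(p == &b);

    Widget& r = a;        // a native reference must be bound when it is declared
    r = b;                // not a rebinding: this copies b's value into a
    return a.name;        // a was overwritten through r
}
```

Calling describe() with no argument takes the null-object path, and reseat_demo() shows that assigning through a native reference changes the referred-to object rather than the reference itself.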

As a number of folks made clear
to me, I was thinking of this puppy backwards. That is, I was referring to it by the
qualities that distinguished it from the native reference, not by the qualities that
distinguished it as a handle to a managed .NET reference type. We want to call the
type a handle rather than a pointer or reference because both of these terms carry
baggage from the native side. A handle is the preferred name because it is a pattern
of encapsulation – someone named John Carolan first introduced me to this design under
the lovely name of the Cheshire Cat since the substance of the object being manipulated
can disappear out from under you without your knowledge. In this case, the disappearing
act results from the potential relocation of reference types during a sweep of the
garbage collector. What happens is that this relocation is transparently tracked by
the runtime, and the handle is updated to correctly point to the new location. [This
is actually a complicated functionality to provide in a static language like C++.
Of course, on the other hand, it is expensive, and can be disrupted by an unconstrained
poking into the managed heap. This is why the pointer concept doesn’t strongly translate
into the .NET object model – at least at the user level.]
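
To make the disappearing act a little more tangible, here is a toy model in ISO C++ of a handle that stays valid while the object it names is moved. It is only a sketch under loud assumptions: ToyHeap, its indirection table, and every member name are invented for this illustration, and the sketch makes no claim about how the .NET runtime actually implements tracking. The only point is that client code holds a handle, never an address, so relocation can happen behind its back.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Marks a table entry whose object has been released and swept away.
constexpr std::size_t kGone = static_cast<std::size_t>(-1);

// Toy model only: not a description of the .NET garbage collector.
class ToyHeap {
public:
    using Handle = std::size_t;  // index into the indirection table

    Handle allocate(const std::string& value) {
        storage_.push_back(Cell{value, true});
        table_.push_back(storage_.size() - 1);
        return table_.size() - 1;
    }

    // Mark the object as garbage; it is reclaimed by the next compact().
    void release(Handle h) { storage_[table_[h]].live = false; }

    // Slide live cells toward the front and fix up every table entry.
    void compact() {
        std::vector<Cell> packed;
        std::vector<std::size_t> moved_to(storage_.size(), kGone);
        for (std::size_t i = 0; i < storage_.size(); ++i) {
            if (storage_[i].live) {
                moved_to[i] = packed.size();
                packed.push_back(storage_[i]);
            }
        }
        for (std::size_t& slot : table_) {
            if (slot != kGone) slot = moved_to[slot];
        }
        storage_.swap(packed);
    }

    // Only valid for handles that have not been released.
    const std::string& get(Handle h) const { return storage_[table_[h]].value; }
    std::size_t location(Handle h) const { return table_[h]; }

private:
    struct Cell {
        std::string value;
        bool live;
    };
    std::vector<Cell> storage_;
    std::vector<std::size_t> table_;
};

// Returns true if the object behind b moved while b itself stayed usable.
inline bool relocation_demo() {
    ToyHeap heap;
    ToyHeap::Handle a = heap.allocate("first");
    ToyHeap::Handle b = heap.allocate("second");
    const std::size_t before = heap.location(b);
    heap.release(a);
    heap.compact();
    const std::size_t after = heap.location(b);
    return before != after && heap.get(b) == "second";
}
```

A raw pointer into storage_ taken before compact() would be left dangling, which is the native-side baggage the handle terminology is meant to avoid.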

So, the new reference type in the revised language design is referred to as a tracking handle, and exhibits the four qualities listed above. In the following three tracking handle declarations at global scope,

Object^ obj; // a declaration of a tracking handle to a .NET Object

Object^ poly = gcnew FooBar;

Object^ obj2 = poly;

obj is a tracking handle of type Object that refers to no Object, and is by default set to null. In local scope, the equivalent declaration looks as follows:

Object^ obj = nullptr; // local objects are not default initialized

poly is a tracking handle of type Object that is initialized to a FooBar object allocated on the managed heap. Because the language now supports two dynamic heap memories (the native heap, which is not garbage collected, and the managed heap, which is), a separate new expression is used to allocate memory from each. In the revised language design, a new keyword, gcnew, is added to allocate an object from the .NET managed heap. For example, here is an old and a new allocation of a System::Windows::Forms::Button object:

Button __gc *button1 = __gc new Button(); // using the explicit form

Button^ button1 = gcnew Button;

This is admittedly not a compelling
example for introducing a new keyword distinguishing where the memory allocation takes
place. In this example, it is clear from the context that the allocation is of a .NET
reference type and that it should take place on the managed heap. But the language
designers have a deeper vision of type unification between the native and managed parts, and the introduction of gcnew facilitates that. You’ll have to trust me on this for now. [Note, by the way, that the tracking handle (^) modifier is not required following the new expression.]

Finally, the initialization of the tracking handle obj2 with poly does not result in a member-wise copy, as it would under the ISO C++ Object Model, but results in a shallow copy so that both obj2 and poly refer to the same FooBar object.
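
The difference between the two kinds of copy can be sketched in ISO C++ (C++11 or later). This is an analogy only: std::shared_ptr is reference counted, not garbage collected, and it is not a tracking handle. Counter, memberwise_copy_demo, and shared_handle_demo are hypothetical names introduced for the illustration; the variable names poly and obj2 simply echo the declarations above.

```cpp
#include <cassert>
#include <memory>

// Hypothetical type used only for illustration.
struct Counter {
    int value = 0;
};

// Value semantics: copying produces a second, independent object.
inline int memberwise_copy_demo() {
    Counter original;
    Counter copy = original;  // member-wise copy of the object itself
    copy.value = 42;
    return original.value;    // the original is unaffected
}

// Handle-like semantics: copying the handle leaves one shared object.
inline int shared_handle_demo() {
    std::shared_ptr<Counter> poly = std::make_shared<Counter>();
    std::shared_ptr<Counter> obj2 = poly;  // copies the handle, not the Counter
    obj2->value = 42;
    assert(obj2.get() == poly.get());      // both name the same object
    return poly->value;                    // the change is visible through poly
}
```

In the first function the write to the copy never reaches the original; in the second, a write through obj2 is observed through poly because only one Counter exists.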

The need for an explicit entity to indicate that a tracking handle refers to no object is a side-effect of the change in type representation. The initialization or assignment of 0 no longer indicates a null address. For example,

obj = 0; // causes the implicit boxing of 0, not the assignment of obj to address no object
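
A loose ISO C++ analogy may help show why 0 stops meaning "no object" once the target can hold any value. The sketch below assumes C++17 for std::any, which is a value container and not a handle, so it only mirrors the idea that 0 is stored as data. The function names holds_boxed_zero and is_empty_after_reset are hypothetical.

```cpp
#include <any>
#include <cassert>
#include <typeinfo>

// Assigning 0 stores an int; it does not make the container empty.
inline bool holds_boxed_zero() {
    std::any obj;             // empty: holds no value at all
    assert(!obj.has_value());
    obj = 0;                  // now holds an int whose value is 0
    return obj.has_value()
        && obj.type() == typeid(int)
        && std::any_cast<int>(obj) == 0;
}

// Emptiness has to be requested explicitly, much as nullptr is explicit.
inline bool is_empty_after_reset() {
    std::any obj = 0;
    obj.reset();
    return !obj.has_value();
}
```

The explicit reset() plays the role that nullptr plays for a tracking handle: a dedicated way to say "nothing", distinct from the integer zero.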

This change in the meaning of 0 raises a subtle issue with the porting of existing Managed Extensions for C++ code into the revised language design. For example, consider the following value class declaration:

// the original language syntax

__value struct Holder
{
    Holder(Continuation* c, Sexpr* v)
    {
        cont = c;
        value = v;
        args = 0;
        env = 0;
    }

private:
    Continuation* cont;
    Sexpr* value;
    Environment* env;
    Sexpr* args __gc [];
};

Because both args and env are
managed reference types, their initialization to 0 in
the constructor cannot remain unchanged in the transition to the new syntax, but must
be changed to nullptr [note
that this translation is automated in a tool currently under development]:

// the revised language syntax

value struct Holder
{
    Holder( Continuation^ c, Sexpr^ v )
    {
        cont = c;
        value = v;
        args = nullptr;
        env = nullptr;
    }

private:
    Continuation^ cont;
    Sexpr^ value;
    Environment^ env;
    array<Sexpr^>^ args;
};

Similarly, tests against those members comparing
them to zero must also be changed to nullptr.
Here is the original syntax,

// the original language syntax

Sexpr*
Loop(Sexpr* input)
{
    value = 0;
    Holder holder = Interpret(this, input, env);

    while (holder.cont != 0)
    {
        if (holder.env != 0)
        {
            holder = Interpret(holder.cont, holder.value, holder.env);
        }
        else if (holder.args != 0)
        {
            holder = holder.value->closure()->apply(holder.cont, holder.args);
        }
    }

    return value;
}

And here is the translation into the new syntax,
again generated automatically by a translation tool under development within our group:

// the new revised syntax

Sexpr^
Loop( Sexpr^ input )
{
    value = nullptr;
    Holder holder = Interpret( this, input, env );

    while ( holder.cont != nullptr )
    {
        if ( holder.env != nullptr )
        {
            holder = Interpret( holder.cont, holder.value, holder.env );
        }
        else if ( holder.args != nullptr )
        {
            holder = holder.value->closure()->apply( holder.cont, holder.args );
        }
    }

    return value;
}

So, the final item I wish to mention about the new tracking handle syntax is the member selection operator. To me, it seemed like a no-brainer to use the object syntax (.). Others felt the pointer syntax (->) was equally obvious, and we argued our positions from different facets of a tracking handle’s usage:

// the pointer no-brainer

T^ p = gcnew T;

// the object no-brainer

T^ c = a + b;

So, as with light in physics, a tracking handle behaves in certain program contexts like an object and in other situations like a pointer. The member selection operator that is used is the arrow (->), as in the original language design.

In the next series of entries, I will walk through the changes in the language design, contrasting the original and revised language support for the various .NET features.

disclaimer :
This posting is provided "AS IS" with no warranties, and confers no rights.
