The Type of a String Literal Revisited ...

In the course of these entries, I have twice addressed the issue of the type of a string literal under C++/CLI -- in particular when resolving an overloaded function call. The issue is illustrated in the following example,

public ref class R {
public:
  void foo( System::String^ ); // (1)
  void foo( std::string ); // (2)
  void foo( const char* ); // (3)
};

void bar( R^ r )
{
  // which one?
  r->foo( "Pooh" );
}

In the original Managed Extensions for C++, the invocation of foo() within bar() resolved to (3), exactly the same as it does under ISO-C++. That is,

void bar( R^ r )
{
  // under Managed Extensions for C++, resolved to
  // void foo( const char* );
  r->foo( "Pooh" );
}

To briefly review: In ISO-C++, the type of "Pooh" is const char[5]. No instance of foo() declares a parameter of that array type. However, the trivial conversion of const char[5] to const char* (the array-to-pointer conversion, which ISO-C++ ranks as an exact match) is the best match, since reaching std::string requires a user-defined conversion through its constructor, and this is why (3) is invoked. There was no built-in notion of a string literal having any relationship to System::String.
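
As a minimal sketch of the ISO-C++ half of this review, the following strips R down to a hypothetical native class that keeps only instances (2) and (3). The names Native and which_overload(), and the int return codes used to identify each instance, are my own illustrative assumptions and not part of the original example.

```cpp
#include <cassert>
#include <string>
#include <type_traits>

// Hypothetical native counterpart of R: no System::String^ instance,
// and each foo() returns its label so the chosen overload is observable.
struct Native {
    int foo(std::string) { return 2; }   // (2)
    int foo(const char*) { return 3; }   // (3)
};

// A string literal is an lvalue of array type, so decltype yields a
// reference to const char[5]: four characters plus the terminating null.
static_assert(std::is_same<decltype("Pooh"), const char (&)[5]>::value,
              "\"Pooh\" has type const char[5]");

// Returns the label of the instance that overload resolution selects.
int which_overload() {
    Native n;
    return n.foo("Pooh");
}
```

Calling which_overload() reports instance (3): the array-to-pointer conversion outranks the user-defined conversion that std::string would need.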

And this was changed in the design of C++/CLI. Actually, it was changed twice, and that is the talking point of this entry – to explain why the initial change had to be further refined.

The overall effect of the change is to extend dual citizenship to a string literal compiled for the CLI. The initial change is described in my earlier entry entitled String Literals are now a Trivial Conversion to String. Here is a brief review of the issue.

The first question is, what is the exact type of "Pooh" within C++/CLI? One answer is, well, obviously, it is of type const char[5] – otherwise, it could not be compatible with ISO-C++. We can't change that.

The initial solution, therefore, was to introduce a new trivial conversion, that of a string literal to a System::String, that is of equal precedence with the trivial conversion of a string literal to const char*. This provides a somewhat elegant symmetry, but in practice results in a flurry of ambiguous calls. For example, under this design, the invocation of foo() now fails,

void bar( R^ r )
{
  // under interim C++/CLI, flagged as ambiguous
  // the following two candidate functions are equally good …
  // void foo( System::String^ );
  // void foo( const char* );
  r->foo( "Pooh" );
}

To disambiguate the call, the user would have to provide an explicit cast,

void bar( R^ r )
{
  // ok: void foo( System::String^ );
  r->foo( safe_cast<System::String^>( "Pooh" ));
}
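
The interim behavior itself can only be observed with a C++/CLI compiler, but ISO-C++ shows the same shape of problem whenever two instances are reachable only through conversions of equal rank. The sketch below is an analogy under that assumption, not the C++/CLI rule: ManagedString is a hypothetical stand-in for System::String, and Interim, foo_call_resolves, and the other names are likewise invented for illustration.

```cpp
#include <cassert>
#include <string>
#include <type_traits>
#include <utility>

// Hypothetical stand-in for System::String: reachable from a literal only
// through a converting constructor, just as std::string is.
struct ManagedString {
    ManagedString(const char*) {}
};

struct Interim {
    int foo(ManagedString) { return 1; }  // plays the role of (1)
    int foo(std::string)   { return 2; }  // (2)
};

// Detects whether a call of Interim::foo with an argument of type Arg
// resolves. An ambiguous call is a substitution failure, so it falls back
// to the primary template.
template <class Arg, class = void>
struct foo_call_resolves : std::false_type {};

template <class Arg>
struct foo_call_resolves<
    Arg, decltype(void(std::declval<Interim&>().foo(std::declval<Arg>())))>
    : std::true_type {};

// Both instances need a user-defined conversion from the literal: ambiguous.
constexpr bool literal_resolves = foo_call_resolves<const char (&)[5]>::value;

// Naming the target type explicitly leaves a single exact match.
constexpr bool explicit_resolves = foo_call_resolves<ManagedString>::value;

int call_with_explicit_conversion() {
    Interim r;
    return r.foo(ManagedString("Pooh"));
}
```

Here the explicit ManagedString("Pooh") plays the part of the safe_cast: it works, but the user has to spell out a preference that the language could have expressed.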

In practice, in nearly every case, the C++/CLI programmer wished to have the String instance invoked in preference to the C-style string instance. And so giving equal precedence to both conversions was both a step forward (in recognizing the special relationship of a string literal to System::String under the CLI) and two steps back: the presence of the const char* instance in effect neutralized that relationship (first step back), and an explicit cast was required to resolve the call (second step back).

So, we had to fix that. That is, under the CLI, we want a string literal to be more closely a kind of System::String than a kind of const char*. The question was, how could that be achieved without breaking ISO-C++ compatibility? How might you resolve that?

The insight is to realize that the dual citizenship of a string literal applies to its fundamental type, not to its set of trivial conversions. In effect, under C++/CLI, the underlying type of a string literal such as "Pooh" is both const char[5] (its native inheritance) and System::String (its underlying type in the managed, unified type system). Under C++/CLI, the string literal is an exact match to System::String, and the trivial conversion to const char* is not considered. That is, under the revised C++/CLI language specification, the ambiguity has been resolved in favor of System::String,

void bar( R^ r )
{
  // ok: under current C++/CLI,
  // void foo( System::String^ );
  r->foo( "Pooh" );
}

This reflects a fundamental difference between ISO-C++ and C++/CLI in their type systems. In ISO-C++, types are independent except when explicitly part of the same class inheritance hierarchy. Thus, there is no implicit type relationship between a string literal and the std::string class type, even though they share a common abstraction domain.
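
The ISO-C++ side of this contrast can be checked with the standard type traits. The following is only a sketch; the alias LiteralType and the constant names are my own illustrative choices.

```cpp
#include <cassert>
#include <string>
#include <type_traits>

// decltype("Pooh") is const char (&)[5]; strip the reference to get the array.
using LiteralType = std::remove_reference<decltype("Pooh")>::type;

// The literal is a plain array, not a class object, so it has no members.
constexpr bool literal_is_array = std::is_array<LiteralType>::value;
constexpr bool literal_is_class = std::is_class<LiteralType>::value;

// It decays to const char* ...
constexpr bool decays_to_pointer =
    std::is_same<std::decay<LiteralType>::type, const char*>::value;

// ... and reaches std::string only by way of a converting constructor,
// never through an inheritance relationship.
constexpr bool converts_to_string =
    std::is_convertible<decltype("Pooh"), std::string>::value;
constexpr bool related_by_inheritance =
    std::is_base_of<std::string, LiteralType>::value;
```

The literal converts to std::string, but the two types remain unrelated: the conversion is something the std::string class offers, not something the literal is.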

C++/CLI, on the other hand, supports a unified type system. Every type, including literal values, is implicitly a kind of Object. This is why we can call methods through a literal value or an object of the built-in types. The value 5 is of type Int32. The string literal is of type String. It just doesn't work to treat a string literal as either more like or equal to a C-style string.

The integrated conversion hierarchy allows a working ISO-C++ program to continue to exhibit the same behavior when compiled for the CLI, while a new C++/CLI program exercising the CLI types reflects the new type priority of the string literal.

Most readers and programmers have little patience with this level of detail, and often point to discussions of C++ type conversions as evidence of the language's complexity. However, I don't think that is quite fair. These rules are necessary if one is to reach an intuitive language behavior and guarantee uniform behavior across implementations. It is because the language generally behaves in a type-intuitive way that programmers can in fact ignore these details.

While the length of this discussion might seem disproportionate to the topic's importance, it strikes me as a canonical example of the extent to which we have had to work to integrate the CLI type system into the ISO-C++ semantic framework. This should also suggest certain good practices when a native class is being recast to a CLI class type. It is better, for example, to refashion the set of member functions accepting string literals rather than simply stirring in an additional String instance to our stew of overloaded functions, seasoning it to local taste.