A Question of Interop

A reader asks the following question:

Sender: Richard Hsu

Stan,

Would you suggest, for those of us who are yet to begin learning MC++, to wait till C++/CLI ships, because we'll not only be learning what will be obsolete syntax, but also a more difficult one.

So, would you recommend, till the release of C++/CLI compiler, it may be a better idea to use Interop in C# than go for MC++ as a temporary measure.

Let me preface these remarks by saying that this response represents my personal opinion and in no way represents official Microsoft policy.

I could not in good conscience recommend learning the Managed Extensions to C++ as a first introduction to the CLI object model and dynamic programming in general. The best books right now of course all use C# as the illustrative language – we sort of gave them no choice, imo.

However, that said, there is NO viable interop story for native C++ other than the managed extensions – at least until we ship the C++/CLI compiler. My understanding is that the first beta is due to be released Real Soon Now (RSN).

So my answer to you is admittedly a fudge: I would go with the new language syntax and the intermediate beta releases if that is a viable solution for your environment.

Here is a draft of something I am writing that might be helpful (or not) …

For the third edition of my C++ Primer, I created a relatively large text query application that heavily exercises the STL container classes for parsing a text file and setting up an internal representation. For example, the include file looks as follows,

#include <algorithm>

#include <string>

#include <vector>

#include <utility>

#include <map>

#include <set>

 

using namespace std;

The data representation looks as follows:

typedef pair<short,short> location;

typedef vector<location>  loc;

typedef vector<string>    text;

typedef pair<text*,loc*>  text_loc;

class TextQuery {

public:

      // …

 

private:

      vector<string>   *lines_of_text;

      text_loc          *text_locations;

      map<string,loc*>  *word_map;

      Query             *query;

      static string     filt_elems;

      vector<int>       line_cnt;

};
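To give a feel for how these typedefs get exercised, here is a minimal sketch of filling a word map from lines of text. It is not the code from the Primer: build_word_map is a hypothetical helper, the map holds loc by value rather than by pointer to keep the sketch short, and I am assuming a location is a (line, word position) pair.

```cpp
#include <map>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

typedef std::pair<short, short>  location;  // assumed: (line number, word position)
typedef std::vector<location>    loc;
typedef std::vector<std::string> text;

// Hypothetical helper, not a TextQuery member: records every place
// each whitespace-separated word occurs in the given lines.
std::map<std::string, loc> build_word_map(const text &lines)
{
    std::map<std::string, loc> word_map;
    for (short ln = 0; ln < static_cast<short>(lines.size()); ++ln) {
        std::istringstream in(lines[ln]);
        std::string word;
        for (short pos = 0; in >> word; ++pos)
            word_map[word].push_back(location(ln, pos));
    }
    return word_map;
}
```

A query for a single word then reduces to a lookup in the returned map, with the loc vector giving the lines to display.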

Query is the abstract base of a class hierarchy supporting a query language. For example, here is how a query session might run:

Enter a query-please separate each item by a space.

Terminate query (or session) with a dot( . ).

 

==> fiery && ( bird || shyly )

 

        fiery ( 1 ) lines match

        bird ( 1 ) lines match

        shyly ( 1 ) lines match

        ( bird || shyly )  ( 2 ) lines match

        fiery &&  ( bird || shyly )  ( 1 ) lines match

 

Requested query: fiery &&  ( bird || shyly )

 

( 3 ) like a fiery bird in flight. A beautiful fiery bird, he tells her,

The native invocation of the text query system looks like this:

int main()

{

      TextQuery tq;

 

      tq.build_up_text();

      tq.query_text();

}

I want to expose the TextQuery interface to a managed application without having to touch the darn thing, let alone consider reimplementing it. After all, it works. Honestly, I'm not sure if I'm ready to go to managed code if it means giving up on my extensive investment in native code. 

The simplest strategy for exposing a native interface is to wrap a reference class around the native type, populating it with stub methods that invoke the associated native methods. Once performance data is collected, we may wish to cache or queue processing requests before crossing the boundary between native and managed code, or even selectively port critical portions of the original application. The key thing is to get it up and running without globs of expended time and headache. Here is the ten-minute text query wrap:

#include "TextQuery.h"

public ref class TextQueryCLI

{

    // pointer to the native type …

    TextQuery *pquery;

     

public:

    TextQueryCLI() : pquery( new TextQuery()){}

   ~TextQueryCLI(){ delete pquery; }

 

    void query_text()    { pquery->query_text();   }

    void build_up_text() { pquery->build_up_text();}

 

    // …

};

Under the current language definition, the only way to declare a native type as a member of a CLI class is to declare it as a pointer. We allocate it on the native heap through the new expression within the wrapper object's constructor, and delete it within the destructor. build_up_text() and query_text() function as stubs, dispatching each call to the native TextQuery object. Here is our revised main() function:

#include "TextQueryCLI.h"

int main(void)

{

      TextQueryCLI tqc;

 

      tqc.build_up_text();

      tqc.query_text();

 

      // destructor automatically invoked here …

}

We would not want to embed an actual native class object within a reference class – that is, to have it reside on the CLI heap. The problem is that the garbage collector is likely to relocate the object during heap compaction, and relocation provides no facility to invoke an associated copy constructor or destructor.
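Why a missing copy constructor matters can be seen in plain native C++. The sketch below is only an analogy: Cursor, init, and relocate are hypothetical names, and memcpy merely stands in for a bitwise move of the object's bytes. It does not model the garbage collector itself.

```cpp
#include <cstring>

// Hypothetical native type that keeps a pointer into its own storage.
struct Cursor {
    char  buf[16];
    char *pos;      // intended to point inside this object's buf
};

// Sets up the invariant: pos refers to the object's own buffer.
void init(Cursor &c)
{
    std::memset(c.buf, 0, sizeof c.buf);
    c.pos = c.buf;
}

// Stand-in for relocation: the bytes are copied, no constructor runs.
void relocate(Cursor &to, const Cursor &from)
{
    std::memcpy(&to, &from, sizeof(Cursor));
}
```

After relocate(b, a), the copy's pos still holds the address of a.buf rather than b.buf. A copy constructor would have been the place to repair that, and a bitwise move never gives it the chance.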

A likely future extension to C++/CLI will relieve the programmer from the manual management of the native class. While it would still reside on the native heap, we would declare and use it as if it were an object. The compiler would transparently manage its allocation and deletion.

Wrapping a class hierarchy is slightly more subtle. For example, consider the following native hierarchy,

class Query{ … };

class BinaryQuery : public Query { … };

class AndQuery : public BinaryQuery{ … };

We'd like to introduce a reference class hierarchy that provides a wrapper to the native class hierarchy. Our first thought might be to include a pointer to the associated native type within each derived reference class. For example,

public ref class AndQueryCLI : public BinaryQueryCLI

{

    // not the recommended strategy

    AndQuery *paq;

 

    // …

};

but this quickly fouls up. When our users do the following,

{

    // oops: we lose our access to paq …

    QueryCLI ^q = gcnew AndQueryCLI;

};

we lose direct access to the native AndQuery pointer.

A second issue has to do with the constructors associated with our wrapping classes. In a class hierarchy, a constructor invocation either represents the whole object being created, or a base class sub-object being initialized prior to the execution of the body of the whole object's constructor. When we write,

QueryCLI ^q = gcnew AndQueryCLI;

AndQueryCLI represents the whole object, and the invocation, in turn, of the QueryCLI and BinaryQueryCLI constructors represents the initialization of its two base class sub-objects.
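The same ordering can be observed in native C++. This is a small sketch using empty stand-ins for the native Query hierarchy; ctor_log is a hypothetical helper added only to record the order in which the constructors run.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: collects constructor names in invocation order.
std::vector<std::string> &ctor_log()
{
    static std::vector<std::string> log;
    return log;
}

// Empty stand-ins for the native hierarchy, instrumented for illustration.
struct Query {
    Query() { ctor_log().push_back("Query"); }
    virtual ~Query() {}
};

struct BinaryQuery : Query {
    BinaryQuery() { ctor_log().push_back("BinaryQuery"); }
};

struct AndQuery : BinaryQuery {
    AndQuery() { ctor_log().push_back("AndQuery"); }
};
```

Creating an AndQuery logs Query, then BinaryQuery, then AndQuery: the two base class sub-objects are initialized before the body of the whole object's constructor executes.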

If we place the native object pointer in the most derived class, its constructor becomes responsible for the initialization of the pointer – remember, this involves a native heap allocation that requires deletion within the destructor. This makes subsequent derivation extremely problematic: once a further class derives from it, its constructor invocation represents a sub-object rather than a whole-object initialization.

The better strategy is to store the native object pointer within the abstract base class – in our example, that would be QueryCLI, which becomes responsible for its initialization. For example,

public ref class QueryCLI abstract

{

    Query *pquery;

 

public:

    QueryCLI( Query *pq ) : pquery( pq ){}

    // …

};

 

// …

 

public ref class AndQueryCLI : public BinaryQueryCLI {

public:

    AndQueryCLI( AndQuery *paq )

               : BinaryQueryCLI( paq )

    { /* perform class specific initialization */ }

 

    // …

};
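The shape of this design can be tried out in standard C++ without a C++/CLI compiler. The following is a native analogue, not the managed code itself: the Wrap classes, the name() member, and the choice to have the root wrapper delete the native object are all assumptions made for the sketch.

```cpp
#include <string>

// Minimal stand-ins for the native hierarchy; name() is hypothetical.
class Query {
public:
    virtual ~Query() {}
    virtual std::string name() const = 0;
};

class BinaryQuery : public Query {};

class AndQuery : public BinaryQuery {
public:
    std::string name() const override { return "AndQuery"; }
};

// Native analogue of QueryCLI: the root holds the one native pointer.
class QueryWrap {
    Query *pquery;
protected:
    explicit QueryWrap(Query *pq) : pquery(pq) {}
public:
    QueryWrap(const QueryWrap &) = delete;
    QueryWrap &operator=(const QueryWrap &) = delete;
    virtual ~QueryWrap() { delete pquery; }   // assumption: the root owns it
    std::string name() const { return pquery->name(); }
};

class BinaryQueryWrap : public QueryWrap {
protected:
    explicit BinaryQueryWrap(BinaryQuery *pbq) : QueryWrap(pbq) {}
};

class AndQueryWrap : public BinaryQueryWrap {
public:
    explicit AndQueryWrap(AndQuery *paq) : BinaryQueryWrap(paq) {}
};
```

Holding an AndQueryWrap through a QueryWrap pointer no longer loses anything: name() dispatches through the root's pointer to the native AndQuery, and each derived constructor simply forwards its argument upward without allocating.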

Notice the introduction of the context-sensitive abstract keyword. It marks a class as abstract far more clearly than does the presence of a pure virtual function within the class definition.
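For comparison, here is how abstractness has to be signalled in native C++. This is a sketch only: eval() and NameQuery are hypothetical names, and std::is_abstract is used just to make the compiler confirm what the reader otherwise has to infer from the = 0.

```cpp
#include <type_traits>

// Native C++: nothing on the class head says "abstract".
class Query {
public:
    virtual ~Query() {}
    virtual void eval() = 0;   // hypothetical member; '= 0' is the only clue
};

// Hypothetical concrete class that overrides the pure virtual.
class NameQuery : public Query {
public:
    void eval() override {}
};

static_assert(std::is_abstract<Query>::value,
              "Query cannot be instantiated");
static_assert(!std::is_abstract<NameQuery>::value,
              "NameQuery can be instantiated");
```

A reader has to scan the whole body of Query to find the pure virtual, whereas the abstract keyword states the intent on the first line of the class.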

Our abstract QueryCLI class will need to provide the interface for the entire class hierarchy it anchors. Some will argue that this hybrid implementation, in which the abstract root of our hierarchy mixes both interface and implementation, is bad design.

Because the wrapping pattern requires allocation of the native object on the unmanaged heap, a value class is unsuitable to contain the wrapped object. This is because the CLI object model does not support the association of a destructor with a value class. This leaves us with no way to automate the deletion of the allocated memory.