Pointer to String chars - Everett style

Garrett asked:

If the source text is in a CLR String, and we want to pass it (even read-only) to unmanaged code, it appears that there is no way to get a pointer to the String's buffer directly. We have to use the marshalling stuff to get it there, which in itself makes a copy.

Given that one of the goals of Managed C++ and C++/CLI (imnsho) is to facilitate leveraging existing native C++ code, has any thought been given to this?

Can I get a native pointer to the data in a CLR String? The short answer is yes, so long as you don't mind a wchar_t* - which is the native analog of the actual backing store type for a CLR String (the CLR type System::Char). Even in Everett, we supported doing this. You have to use a special function to get at it, located in the header file <vcclr.h>, which shipped with Everett. This header file includes a function, PtrToStringChars, which takes a String* and returns a wchar_t __gc*. You can use the returned pointer - called an “interior pointer” - to munge the string data in a fairly intuitive “native” way, as in this example code:

#using <mscorlib.dll>
#include <vcclr.h>
using namespace System;

int main(){
    String *s = S"abcdefg";
    wchar_t __gc* pc = PtrToStringChars(s);
    for(int i=0; i<s->Length; i++){
        *(pc+i)+=1; //increment each character in the string
    }
    Console::WriteLine(pc); //writes "bcdefgh"
}

Yeah, but I wanted to use a native function. I'm getting there. Now, you can't convert from a __gc* to a __nogc* (“native pointer”), but you can convert from a __gc* to another type - __pin* - which has a conversion to a __nogc*:

#using <mscorlib.dll>
#include <vcclr.h>
using namespace System;

int unmanagedStrLenFunction(wchar_t *c){ //counts the length of c
    int count=0;
    while(*c){
        count++;
        c++; //heh
    }
    return count;
}

int main(){
    String *s = S"abcdefg";
    wchar_t __gc* pc = PtrToStringChars(s);
    wchar_t __pin* ppc = pc;
    int x = unmanagedStrLenFunction(ppc);
    Console::WriteLine(__box(x)); //writes "7"
}

I could have turned the result of PtrToStringChars directly into a wchar_t __pin*, but I wanted to make each step absolutely clear.
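The native side of that call knows nothing about the CLR, so it can be written and tested as plain C++. Here is a minimal sketch of a const-correct variant of the counting function, on the assumption that the callee only needs read access. The name nativeWideLength is hypothetical and is not part of vcclr.h or any Everett header.

```cpp
#include <cassert>
#include <cstddef>
#include <cwchar>

// Hypothetical read-only variant of the article's unmanagedStrLenFunction.
// It never writes through the pointer, so it takes const wchar_t*.
std::size_t nativeWideLength(const wchar_t* c) {
    assert(c != nullptr);        // precondition: caller passes a valid buffer
    std::size_t count = 0;
    while (*c) {                 // stop at the terminating null character
        ++count;
        ++c;
    }
    return count;
}
```

For any null-terminated buffer this returns the same value as the standard std::wcslen, so in real code you would probably just call that. Taking a const pointer simply documents that the native function will not modify the string data while it is pinned.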

Wow! Pin pointers are cool! I'm going to use them everywhere! Whoa there, trigger. There are a few things to keep in mind about pin pointers:

  1. They can be extremely costly. The pin pointer works by literally pinning the enclosing object down, so the garbage collector can't move it around when it's doing collections. Do this too often, or keep the pin pointer around for a relatively long time, and you seriously hurt the performance of the garbage collector - not a good idea.
  2. They can't be used everywhere. By design, because of the costliness and lifetime problems involved with pin pointers, they can't be: members of a type, function return types, function parameters, or temporary variables.
  3. Pin pointers only pin objects for their own lifetime. This leaves open the possibility of GC holes. That is, you can get a native pointer into the GC heap, let the pin pointer go away, and leave yourself a huge GC hole. For example:

__gc class A{
public:
    int i;
};

int* gchole(A* a){
    int __pin* p = &(a->i);
    return p;
}

What's so wrong with that code? On the surface, it looks pretty benign. But remember that the object passed in (a) is only pinned for the lifetime of the pin pointer p. So, when the function returns, you have a native pointer into the GC heap, which would be safe if the object were still pinned, but p has been destroyed. So, instead, you have a GC hole. The pointer returned from the function gchole is only going to be valid until the next garbage collection - and who knows when that will happen. In short, don't do this if you want to avoid unexplainable, untraceable, unreproducible application crashes.

Back to the original question, what about regular char*'s? No chance, not without incurring a copy cost (either by using API functions that turn wchar_t*'s into char*'s, or by marshalling).
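To make the copy cost concrete, here is a rough sketch of the "API function" route on the native side, using the standard C library's wcstombs. The helper name narrowCopy is hypothetical, and the sketch assumes the wide string is null-terminated and that its characters are representable in the current C locale.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cwchar>
#include <string>

// Hypothetical helper: converts a null-terminated wide string into a narrow
// std::string. The conversion always writes into a new buffer, which is the
// copy cost discussed above.
std::string narrowCopy(const wchar_t* wide) {
    assert(wide != nullptr);
    // Worst case: MB_CUR_MAX bytes per wide character, plus the terminator.
    std::size_t capacity = std::wcslen(wide) * MB_CUR_MAX + 1;
    std::string result(capacity, '\0');
    std::size_t written = std::wcstombs(&result[0], wide, capacity);
    if (written == static_cast<std::size_t>(-1)) {
        return std::string();    // a character had no narrow representation
    }
    result.resize(written);      // drop the unused tail of the buffer
    return result;
}
```

The wide data is only read during the conversion, so in the pinned scenario the pin would only need to last for the duration of this call. After that, the narrow copy lives in native memory and the garbage collector is free to move the original String.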

In a future article, I'll describe the new syntax versions of the pinning and interior pointer, and some of the (mostly minor) differences.