Pointer to String chars – Everett style


Garrett asked:



If the source text is in a CLR String, and we want to pass it (even read-only) to unmanaged code, it appears that there is no way to get a pointer to the String’s buffer directly. We have to use the marshalling stuff to get it there, which in itself makes a copy.


Given that one of Managed C++’s and C++/CLI’s goals (imnsho) is to facilitate leveraging existing native C++ code, has any thought been given to this?


Can I get a native pointer to the data in a CLR String?  The short answer is yes, so long as you don’t mind a wchar_t*, which is the native analog of the actual backing store type for a CLR String (the CLR type System::Char).  Even in Everett, we supported doing this.  You have to use a special function to get at it, located in the header file vcclr.h, which shipped with Everett.  This header file includes a function, PtrToStringChars, which takes a String* and returns a const wchar_t __gc* (declared in the header as const System::Char*).  You can use the returned pointer, called an “interior pointer”, to munge with the string data in a fairly intuitive “native” way, as in this example code:



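#using <mscorlib.dll>
#include <vcclr.h>
using namespace System;

int main(){
  String *s = S"abcdefg";
  // PtrToStringChars returns a const interior pointer, so cast the const away to write through it.
  // Caution: Strings are meant to be immutable, and this literal is interned, so every use of
  // S"abcdefg" in the process sees the change.
  wchar_t __gc* pc = const_cast<wchar_t __gc*>(PtrToStringChars(s));
  for(int i=0; i<s->Length; i++){
    *(pc+i)+=1; //increment each character in the string
  }
  Console::WriteLine(s);  //writes "bcdefgh"
}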
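Under the hood, the vcclr.h implementation of PtrToStringChars (quoted by Garrett in the comments) is plain pointer arithmetic: take the object's address as bytes, add RuntimeHelpers::OffsetToStringData, and reinterpret the result as a character pointer. The standard C++ sketch below mirrors those steps on a hypothetical MockStringObject; the struct and mockPtrToChars are illustrative names, the layout is an assumption and not the real CLR one, and unlike a real __gc interior pointer nothing here is updated when the object moves.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical object layout, NOT the real CLR one: a header followed by the
// characters stored inline, which is the general shape PtrToStringChars relies on.
struct MockStringObject {
    const void* header;      // stand-in for the object's type/sync data
    std::int32_t length;     // number of characters
    char16_t firstChar[8];   // character data begins here
};

// Same steps as the vcclr.h helper: view the object as bytes, add the offset
// of the character data, and reinterpret the result. A null object stays null.
const char16_t* mockPtrToChars(const MockStringObject* s) {
    const unsigned char* bp = reinterpret_cast<const unsigned char*>(s);
    if (bp != nullptr) {
        bp += offsetof(MockStringObject, firstChar);
    }
    return reinterpret_cast<const char16_t*>(bp);
}
```

In the mock, offsetof plays the role that OffsetToStringData plays in the real header: the offset comes from the layout rather than being hard-coded.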


Yeah, but I wanted to use a native function.  I’m getting there.  Now, you can’t convert from a __gc* to a __nogc* (a “native pointer”), but you can convert from a __gc* to another type, __pin*, which has a conversion to a __nogc*:



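#using <mscorlib.dll>
#include <vcclr.h>
using namespace System;

// Compile this function as native code rather than MSIL.
#pragma unmanaged
int unmanagedStrLenFunction(const wchar_t *c){
  //counts the length of c; relies on the terminating null the CLR keeps after a String's characters
  int count=0;
  while(*c){
    count++;
    c++; //heh
  }
  return count;
}
#pragma managed

int main(){
  String *s = S"abcdefg";
  const wchar_t __gc* pc = PtrToStringChars(s);
  const wchar_t __pin* ppc = pc;  // s stays pinned for as long as ppc is in scope
  int x = unmanagedStrLenFunction(ppc);
  Console::WriteLine(__box(x));  //writes "7"
}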


I could have assigned the result of PtrToStringChars directly to the __pin* pointer, but I wanted to make it absolutely clear.
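On the native side, all the function sees is a wchar_t pointer. One caveat: unmanagedStrLenFunction stops at the first L'\0', but a CLR String may contain embedded null characters, so a native function that also takes the length (the managed caller could pass s->Length next to the pinned pointer) is the more robust contract. The two helpers below are hypothetical, written in standard C++ to contrast the two contracts; neither is part of vcclr.h.

```cpp
#include <cstddef>

// Hypothetical native helper: stops at the first L'\0', the same contract
// unmanagedStrLenFunction relies on.
std::size_t countCharNullTerminated(const wchar_t* p, wchar_t target) {
    std::size_t hits = 0;
    for (; *p != L'\0'; ++p) {
        if (*p == target) ++hits;
    }
    return hits;
}

// Hypothetical native helper: visits exactly n characters, so embedded
// nulls are treated like any other character.
std::size_t countCharCounted(const wchar_t* p, std::size_t n, wchar_t target) {
    std::size_t hits = 0;
    for (std::size_t i = 0; i < n; ++i) {
        if (p[i] == target) ++hits;
    }
    return hits;
}
```

For the four-character buffer {a, b, \0, a}, countCharCounted(buf, 4, L'a') returns 2, while countCharNullTerminated stops after the b and returns 1.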


Wow!  Pin pointers are cool!  I’m going to use them everywhere!  Whoa there, trigger. There are a few things to keep in mind about pin pointers:




  1. They can be extremely costly.  A pin pointer works by literally pinning the enclosing object in place, so the garbage collector can’t move it when it compacts the heap during a collection.  Pin too often, or keep a pin pointer around for a relatively long time, and the heap fragments around the pinned objects, seriously hurting the performance of the garbage collector. Not a good idea.


  2. They can’t be used everywhere.  By design, because of the cost and lifetime problems involved with pin pointers, they can’t be used as members of a type, function return types, function parameters, or temporary variables.


  3. Pin pointers only pin objects for their own lifetime.  This leaves open the possibility of GC holes: you can get a native pointer into the GC heap, release the pin pointer, and leave yourself holding a pointer the garbage collector is free to invalidate.  For example:



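__gc class A{
public:
  int i;
};

// DON'T DO THIS: returns a native pointer into the GC heap
int* gchole(A* a){ 
  int __pin* p = &(a->i);  // a is pinned only while p is alive
  return p;                // p dies here, so a can move after the function returns
}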


What’s so wrong with that code?  On the surface, it looks pretty benign.  But remember that the object passed in (a) is only pinned for the lifetime of the pin pointer p.  When the function returns, p is destroyed and a is no longer pinned, yet the caller is left holding a native pointer into the GC heap.  That is a GC hole.  The pointer returned from gchole is only going to be valid until the next garbage collection moves the object, and who knows when that will happen.  In short, don’t do this, if you want to avoid unexplainable, untraceable, unreproducible application crashes.
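To make the lifetime rule concrete outside the CLR, here is a toy model in standard C++: a hypothetical ToyHeap that relocates every unpinned object when compact() runs, plus a hypothetical RAII PinGuard that pins an object exactly as long as it is in scope (the same idea as the scoped pinning helpers Garrett mentions in the comments). None of these names are CLR APIs, and the real collector is far more involved; the sketch only shows why a pointer must not outlive its pin.

```cpp
#include <cstddef>
#include <memory>
#include <vector>

// Toy model, NOT the CLR: a "heap" that owns integers by handle and may
// relocate any object whose pin count is zero whenever compact() runs.
struct ToyHeap {
    struct Slot {
        std::unique_ptr<int> obj;
        int pins = 0;
    };
    std::vector<Slot> slots;

    std::size_t alloc(int value) {
        slots.push_back(Slot{std::make_unique<int>(value), 0});
        return slots.size() - 1;
    }
    int* address(std::size_t h) { return slots[h].obj.get(); }

    // Simulated collection: copy each unpinned object to fresh storage.
    // The new block is allocated before the old one is released, so an
    // unpinned object always ends up at a different address.
    void compact() {
        for (Slot& s : slots)
            if (s.pins == 0) s.obj = std::make_unique<int>(*s.obj);
    }
};

// RAII analogue of a __pin local: pinned from construction to destruction.
class PinGuard {
public:
    PinGuard(ToyHeap& heap, std::size_t h) : heap_(heap), h_(h) { ++heap_.slots[h_].pins; }
    ~PinGuard() { --heap_.slots[h_].pins; }
    PinGuard(const PinGuard&) = delete;
    PinGuard& operator=(const PinGuard&) = delete;
    int* get() const { return heap_.address(h_); }
private:
    ToyHeap& heap_;
    std::size_t h_;
};

// Safe pattern: the raw pointer is only used while the pin is alive.
bool stableWhilePinned(ToyHeap& heap, std::size_t h) {
    PinGuard pin(heap, h);
    int* before = pin.get();
    heap.compact();  // a "collection" while pinned leaves the object in place
    return before == heap.address(h);
}

// Analogue of gchole(): the pin ends at the closing brace, the pointer escapes.
int* toyGcHole(ToyHeap& heap, std::size_t h) {
    PinGuard pin(heap, h);
    return pin.get();
}
```

After int* stale = toyGcHole(heap, h); heap.compact();, stale no longer matches heap.address(h) and points at freed storage, so dereferencing it would be undefined behavior: the native counterpart of the GC hole described above.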


Back to the original question: what about regular char*s?  No chance, not without incurring a copy cost (either by using API functions that convert wchar_t* strings into char* strings, or by marshalling).
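The copy is unavoidable because a char* string needs a different, narrower representation in a new buffer. On Windows the usual routes are WideCharToMultiByte on the native side or Marshal::StringToHGlobalAnsi (freed with Marshal::FreeHGlobal) on the managed side. The portable sketch below only illustrates the shape of that copy: narrowCopyAscii is a hypothetical helper that keeps 7-bit ASCII and replaces everything else with '?', which is a deliberate simplification rather than a real code-page conversion.

```cpp
#include <string>

// Hypothetical helper: copies a null-terminated wchar_t buffer into a new
// std::string. Only 7-bit ASCII survives; every other character becomes '?'.
std::string narrowCopyAscii(const wchar_t* src) {
    std::string out;
    if (src == nullptr) return out;
    for (; *src != L'\0'; ++src) {
        // Casting handles both signed and unsigned wchar_t implementations.
        unsigned long code = static_cast<unsigned long>(*src);
        out.push_back(code < 0x80 ? static_cast<char>(code) : '?');
    }
    return out;
}
```

Every call allocates and fills a fresh buffer, which is exactly the cost the pinned wchar_t* route avoids.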


In a future article, I’ll describe the new syntax versions of the pinning and interior pointer, and some of the (mostly minor) differences.

Comments (9)

  1. Garrett Serack says:

    Sweet Mother Of all that is Good and Holy.

    #include <vcclr.h>

    I’m a little unclear how that escaped me all this time. Probably ’cause I neglected to read any documentation when Everett came out; I just upgraded and moved on. I don’t mind wchar_t*s: it’s what I wanted all along.

    sigh — Must be a sign to find a complete management job 🙂

    Yeah, pin pointers rock. 🙂 — And you sure are correct about the need to use them as little and quickly as possible.

    I’ve been pretty strict about their usage, just to be safe. I typically have built a few macros/inline functions to handle pinning, to guarantee they have limited lifespan.

    Now, I just looked at the vcclr.h file.

    inline const System::Char * PtrToStringChars(const System::String *s) {
        const System::Byte *bp = reinterpret_cast<const System::Byte *>(s);
        if( bp != 0 ) {
            unsigned offset = System::Runtime::CompilerServices::RuntimeHelpers::OffsetToStringData;
            bp += offset;
        }
        return reinterpret_cast<const System::Char *>(bp);
    }

    I considered doing something like that by hand, as a method of finding the beginning of the string, but I discarded the thought, as I wasn’t sure if the offset into the class would be constant, and didn’t want to risk buggering up in future CLR versions. Sure is nice that there’s a constant declared for that 🙂

    G

  2. arich says:

    Yeah, I’m actually not sure if the offset into the class is a constant or not, as the function reads it from the runtime’s RuntimeHelpers::OffsetToStringData property rather than hard-coding it.

    Upon ildasming mscorlib.dll (which everyone should do from time to time, just to see what’s in it), it looks as though the offset is 12 – but this probably varies by architecture.

  3. Garrett Serack says:

    Speaking of ildasm, I was thinking that it’d be nice if some of these type utilities were built into Visual Studio. I’m trying to do mentoring on the finer points of software development with some folks and I find that they are less likely to use tools that are external to the development environment. (sigh, kids these days 😉)

    That, and developer MSIL support in VS.NET would be nice too.

    Heck, while I’m at it, can I get a pony?

    Garrett

  4. Hot on the heels of my article on interior pointers comes a much more insightful one by Stan Lippman…