On the dangers of unsafe String handling…

Another interesting tidbit I learned today from Brian Grunkemeyer, BCL developer…

It may be tempting to be ultra fast and use C# unsafe code support to just party over memory and write your own string like this:


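    // Requires an unsafe context and the /unsafe compiler option.
    string s = new string('\0', buffer.Length);
    fixed (byte* pBuffer = buffer)
    fixed (char* pString = s)
    {
        // Overwrites the string's characters in place, bypassing its immutability.
        for (int i = 0; i < buffer.Length; i++)
            pString[i] = (char) pBuffer[i];
    }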

But you should be aware of the issues with doing something like this. Here are Brian’s comments (slightly edited):

The reason you should avoid this kind of code is that we have some additional bits stored in a string instance that tell the runtime whether we can do some optimized sorting & comparison, or whether we have to load an NLS data table to get culture-correct weights for each character when comparing them. You may make your string very quickly, but it might not sort right at all. These bits are stored somewhere else in the string instance where you can’t manipulate them. We might add or subtract bits like this in other versions of the Framework as well. (Additionally, some strings may be stored in read-only pages in the future.) Using unsafe code to write over a string is not always a good idea…

I would suggest looking at the Rotor source for more info if you really feel you have to go this way…
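For comparison, here is a minimal sketch of a safe way to get the same characters without touching the string's memory. It assumes the intent of the loop above is to map each byte to the char with the same numeric value, which is what ISO-8859-1 (code page 28591) decoding does. The class and method names are made up for illustration; Encoding.GetEncoding and GetString are the standard BCL calls.

```csharp
using System;
using System.Text;

public static class SafeByteString
{
    // Hypothetical helper: decodes each byte to the char with the same
    // numeric value, matching the (char) cast in the unsafe loop.
    // Code page 28591 is ISO-8859-1 (Latin-1).
    public static string FromBytes(byte[] buffer)
    {
        if (buffer == null) throw new ArgumentNullException("buffer");
        return Encoding.GetEncoding(28591).GetString(buffer);
    }

    public static void Main()
    {
        byte[] buffer = { 72, 101, 108, 108, 111 }; // the bytes of "Hello"
        Console.WriteLine(FromBytes(buffer));       // prints "Hello"
    }
}
```

Because the runtime builds the string itself, whatever extra bits it keeps on the instance are set by the runtime rather than left stale.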

Comments (7)

  1. John Schroedl says:

    This code brings up one of my peeves: the inability to set data breakpoints in the CLR. I realize it’s because the heap manager can move objects around, but why couldn’t I have one on pString[2], for example, if I’ve marked it as fixed? Thoughts?

  2. Doug McClean says:

    I have a question that is somewhat related to this. What is the best practice when dealing with things like password strings? It used to be to clear them when done with them by using something like SecureZeroMemory, but now that strings are a) immutable and b) garbage collected at some unknown future time, is there anything we can do to minimize the risk of revealing that information? Perhaps using StringBuilder or some similar approach?

  3. S N says:


    This one was more useful to me. I was following the above-mentioned pattern in a performance-critical code path. Though I used it in fewer than five places, they were all in critical algorithms.

    After doing some research, I replaced the above pattern with the following one.

    In all those places, I was using an intermediate buffer (byte[]). The data is read from the stream into the byte[], and the newly created string is then initialized directly from the byte[].

    The new pattern that I use now is mentioned below.

    If the new string’s length is less than the byte[] length, read the data from stream to byte[]. Then use new string(sbyte*, startIndex, length, Encoding) to create a new string.

    If the byte[] size is smaller than the string length,

    * Create a StringBuilder instance with capacity and maximum capacity equal to string length plus 1. The additional one element is to take care of terminating NULL character.

    * create a small (4K) char[].

    * Fill the char[] in chunks.

    * Add char[] chunk to StringBuilder.

    * Finally use StringBuilder.ToString() to get the string.

    Since StringBuilder returns the underlying string itself when the unused space within it is less than half its size, no intermediate string is created by StringBuilder as part of this operation.

    This gives me the performance that I need without adding any bugs in my system.


    S N
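The chunked StringBuilder steps in the comment above could be sketched roughly as follows. This is an illustration under assumptions, not S N's actual code: the class and method names are hypothetical, a StreamReader is assumed for decoding the stream into the char[] chunk, and the sketch does not rely on StringBuilder.ToString() avoiding a copy, since that is an implementation detail of a particular Framework version.

```csharp
using System;
using System.IO;
using System.Text;

public static class ChunkedStringReader
{
    // Hypothetical helper: reads up to charCount chars from the stream
    // through a small reusable char[] and appends each chunk to a
    // StringBuilder that is pre-sized to the expected length.
    public static string ReadString(Stream stream, int charCount, Encoding encoding)
    {
        StringBuilder builder = new StringBuilder(charCount);
        char[] chunk = new char[4096]; // the "small (4K) char[]" from the comment

        // Note: disposing the StreamReader also closes the underlying stream.
        using (StreamReader reader = new StreamReader(stream, encoding))
        {
            int remaining = charCount;
            while (remaining > 0)
            {
                int read = reader.Read(chunk, 0, Math.Min(chunk.Length, remaining));
                if (read == 0) break; // end of stream reached early
                builder.Append(chunk, 0, read);
                remaining -= read;
            }
        }
        return builder.ToString();
    }

    public static void Main()
    {
        byte[] data = Encoding.UTF8.GetBytes(new string('x', 10000));
        string s = ReadString(new MemoryStream(data), 10000, Encoding.UTF8);
        Console.WriteLine(s.Length); // prints 10000
    }
}
```

Letting the reader do the decoding means multi-byte characters that straddle a chunk boundary are handled by the framework rather than by hand.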

  4. ‘String System.Text.UnicodeEncoding.GetString