What happens if I mutate a string in a p/invoke?


When it comes time to p/invoke to a Win32 function that writes to a string buffer, everybody uses a StringBuilder class to receive the string.

But could we just use a string? I mean, we can still allocate a buffer for the string and then ask Win32 to fill the buffer.

// Code in italics is wrong
[DllImport("user32.dll", CharSet = CharSet.Unicode)]
extern public static int GetKeyboardLayoutName(string buffer);

var buffer = new String('\0', 9);
GetKeyboardLayoutName(buffer);

I mean, sure C# strings are immutable, but that just means that you can't mutate them from within the C# language. The runtime will allocate some memory for the string, and that memory will be writable in practice, so the GetKeyboardLayoutName function will be able to write to it, and bingo, the results are in the string! What could possibly go wrong?

What could possibly go wrong is that you're violating the rules of the language, namely that strings are immutable.

Passing a string via platform invoke means that the runtime will pass a null-terminated C-style string that it expects to be read from. If the native function ends up writing to it, then what happens next is unpredictable.

For example, the platform invoke code is not required to pass a pointer to the internal string buffer. It might copy the string contents to a temporary buffer and pass a pointer to that temporary buffer. If the native function modifies that buffer, the runtime won't try to copy the results back to the original string buffer, because the runtime doesn't expect the native function to modify the buffer at all.

In fact, there is a case where this temporary buffer is guaranteed to exist: when the function being called takes an ANSI string. The raw internal string buffer is in the wrong format, namely Unicode (UTF-16LE), so the CLR needs to create a temporary ANSI version of the string.

Even if you manage to cajole the runtime into passing a pointer to the raw string buffer, the runtime doesn't expect the string to change, and if the native function doesn't fill the entire buffer, the runtime won't notice. You'll have a string with extra junk in it.

And the fact that you're mutating what is supposed to be immutable is going to cause its own problems:

using System;
using System.Runtime.InteropServices;
using System.Collections.Generic;

// Code in italics is wrong
class Program
{
  [DllImport("user32.dll", CharSet = CharSet.Unicode)]
  extern public static int GetKeyboardLayoutName(string buffer);

  public static void Main()
  {
    var hash = new Dictionary<string, int>();
    string buffer = new string('\0', 10);
    hash[buffer] = 2;
    GetKeyboardLayoutName(buffer);

    string buffer2 = new string('\0', 10);
    Console.WriteLine(hash[buffer2]);
  }
}

Strings are immutable, and therefore they can safely be used as keys in dictionaries. But in the above example, we are mutating the string that is being used as a key, which messes up the dictionary. Not only did the item's key change, but nobody can find the new key because its hash code is different, so it's in the wrong bucket in the dictionary.

Basically, you created a dictionary that violates the dictionary invariants.

Another case where mutating a string violates the rules of C# can be found in the reference source for the String.CompareOrdinalHelper method. The method compares two characters at a time, and once it finds a difference, it looks to see which character of the pair is the one that caused the strings to be different. This assumes that strings are immutable.

But if you mutate the internal buffer from another thread, it's possible that the first loop finds a pair of characters which don't match, but when it goes to see which of the pair it is, the contents of the buffer changed, and now the characters match after all. Assertion failure. Function returns incorrect result.
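To make the race concrete, here is a simplified sketch of a two-phase comparison of the kind described above. It is not the actual reference source: the class name OrdinalCompareSketch is hypothetical, and char arrays stand in for the internal string buffers.

```csharp
using System;

// Hypothetical simplification of a pairwise ordinal comparison.
// Not the real String.CompareOrdinalHelper; char[] stands in for string buffers.
static class OrdinalCompareSketch
{
  public static int Compare(char[] a, char[] b)
  {
    int length = Math.Min(a.Length, b.Length);
    int i = 0;
    bool foundDifference = false;

    // Phase 1: walk both buffers a pair at a time until some pair differs.
    while (i + 1 < length)
    {
      if (a[i] != b[i] || a[i + 1] != b[i + 1])
      {
        foundDifference = true;
        break;
      }
      i += 2;
    }

    if (foundDifference)
    {
      // Phase 2: re-read the pair to learn which character differed.
      if (a[i] != b[i]) return a[i] - b[i];

      // The first characters match, so the second ones "must" differ.
      // That holds only if nobody wrote to the buffers after phase 1.
      // If another thread did, this returns 0 for buffers of unequal content.
      return a[i + 1] - b[i + 1];
    }

    // Odd trailing character, then fall back to comparing lengths.
    if (i < length && a[i] != b[i]) return a[i] - b[i];
    return a.Length - b.Length;
  }

  public static void Main()
  {
    // Single-threaded use behaves as expected; the bug needs a concurrent writer.
    Console.WriteLine(Compare("abcd".ToCharArray(), "abce".ToCharArray()));
  }
}
```

The gap between the two phases is the window in which a concurrent write from native code can make the second read disagree with the first.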

If you are passing a string buffer that native code will write to, use a StringBuilder. That's what it's for.
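For comparison, here is a minimal sketch of the same call with a StringBuilder receiving the result. The KL_NAMELENGTH constant and the bool return type follow the Win32 declaration of GetKeyboardLayoutName rather than the code shown earlier, so treat them as assumptions. It runs only on Windows.

```csharp
using System;
using System.Runtime.InteropServices;
using System.Text;

class Program
{
  // Assumption: KL_NAMELENGTH from winuser.h, which counts the null terminator.
  const int KL_NAMELENGTH = 9;

  // The StringBuilder tells the marshaler that native code writes to the buffer,
  // so the result is copied back when the call returns.
  [DllImport("user32.dll", CharSet = CharSet.Unicode)]
  [return: MarshalAs(UnmanagedType.Bool)]
  static extern bool GetKeyboardLayoutName(StringBuilder buffer);

  public static void Main()
  {
    // The capacity is the writable size that the native function may fill.
    var buffer = new StringBuilder(KL_NAMELENGTH);

    if (GetKeyboardLayoutName(buffer))
    {
      // ToString stops at the null terminator, so there is no trailing junk.
      Console.WriteLine(buffer.ToString());
    }
    else
    {
      Console.WriteLine("GetKeyboardLayoutName failed");
    }
  }
}
```

No string object is ever mutated here: the native function writes into a buffer the runtime expects to change, and ToString produces a fresh immutable string from it.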

Comments (19)
  1. Mitosis1000 says:

    How about Microsoft add more APIs to the .NET UI frameworks? My WPF code is littered with p/invoke calls. Even for someone (as yours truly) who used to do C++/Win32 programming, it’s annoying at times. Yeah, the website is out there with the examples…it’s still annoying I can’t restore my WPF window without dropping down to this stuff.

    Yes, I’m whining, and on the wrong blog, to boot.

    Otherwise, great post, thanks.

  2. Sebazzz says:

    What about passing in an char array or a pointer to a stackalloced char array. Does this have advantages?

  3. _Nicholas says:

    > sure C# strings are immutable, but that just means that you can’t mutate them from within the C# language.

    As far as I know, you actually *can* if you’re willing to use `unsafe` (which may be obvious, since with unsafe most bets are off):

    string a = "foo";
    string b = "foo";

    Console.WriteLine(a); // foo
    Console.WriteLine(b); // foo
    Console.WriteLine(object.ReferenceEquals(a, b)); // true

    unsafe { fixed(char* c = a) { *c = 'b'; } }

    Console.WriteLine(a); // boo
    Console.WriteLine(b); // boo
    Console.WriteLine(object.ReferenceEquals(a, b)); // true

    This shows we’re actually modifying the interned string buffer and not somehow creating a copy.

    1. _Nicholas says:

      Well that’s gross. I don’t know where my newlines went.

      1. poizan42 says:

        The blog software has a longstanding bug where the formatting is broken when viewing your own comments. It looks fine for everyone else – just try logging out or viewing it in an incognito window.

    2. Gee Law says:

      You coincidentally can mutate a System.String and get away with it. But ECMA-334 5th edition (the latest ECMA C#) explicitly undefines the behavior of such code. See the example at the end of section 28.7.

    3. Alex Cohn says:

      I once used unsafe to pass some data from an HTTP stream directly to a C# structure. This worked perfectly in unit tests and in integration tests. But, luckily, the server team had some load tests configured before deploying the feature to production. The crashes happened unpredictably, and there was no hint that a copy of the network buffer was involved.

  4. poizan42 says:

    So I have been thinking about this before and have been wondering what the performance impacts would be of calling VirtualProtect before and after allocating a string to keep the page read-only most of the time (would ofc. also require that the GC be changed to keep strings together in their own pages).

    1. Alois says:

      @poizan42: That would be wasteful because you could place only one string into one page. If you would add a second string to that page you would need to unprotect it for a short amount of time which would defeat the purpose of VirtualProtect again. The costs of soft faulting pages into a process working set are not so small as I would like it. See https://aloiskraus.wordpress.com/2017/11/12/bringing-the-hardware-and-windows-to-its-limits/

      1. poizan42 says:

        Yes there would indeed be a small window for another thread to mutate strings in the page. But if the goal is to rat out bugs then that doesn’t matter so much.

        The real price would depend on how often strings are allocated. It could also just be used while debugging (there are already several places where the runtime has additional checks when debugging which aren’t cheap performance wise).

  5. Anonymous says:
    (The content was deleted per user request)
  6. fowl says:

    If you modify a string, you’re going to have a particularly bad time once string de-duplication is implemented.

  7. mrphlip says:

    Poking at the internals to modify allegedly-immutable objects is good traditional fun in any language…
    Python 2.7.15rc1 (default, Apr 15 2018, 21:51:34)
    >>> import ctypes
    >>> class PyIntObject(ctypes.Structure):
    ...     _fields_ = [("ob_refcnt", ctypes.c_long),
    ...                 ("ob_type", ctypes.c_void_p),
    ...                 ("ob_ival", ctypes.c_long)]
    ...
    >>> fourobj = PyIntObject.from_address(id(4))
    >>> fourobj.ob_ival
    4
    >>> fourobj.ob_ival = 5
    >>> 4
    5
    >>> 2 + 2
    5
    >>> 5 - 1
    5
    >>> (2 + 2) + 1
    6
    >>> 5 - 1 - 1 - 1
    5
    >>> 2 * 2 * 2
    10

    1. RP (MSFT) says:

      This just reinforces my belief that Python is the latest incarnation of FORTRAN.

  8. cheong00 says:

    Btw, shouldn’t the declaration be like this?
    extern public static int GetKeyboardLayoutName(out string buffer);

    I think you need out/ref to receive the content correctly.

    1. Isn’t that addressed via the comment:

      // Code in italics is wrong

      1. cheong00 says:

        It’s wrong in different sense.

        Since “string” is a type passed “by value” by default, the “buffer” would always be a string with 9 null characters.

        1. Gee Law says:

          Passing a string passes the reference (pointer) to the string by value. The native function receives a pointer to the beginning of the string. The CLR thinks it wants a char const *, while the code actually treats it as char * and modifies the content, which means modifying the string object. The reference variable buffer does not change: it still points to the original object, which is now modified.

          In C parlance, this means: void funny(char const *buf) { *(char*)buf = 'a'; } and char b[] = "hello"; char const *p = b; funny(p); std::puts(p);.

          1. cheong00 says:

            Humm… I think “reference” to C# string is just a key to interned string table. Since it’s immutable the CLR provides no way to change the string value it points to, and it’s not pointer as if it were in C/C++. That’s why we need to use StringBuilder if we just need to modify a single character in string.

Comments are closed.
