Converting a byte[] to a System.String


For some reason, this question gets asked a lot. How do I convert a byte[] to a System.String? (Yes, this is a CLR question. Sorry.)

You can use System.Text.UnicodeEncoding.GetString(), which takes a byte[] array, interprets its contents as UTF-16, and returns a String.
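
A minimal sketch of the call follows. It assumes the bytes are UTF-16 little-endian, which is the byte order that `Encoding.Unicode` and a default-constructed `UnicodeEncoding` expect. The `GetStringDemo` class and its `DecodeUtf16LE` helper are hypothetical names used only for illustration:

```csharp
using System;
using System.Text;

public static class GetStringDemo
{
    // Hypothetical helper: decodes bytes assumed to be UTF-16 little-endian.
    public static string DecodeUtf16LE(byte[] bytes)
    {
        return Encoding.Unicode.GetString(bytes);
    }

    public static void Main()
    {
        // "Hi" in UTF-16LE: 'H' = U+0048, 'i' = U+0069, low byte first.
        byte[] bytes = { 0x48, 0x00, 0x69, 0x00 };

        string whole = DecodeUtf16LE(bytes);
        Console.WriteLine(whole); // prints "Hi"

        // The (bytes, index, count) overload decodes part of the array:
        // here, the last two bytes, which hold the single character 'i'.
        string tail = Encoding.Unicode.GetString(bytes, 2, 2);
        Console.WriteLine(tail); // prints "i"

        // Same result when constructing the encoding object directly.
        string viaCtor = new UnicodeEncoding().GetString(bytes);
        Console.WriteLine(viaCtor == whole); // prints "True"
    }
}
```

The index and count arguments of that overload are byte offsets, not character offsets. This makes the overload convenient when only part of a larger buffer holds text.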

Note that this is not the same as just blindly copying the bytes from the byte[] array into a hunk of memory and calling it a string. For example, the GetString() method must validate the bytes and not let invalid surrogates through.
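
Here is a minimal sketch of what that validation looks like in practice. It assumes .NET 2.0 or later, where encodings have decoder fallbacks; the behavior of earlier runtimes may differ. `SurrogateDemo` and its members are hypothetical names. By default, an unpaired surrogate is replaced with U+FFFD rather than copied through. Passing `true` as the third `UnicodeEncoding` constructor argument (`throwOnInvalidBytes`) makes decoding throw instead:

```csharp
using System;
using System.Text;

public static class SurrogateDemo
{
    // UTF-16LE bytes for an unpaired high surrogate (U+D800) followed by 'A' (U+0041).
    // Copying these bytes straight into a string would yield ill-formed UTF-16.
    public static readonly byte[] LoneHighSurrogate = { 0x00, 0xD8, 0x41, 0x00 };

    // Default behavior: the invalid sequence is replaced with U+FFFD.
    public static string DecodeLenient(byte[] bytes)
    {
        return Encoding.Unicode.GetString(bytes);
    }

    // Strict behavior: arguments are (bigEndian, byteOrderMark, throwOnInvalidBytes).
    // With throwOnInvalidBytes = true, GetString throws DecoderFallbackException.
    public static bool TryDecodeStrict(byte[] bytes, out string result)
    {
        var strict = new UnicodeEncoding(false, false, true);
        try
        {
            result = strict.GetString(bytes);
            return true;
        }
        catch (DecoderFallbackException)
        {
            result = null;
            return false;
        }
    }

    public static void Main()
    {
        // Print each UTF-16 code unit in hex so the replacement character is visible.
        foreach (char c in DecodeLenient(LoneHighSurrogate))
            Console.Write("U+{0:X4} ", (int)c);
        Console.WriteLine();

        bool ok = TryDecodeStrict(LoneHighSurrogate, out _);
        Console.WriteLine(ok ? "decoded" : "rejected: unpaired surrogate");
    }
}
```

Which behavior to pick is a policy decision. Substitution keeps going on damaged input, while the throwing variant surfaces corruption immediately.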

You might be tempted to create a string and just mash the bytes into it, but that violates string immutability and can lead to subtle problems.

Comments (7)
  1. Ben Hutchings says:

    On a related question, how do those of us not using .NET achieve streamable character conversion? That is, conversion where the converter can perform a partial conversion, indicate errors such as "the last n bytes of input begin but don’t complete a multibyte character" or "the output buffer is too small, so only m bytes of input were converted", and then allow you to continue with another block of input data and/or output buffer. MLang appeared to offer this, but so far as I can see it doesn’t, or at least the documentation doesn’t cover it. Yet IE is presumably doing it, and MLang is part of IE…

  2. Ben Hutchings says:

    (Apologies for the slightly incoherent rambling sentence above.)

  3. Ben Lowery says:

    Something else to mention is that you should match the System.Text.Encoding subclass to the encoding of the bytes in the byte[]. For example, passing a byte[] that contains UTF-8 text to UnicodeEncoding’s GetString method won’t decode it properly. This program shows the difference:

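    ```csharp
    using System;
    using System.Text;

    public class MyClass
    {
        public static void Main()
        {
            // Encode the text as UTF-8: one byte per ASCII character (9 bytes here).
            byte[] text = Encoding.UTF8.GetBytes("my string");

            // Wrong: UnicodeEncoding reads the bytes as UTF-16LE, pairing them
            // into 16-bit code units, so the output is garbage.
            string s = Encoding.Unicode.GetString(text);
            Console.WriteLine(s);

            // Right: decode with the same encoding that produced the bytes.
            s = Encoding.UTF8.GetString(text);
            Console.WriteLine(s); // prints "my string"
        }
    }
    ```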

  4. Clinton Pierce says:

    Unicode? We don’ need no stinkin’ Unicode! :)

    string s=System.Text.Encoding.ASCII.GetString(buffer, 0, buffer.Length);

  5. Jon Potter says:

    Not actually a .NET blog?

  6. Mr. Ed says:

    Regarding the Abrams link:

    Why, oh why, does the string have a cast operator to a non-const C-string, if the string is immutable?

  7. Norman Diamond says:

    In VC++ 2005 beta 1, either the _T() macro doesn’t work, or there’s something funny about the macros that are (or used to be) UNICODE and _UNICODE. I haven’t had time to investigate. When I had time to practice with VC++ 2005 beta 1, I just worked around it by changing _T("string") to L"string", forcing them to be wide strings (and wide strings are Unicode in Windows).

    But this didn’t have to be done with all strings. Some of them I just left as "string", making them multibyte strings. Automatic conversion and boxing to System::String^ correctly converted some of these ANSI strings to Unicode but garbled others. I haven’t had time to investigate whether there’s a reason for this.

    (This didn’t seem to be the worst issue I found in VC++ 2005 beta 1, because the IDE was still operating and forms could still be edited afterward. But I haven’t had time to investigate whether there’s a more serious underlying cause.)
