Decoding standard output and standard error when redirecting to a GUI application

The Process class allows you to redirect the three basic I/O streams of a process: standard input, standard output and standard error. This is particularly useful for programmatically running console applications and capturing their output, and everything works just fine as long as only 7-bit ASCII characters are involved. However, even today console applications use code pages instead of Unicode, so any output from a console application has to be decoded with the right code page in order to show up correctly in a .NET String. This is not apparent as long as there are no extended characters in the output streams, because ASCII bytes mean the same thing in the common code pages (adding null bytes in the right places turns them into UTF-16 encoded strings).
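To make the mismatch concrete, here is a minimal sketch that decodes the same byte with two different code pages. The code pages 850 (OEM) and 1252 (ANSI) are my assumption of what a typical German Windows installation uses, and the class name CodePageDemo is made up for illustration.

```csharp
using System;
using System.Text;

internal static class CodePageDemo
{
    private static void Main()
    {
        // Note: on .NET Core / .NET 5+ legacy code pages are not registered by default.
        // There you would first call
        //     Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
        // On the .NET Framework the two GetEncoding() calls below work as they are.

        // The single byte a console application writes for "ä" under code page 850.
        byte[] raw = { 0x84 };

        Encoding oem = Encoding.GetEncoding(850);   // assumed console (OEM) code page
        Encoding ansi = Encoding.GetEncoding(1252); // assumed GUI (ANSI) code page

        // Print code points instead of characters so the result does not depend
        // on the encoding of whatever console displays it.
        Console.WriteLine("850:  U+{0:X4}", (int)oem.GetString(raw)[0]);  // U+00E4, "ä"
        Console.WriteLine("1252: U+{0:X4}", (int)ansi.GetString(raw)[0]); // U+201E, a low double quote
    }
}
```

A byte below 0x80 would give the same code point in both lines, which is why the problem stays hidden with pure ASCII output.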

While I haven't done any extensive testing in this area, I want to share the scenario in which I first hit this. I wrote a couple of tests that are executed by a test runner that is not a console application. The tests start new processes, which are console applications, capture their output by redirecting it and then compare it against a baseline. This worked fine for a long time, but during all that time the tests only ran with en-US locale settings. They started failing when they were finally run on de-DE, since umlauts would appear incorrectly in the captured output. Four hours of debugging taught me two things:

  1. In this particular scenario the StreamReader returned by Process.StandardOutput is not initialized with the correct encoding.
  2. A GUI application can read Console.OutputEncoding but will get an incorrect answer.

The second lesson took me quite a bit of time to figure out. When Console.OutputEncoding is read in a console application, it returns the output encoding for the attached console (which in most cases should be the default code page for consoles). In a GUI application, however, it returns the same encoding as Encoding.Default, which is intended for non-Unicode GUI applications and is therefore useless for decoding output received from a console application. Unfortunately, those two properties are pretty much all the .NET Framework provides. So in order to pick the correct encoding for the scenario described above you'll have to ask the OS. In particular, GetCPInfoEx() will return the required information when CP_OEMCP is passed in. Doing this in C# looks a bit like this:

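// Needed at the top of the file. The struct below uses fixed buffers,
// so the project has to be compiled with /unsafe.
using System;
using System.Globalization;
using System.Runtime.InteropServices;
using System.Text;

// Sizes of the fixed buffers in CPINFOEX, as defined in the Windows headers.
private const Int32 MAX_DEFAULTCHAR = 2;
private const Int32 MAX_LEADBYTES = 12;
private const Int32 MAX_PATH = 260;

// Pseudo code page identifier for the system default OEM code page.
private const UInt32 CP_OEMCP = 1;

public static Encoding GetDefaultOemCodePageEncoding()
{
    CPINFOEX cpInfoEx;

    // GetCPInfoEx() returns 0 on failure.
    if (GetCPInfoEx(CP_OEMCP, 0, out cpInfoEx) == 0)
        throw new InvalidOperationException(String.Format(CultureInfo.CurrentCulture,
                                                          "GetCPInfoEx() failed with error code {0}",
                                                          Marshal.GetLastWin32Error()));

    // CodePage holds the actual code page number that CP_OEMCP resolved to.
    return Encoding.GetEncoding((int)cpInfoEx.CodePage);
}

[DllImport("Kernel32.dll", EntryPoint = "GetCPInfoExW", SetLastError = true)]
private static extern Int32 GetCPInfoEx(UInt32 CodePage, UInt32 dwFlags, out CPINFOEX lpCPInfoEx);

// CharSet.Unicode so the Char fields are marshaled as 16-bit WCHARs,
// matching the CPINFOEXW layout that the W entry point fills in.
[StructLayout(LayoutKind.Sequential, CharSet = CharSet.Unicode)]
private unsafe struct CPINFOEX
{
    internal       UInt32 MaxCharSize;
    internal fixed Byte   DefaultChar[MAX_DEFAULTCHAR];
    internal fixed Byte   LeadByte[MAX_LEADBYTES];
    internal       Char   UnicodeDefaultChar;
    internal       UInt32 CodePage;
    internal fixed Char   CodePageName[MAX_PATH];
}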
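
One way to apply the resulting encoding is sketched below, under a few assumptions: the helper class ConsoleCapture and its method RunAndCapture are names I made up, and the encoding is expected to come from something like GetDefaultOemCodePageEncoding() above. ProcessStartInfo.StandardOutputEncoding and ProcessStartInfo.StandardErrorEncoding are existing framework properties that control how the redirected streams are decoded. Treat this as a sketch rather than a tested solution.

```csharp
using System.Diagnostics;
using System.Text;

internal static class ConsoleCapture
{
    // Hypothetical helper: runs a console application and returns its decoded
    // standard output; the decoded standard error is returned via errorText.
    public static string RunAndCapture(string fileName, string arguments,
                                       Encoding oemEncoding, out string errorText)
    {
        ProcessStartInfo startInfo = new ProcessStartInfo(fileName, arguments);
        startInfo.UseShellExecute = false;          // required for redirection
        startInfo.CreateNoWindow = true;
        startInfo.RedirectStandardOutput = true;
        startInfo.RedirectStandardError = true;
        startInfo.StandardOutputEncoding = oemEncoding;
        startInfo.StandardErrorEncoding = oemEncoding;

        using (Process process = Process.Start(startInfo))
        {
            // Collect stderr asynchronously so that a child which fills the
            // stderr pipe cannot block while stdout is read synchronously.
            StringBuilder error = new StringBuilder();
            process.ErrorDataReceived += (sender, e) =>
            {
                if (e.Data != null)
                    error.AppendLine(e.Data);
            };
            process.BeginErrorReadLine();

            string output = process.StandardOutput.ReadToEnd();
            process.WaitForExit();

            errorText = error.ToString();
            return output;
        }
    }
}
```

A call could then look like `ConsoleCapture.RunAndCapture("cmd.exe", "/c dir", GetDefaultOemCodePageEncoding(), out errors)`, where the command line is only an example.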


This posting is provided "AS IS" with no warranties, and confers no rights.