How to consume a web response with a non-UTF-8 charset on Windows Phone 8

The web request APIs in the Windows Phone 8 SDK only support processing text from a web response in UTF-8 or UTF-16 encoding.  This can be a problem if you need to work with older web servers that respond with text content encoded using other character sets.

Fortunately, the HttpWebRequest and HttpClient classes provide methods for accessing the response payload as a raw binary stream, which allows your application to decode the text into a Unicode string using custom code.

In this post I will present two examples of custom methods for converting a raw byte array into a Unicode string.  I will then show an example that uses HttpWebRequest to read the raw bytes of a response and passes them to the functions from the first two examples.

Converting to string in managed code using a lookup table

The first example uses a simple lookup table that gives the Unicode character corresponding to each byte value in a code page.  For this example I chose code page windows-1251.

Using the data from the code page definition I created a table which contains 256 values, one for each possible byte value:

    // This is the static lookup table I am using to convert from byte values
    // in the windows-1251 code page to Unicode character values.
    // It is declared as ushort so that any 16-bit character value fits.
    private static readonly ushort[] Windows1251LookupTable =
        { 0x0000, 0x0001, 0x0002, 0x0003, 0x0004, 0x0005, 0x0006, 0x0007, 0x0008, 0x0009, 0x000A, 0x000B, 0x000C, 0x000D, 0x000E, 0x000F,
          0x0010, 0x0011, 0x0012, 0x0013, 0x0014, 0x0015, 0x0016, 0x0017, 0x0018, 0x0019, 0x001A, 0x001B, 0x001C, 0x001D, 0x001E, 0x001F,
          0x0020, 0x0021, 0x0022, 0x0023, 0x0024, 0x0025, 0x0026, 0x0027, 0x0028, 0x0029, 0x002A, 0x002B, 0x002C, 0x002D, 0x002E, 0x002F,
          0x0030, 0x0031, 0x0032, 0x0033, 0x0034, 0x0035, 0x0036, 0x0037, 0x0038, 0x0039, 0x003A, 0x003B, 0x003C, 0x003D, 0x003E, 0x003F,
          0x0040, 0x0041, 0x0042, 0x0043, 0x0044, 0x0045, 0x0046, 0x0047, 0x0048, 0x0049, 0x004A, 0x004B, 0x004C, 0x004D, 0x004E, 0x004F,
          0x0050, 0x0051, 0x0052, 0x0053, 0x0054, 0x0055, 0x0056, 0x0057, 0x0058, 0x0059, 0x005A, 0x005B, 0x005C, 0x005D, 0x005E, 0x005F,
          0x0060, 0x0061, 0x0062, 0x0063, 0x0064, 0x0065, 0x0066, 0x0067, 0x0068, 0x0069, 0x006A, 0x006B, 0x006C, 0x006D, 0x006E, 0x006F,
          0x0070, 0x0071, 0x0072, 0x0073, 0x0074, 0x0075, 0x0076, 0x0077, 0x0078, 0x0079, 0x007A, 0x007B, 0x007C, 0x007D, 0x007E, 0x007F,
          0x0402, 0x0403, 0x201A, 0x0453, 0x201E, 0x2026, 0x2020, 0x2021, 0x20AC, 0x2030, 0x0409, 0x2039, 0x040A, 0x040C, 0x040B, 0x040F,
          0x0452, 0x2018, 0x2019, 0x201C, 0x201D, 0x2022, 0x2013, 0x2014, 0x0098, 0x2122, 0x0459, 0x203A, 0x045A, 0x045C, 0x045B, 0x045F,
          0x00A0, 0x040E, 0x045E, 0x0408, 0x00A4, 0x0490, 0x00A6, 0x00A7, 0x0401, 0x00A9, 0x0404, 0x00AB, 0x00AC, 0x00AD, 0x00AE, 0x0407,
          0x00B0, 0x00B1, 0x0406, 0x0456, 0x0491, 0x00B5, 0x00B6, 0x00B7, 0x0451, 0x2116, 0x0454, 0x00BB, 0x0458, 0x0405, 0x0455, 0x0457,
          0x0410, 0x0411, 0x0412, 0x0413, 0x0414, 0x0415, 0x0416, 0x0417, 0x0418, 0x0419, 0x041A, 0x041B, 0x041C, 0x041D, 0x041E, 0x041F,
          0x0420, 0x0421, 0x0422, 0x0423, 0x0424, 0x0425, 0x0426, 0x0427, 0x0428, 0x0429, 0x042A, 0x042B, 0x042C, 0x042D, 0x042E, 0x042F,
          0x0430, 0x0431, 0x0432, 0x0433, 0x0434, 0x0435, 0x0436, 0x0437, 0x0438, 0x0439, 0x043A, 0x043B, 0x043C, 0x043D, 0x043E, 0x043F,
          0x0440, 0x0441, 0x0442, 0x0443, 0x0444, 0x0445, 0x0446, 0x0447, 0x0448, 0x0449, 0x044A, 0x044B, 0x044C, 0x044D, 0x044E, 0x044F
        };

I then created a function, “StringFrom1251”, which loops through the buffer and builds a Unicode string:

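    private string StringFrom1251(byte[] Buffer)
    {
        // Each input byte produces exactly one character, so the final length is known up front.
        StringBuilder sb = new StringBuilder(Buffer.Length + 1);
        foreach (byte b in Buffer)
        {
            // Use the byte value as an index into the table to get the Unicode character.
            sb.Append((char)Windows1251LookupTable[(int)b]);
        }
        return sb.ToString();
    }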

This function takes a byte array as input and returns a Unicode string.  This is fairly straightforward, but it requires a separate lookup table for every code page you want to support.
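If you do end up supporting several single-byte code pages, the loop itself does not have to be duplicated, only the tables.  The following is a minimal sketch of that idea, not part of the original sample: the class name SingleByteDecoder and its Decode method are hypothetical, and it assumes each table is stored as a ushort[] with exactly 256 entries.

```csharp
using System;
using System.Text;

// Hypothetical helper: one decoding loop shared by any 256-entry lookup table.
public static class SingleByteDecoder
{
    // Decodes 'buffer' using 'table', which must hold exactly 256 UTF-16
    // code units, one for each possible byte value.
    public static string Decode(byte[] buffer, ushort[] table)
    {
        if (buffer == null) throw new ArgumentNullException("buffer");
        if (table == null || table.Length != 256)
        {
            throw new ArgumentException("The lookup table must contain 256 entries.", "table");
        }

        StringBuilder sb = new StringBuilder(buffer.Length);
        foreach (byte b in buffer)
        {
            // The byte value is the index of its Unicode character in the table.
            sb.Append((char)table[b]);
        }
        return sb.ToString();
    }
}
```

With a windows-1251 table stored as ushort[], the call would look like SingleByteDecoder.Decode(buffer, table), and a second code page would only need a second table.  The length check guards against a table that was truncated when it was typed in.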

Converting to string using a native library and Win32 API

In many cases, Windows Phone 8 may already support the conversion from a given code page to a Unicode string natively.  To use this from a Windows Phone 8 managed application, you need to write a Windows Phone Runtime Component that you can call from your managed code, which is what I will show next.

The “Windows Phone Runtime Component” project template can be found in the “Windows Phone” projects group under the “Visual C++” language templates.

In the sample below I gave my project the name “NativeHelper”, so the file names and namespace all refer to NativeHelper.

After creating the project, I first added an include reference to the header file “agile.h”, which provides the definitions my code needs.

Next I edited NativeHelper.h to define my class, “StringConverter”, and one method, “GetUnicodeString”.  GetUnicodeString needs to return a string, it needs to know which code page to use for the conversion, and it needs the array of bytes to be converted.  The result looks like this:

    #pragma once
    #include <agile.h>

    namespace NativeHelper
    {
        static public ref class StringConverter sealed
        {
        public:
            static Platform::String^ GetUnicodeString(UINT CodePage, const Platform::Array<byte, 1>^ input);
        };
    }

Note: I declared both the class and the method as “static” because the class does not need to maintain any instance data, and a static method can be called without creating an instance of the class.

After saving the changes to “NativeHelper.h”, I opened NativeHelper.cpp to implement my GetUnicodeString method.

The GetUnicodeString method uses the Win32 API MultiByteToWideChar, which you need to call twice.  The first call passes a null output buffer in order to get the buffer size required to hold the result.

You then allocate the buffer and call the API a second time to get the actual result.  Finally, you take that result and use it to initialize a “Platform::String” object, which can be passed back to the managed code.  The result looks like this:

    #include "pch.h"
    #include "NativeHelper.h"

    using namespace NativeHelper;
    using namespace Platform;

    Platform::String^ StringConverter::GetUnicodeString(UINT CodePage, const Platform::Array<byte, 1>^ input)
    {
        Platform::String^ szOutput = ref new Platform::String();

        // First call: pass a null output buffer and a size of 0 to get the required size in characters.
        int cchRequiredSize = MultiByteToWideChar(CodePage, 0, (char*)input->Data, input->Length, NULL, 0);
        if (cchRequiredSize <= 0)
        {
            return szOutput; // nothing to convert, or the conversion is not possible for this code page
        }

        // Allocate one extra, zeroed character so the buffer is always null-terminated.
        WCHAR* output = (WCHAR*)HeapAlloc(GetProcessHeap(), HEAP_ZERO_MEMORY, (cchRequiredSize + 1) * sizeof(WCHAR));
        if (output == NULL)
        {
            return szOutput; // allocation failed
        }

        // Second call: perform the actual conversion into the buffer.
        int cchActualSize = MultiByteToWideChar(CodePage, 0, (char*)input->Data, input->Length, output, cchRequiredSize);
        if (cchActualSize > 0)
        {
            szOutput = ref new Platform::String(output);
        }

        HeapFree(GetProcessHeap(), 0, output); // release the temporary buffer to avoid a memory leak
        return szOutput;
    }

 

Getting the byte array from an HttpWebRequest and testing the code

To test the two methods, I wrote a managed function that takes a string representing the Uri of the web resource and a uint for the expected code page.  If the code page is 1251, I call the managed method; otherwise, I call the native method.

    private void MakeRequestUsingHttpWebRequest(string szResourceUri, uint CodePage)
    {
        HttpWebRequest webRequest = WebRequest.CreateHttp(szResourceUri);
        webRequest.Method = "GET";
        webRequest.BeginGetResponse((evt) =>
            {
                string szContent = null;

                try
                {
                    HttpWebResponse resp = (HttpWebResponse)webRequest.EndGetResponse(evt);

                    // Dispose the stream and reader as soon as the bytes have been read.
                    using (Stream strm = resp.GetResponseStream())
                    using (BinaryReader br = new BinaryReader(strm))
                    {
                        // Get the Content-Length so we know how much to read.
                        // This sample assumes the server sends that header.
                        int readSize = (int)resp.ContentLength;
                        byte[] buffer = br.ReadBytes(readSize);

                        if (CodePage == 1251)
                        {
                            // The function StringFrom1251 is written in managed code and
                            // uses a static lookup table to convert from the binary value
                            // to a Unicode character.
                            szContent = StringFrom1251(buffer);
                        }
                        else
                        {
                            // This alternate method uses a custom helper library written in C++
                            // which uses the Win32 API MultiByteToWideChar to convert the string.
                            // This assumes the native OS supports the specified code page.
                            szContent = NativeHelper.StringConverter.GetUnicodeString(CodePage, buffer);
                        }
                    }
                }
                catch (Exception ex)
                {
                    szContent = String.Format("EndGetResponse threw exception: {0}", ex.Message);
                }

                // This callback runs asynchronously, off the UI thread, so here I send
                // the resulting string to a function which will show the result.
                Dispatcher.BeginInvoke(() =>
                {
                    UpdateStatus(szContent);
                });

            }, webRequest);
    }
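
The request code above trusts the Content-Length header to decide how much to read.  When a server omits that header (chunked responses are a common case), ContentLength typically reports -1 and cannot be passed to ReadBytes.  As a minimal sketch of an alternative, the hypothetical helper below (StreamHelper.ReadAllBytes is my own name, not part of the original sample) simply reads the stream until it ends:

```csharp
using System;
using System.IO;

// Hypothetical helper, not part of the original sample.
public static class StreamHelper
{
    // Reads 'source' to the end and returns everything as a single byte array.
    // It does not depend on the server having sent a Content-Length header.
    public static byte[] ReadAllBytes(Stream source)
    {
        if (source == null) throw new ArgumentNullException("source");

        byte[] chunk = new byte[4096];
        using (MemoryStream collected = new MemoryStream())
        {
            int bytesRead;
            // Read returns 0 only once the end of the stream has been reached.
            while ((bytesRead = source.Read(chunk, 0, chunk.Length)) > 0)
            {
                collected.Write(chunk, 0, bytesRead);
            }
            return collected.ToArray();
        }
    }
}
```

Inside the callback, the buffer could then be filled with StreamHelper.ReadAllBytes(resp.GetResponseStream()) instead of going through BinaryReader.  For very large responses you would probably want to cap the total size rather than buffer everything in memory.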

Upon completion, my code calls a function called “UpdateStatus”, which simply puts the resulting string into a TextBlock control on my page:

    private void UpdateStatus(string szMessage)
    {
        if (!string.IsNullOrEmpty(szMessage))
        {
            textBlock1.Text = szMessage;
        }
        else
        {
            textBlock1.Text = "[null]";
        }
    }

 

I have left some peripheral actions out of these examples in order to minimize clutter that might distract from the key objective.  There are several things you, the reader, could do to expand upon this code, such as checking the charset parameter of the Content-Type header in the response for a hint about the character set, instead of, or in addition to, specifying the code page ahead of time.  You could also use HttpClient instead of HttpWebRequest.
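
As a rough sketch of the header check, the helper below pulls the charset parameter out of a Content-Type value such as "text/html; charset=windows-1251" and maps it to a code page number.  Everything here is an assumption for illustration: the CharsetHelper class, the TryGetCodePage method, and the short list of charset names are mine, and you would extend the list to match the servers you actually talk to.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper, not part of the original sample.
public static class CharsetHelper
{
    // Illustrative mapping from charset names to Windows code page identifiers.
    private static readonly Dictionary<string, uint> KnownCodePages =
        new Dictionary<string, uint>(StringComparer.OrdinalIgnoreCase)
        {
            { "windows-1251", 1251 },
            { "windows-1252", 1252 },
            { "iso-8859-1", 28591 },
            { "koi8-r", 20866 },
            { "utf-8", 65001 }
        };

    // Returns true and sets 'codePage' when the Content-Type value names a
    // charset that appears in the table above; otherwise returns false.
    public static bool TryGetCodePage(string contentType, out uint codePage)
    {
        codePage = 0;
        if (string.IsNullOrEmpty(contentType)) return false;

        foreach (string part in contentType.Split(';'))
        {
            string parameter = part.Trim();
            if (parameter.StartsWith("charset=", StringComparison.OrdinalIgnoreCase))
            {
                // Strip optional quotes around the charset name.
                string name = parameter.Substring("charset=".Length).Trim().Trim('"', '\'');
                return KnownCodePages.TryGetValue(name, out codePage);
            }
        }
        return false;
    }
}
```

In the request callback you could pass resp.ContentType to TryGetCodePage and fall back to the code page supplied by the caller when it returns false.  Keep in mind that some servers send no charset at all, or send one that does not match the actual content, so the hint is best treated as a default rather than a guarantee.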

 UPDATE: I corrected some logic errors in the native code sample related to the "output" string.

FYI: Don’t forget to follow the Windows Store Developer Solutions team on Twitter @wsdevsol. Comments are welcome, both below and on Twitter.