How do I get HKSCS 2004 characters from Big-5 in .Net?

05/03/2007

Well, that's pretty tricky. We provide the "Microsoft Character Code Conversion Routines For HKSCS-2004" functions, but those are intended for use with unmanaged code.

The fundamental problem is that these "HKSCS" characters were in use before Unicode assigned code points to them. To support them, we mapped the Big 5 / Code Page 950 HKSCS characters to the Unicode Private Use Area (PUA). So now the same characters can appear in data in three forms: as Big 5 bytes, at the old PUA code points, AND at the assigned Unicode 5.0 code points. The expectation is that everyone moves to Unicode in the long term, so these functions were provided to help map old data to the new Unicode 5.0 code points.
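To make the migration step concrete, here is a minimal sketch in Python (not a .NET API) of remapping text that was already decoded with the old PUA mappings. The function name `remap_pua` and the table contents are hypothetical. A real table would have to be built from the HKSCS-2004 mapping data, and the entry used below is a placeholder, not an actual HKSCS assignment.

```python
# Sketch only: remap legacy Private Use Area (PUA) characters in already-decoded
# text to assigned Unicode code points. `remap_pua` is a hypothetical helper and
# the table is supplied by the caller (e.g. built from HKSCS-2004 mapping data).

PUA_START, PUA_END = 0xE000, 0xF8FF  # BMP Private Use Area


def remap_pua(text: str, table: dict[int, int]) -> str:
    """Replace PUA code points that appear in `table` with their mapped code points.

    Keys outside the BMP PUA are ignored so ordinary characters are never altered.
    PUA characters with no table entry are left unchanged.
    """
    pua_only = {src: dst for src, dst in table.items() if PUA_START <= src <= PUA_END}
    return text.translate(pua_only)


if __name__ == "__main__":
    # Placeholder entry: pretend U+E000 was the legacy slot for a character
    # that Unicode later assigned to U+20000.
    legacy = "a\ue000b"
    print(remap_pua(legacy, {0xE000: 0x20000}) == "a\U00020000b")  # True
```

Restricting the table to PUA keys mirrors the opt-in concern: only characters that were stored in the private range are touched.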

Another way for a managed application to solve this problem is to create your own Encoding that maps the Big 5 code points to their new Unicode code points instead of the old code page 950 PUA mappings. It is nearly impossible for Microsoft to provide a patch to do this, because some users have data in the old PUA code space, and their applications would break if that data were suddenly migrated to the assigned HKSCS code points without them opting in. Eventually "all" the interesting data should be migrated from the PUA code points to the Unicode HKSCS code points, but until then the problem remains.
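The core of such a custom Encoding is its lookup order: check an override table of Big 5 codes first, and fall back to the ordinary code page 950 mapping for everything else. The sketch below shows only that idea, in Python rather than .NET; in .NET the same logic would live in a class derived from `System.Text.Encoding`. The function `decode_big5_remapped` and its override table are hypothetical, and the table entry in the example is a placeholder rather than a real HKSCS mapping.

```python
# Sketch only: decode Big 5 bytes, consulting a caller-supplied override table
# (double-byte Big 5 code -> Unicode code point) before falling back to the base
# codec. `decode_big5_remapped` is a hypothetical helper, not a library API.


def decode_big5_remapped(data: bytes, overrides: dict[int, int], base: str = "cp950") -> str:
    """Decode `data`, preferring `overrides` over the base codec's mapping.

    Raises ValueError on a truncated double-byte sequence; the base codec
    raises UnicodeDecodeError for byte pairs it cannot decode.
    """
    out = []
    i = 0
    while i < len(data):
        lead = data[i]
        if lead < 0x80:
            out.append(chr(lead))  # single-byte ASCII range
            i += 1
        elif 0x81 <= lead <= 0xFE:
            if i + 1 >= len(data):
                raise ValueError(f"truncated double-byte sequence at offset {i}")
            code = (lead << 8) | data[i + 1]
            if code in overrides:
                out.append(chr(overrides[code]))  # new, assigned code point
            else:
                out.append(data[i:i + 2].decode(base))  # old code page 950 mapping
            i += 2
        else:
            out.append(data[i:i + 1].decode(base))  # 0x80 / 0xFF: defer to base codec
            i += 1
    return "".join(out)


if __name__ == "__main__":
    print(decode_big5_remapped("A中".encode("cp950"), {}))  # A中
    # Placeholder override: pretend Big 5 0x8840 should decode to U+0100.
    print(decode_big5_remapped(b"\x88\x40", {0x8840: 0x0100}) == "\u0100")  # True
```

Because the override table is consulted first, an application opts in character by character, which is exactly the kind of choice a system-wide patch could not make on its behalf.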

The code samples and links in the "Microsoft Character Code Conversion Routines For HKSCS-2004" document would be a good starting point for generating the mappings needed to build an Encoding that moves code page 950 data to the new HKSCS code points.
