CP 951 & HKSCS


(Note that this happened before I started owning code pages, so I might be a bit confused about parts of this.)

HKSCS is now supported through Unicode 5.  There’s a download available, Microsoft Character Code Conversion Routines For HKSCS-2004, with routines that help applications convert HKSCS data encoded in Big5 or in the Unicode Private Use Area (PUA) to Unicode 4.1.
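To give a feel for what that kind of conversion looks like, here is a minimal sketch. It is not the Microsoft routines; it assumes Python and its built-in `big5hkscs` codec, which as far as I know follows the HKSCS-2004 tables. The helper name `decode_big5_hkscs` is made up for this example.

```python
def decode_big5_hkscs(data: bytes, errors: str = "strict") -> str:
    """Decode Big5 / Big5-HKSCS bytes into a Unicode string.

    Assumption: Python's built-in "big5hkscs" codec stands in for the
    conversion routines mentioned above. It is not the Microsoft download.
    """
    return data.decode("big5hkscs", errors)


# Round trip: encode some text to the legacy encoding, then decode it back.
legacy_bytes = "中文".encode("big5hkscs")
print(decode_big5_hkscs(legacy_bytes))
```

With the default `errors="strict"`, bytes the codec can't map raise `UnicodeDecodeError`, which is usually what you want when checking whether data really is in the encoding you think it is.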

Previously we provided Hong Kong Supplementary Character Set (HKSCS) Support for Windows XP Platform, which was effectively a replacement for the 950 code page with mappings to the Unicode PUA.  Sometimes this was called code page 951 because of the name of the replacement file in the download.  This is sort of a hack and doesn’t work on Vista.  You’ll get a compatibility warning if you try to install it, and if you force the issue I’m not gonna support whatever happens to your machine! 🙂

The safest bet for application developers who want consistent support is to use Unicode, which now supports the HKSCS characters, so you don’t have to worry about PUA font mappings or the other problems that have hurt data portability between versions or systems.
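If you're not sure whether your existing data still carries PUA code points, a quick scan can tell you. This is only a sketch in Python: the function names are made up for illustration, and the mapping table passed to `remap_pua` is a placeholder you would have to supply yourself, since PUA assignments differ between users and systems.

```python
from collections import Counter
from typing import Mapping

# The BMP Private Use Area plus the two supplementary private use planes.
PUA_RANGES = (
    (0xE000, 0xF8FF),
    (0xF0000, 0xFFFFD),
    (0x100000, 0x10FFFD),
)


def is_pua(ch: str) -> bool:
    """Return True if the single character ch is a private use code point."""
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in PUA_RANGES)


def count_pua(text: str) -> Counter:
    """Tally each private use character found in text."""
    return Counter(ch for ch in text if is_pua(ch))


def remap_pua(text: str, table: Mapping[str, str]) -> str:
    """Replace PUA characters using a caller-supplied table.

    PUA characters missing from the table are left untouched so they
    can be reviewed by hand later.
    """
    return "".join(table.get(ch, ch) if is_pua(ch) else ch for ch in text)


# Hypothetical mapping, NOT a real HKSCS assignment.
example_table = {"\ue000": "X"}
sample = "abc\ue000def\ue001"
print(sorted(hex(ord(ch)) for ch in count_pua(sample)))
print(remap_pua(sample, example_table) == "abcXdef\ue001")
```

Leaving unmapped PUA characters alone is deliberate: guessing at them is how data gets silently corrupted.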

Comments (5)

  1. Abel Cheung says:

    Thanks for your comment on my site. IMHO going with Unicode for HKSCS support is a wise move. The previous standalone patch was not as useful as it should have been. The reason is twofold:

    1. Not publicized. Thus most Hong Kong people don’t actually know this thing ever existed.

    2. Bad feedback from people in Taiwan, who installed this patch and clobbered their own self-made characters and other custom patches that extend Big5.

    The next logical step here would be to create an easy-to-use tool for converting file content in Big5 or Big5HKSCS (Word, PowerPoint, text files, whatever) and file names into Unicode, to help migration to Unicode and encourage its usage. Most people who don’t want to abandon their legacy encoding give migration cost as their first reason (this is true not only for traditional Chinese, but across at least the Asian languages). And the tool should not take the form of the currently released API package.

    Indeed, given the tiny Hong Kong market, is it really justified to spend development effort on such a tool? That’s something to be considered as well.

  2. Shawn Steele says:

    The "Conversion Routines" were provided for HKSCS partially because of the reasons you note.  Creating a tool has several problems, which you mentioned, or alluded to:

    * Many users don’t know how their data is encoded.  The HKSCS support, PUA use, etc. have many levels of complexity that can cause problems.

    * The use of the PUA is different for different users, so a generic tool is difficult to develop.

    * There are numerous file formats, making a conversion tool challenging since it can’t know about proprietary formats and would probably miss other ones that are important to particular customer applications.

    * Even with a perfect tool, migration to clean Unicode data is challenging since it requires that all users of the data migrate.  That is hard when some users lack the desire or ability to upgrade.  A company could migrate, but what about their partners and customers?  A software program could provide updates, but the companies may not be ready for them yet.

    Unfortunately there isn’t a trivial solution to this mix of encoded data, and each customer needs to evaluate their usage and come up with a plan to best migrate their data toward Unicode 4.1/5.

  3. (As a point of information, both the fourth and ninth most important realizations of my entire life happened
