Pitfalls of Chinese Conversion (Part 1)

After the basic introduction to software globalization from my teammates, I would like to talk about something more specific in this entry.

Kernel32.dll is one of the DLLs (dynamic-link libraries) that ship with Microsoft Windows. A DLL is a collection of code that, once loaded into memory, can be shared by multiple processes. Kernel32.dll provides APIs (application programming interfaces) for mapping East Asian characters, for example converting Simplified Chinese to Traditional Chinese or vice versa.

We can call the LCMapString API provided by Kernel32.dll from a small WinForms application to perform the conversion, as in the function below. (Please note that LCMapStringEx is a newer version of LCMapString. It can perform the same conversion, but it is only supported on Windows Vista and later.)

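        // LCMapString and the constants are assumed to be declared elsewhere via P/Invoke (DllImport) against Kernel32.dll.
        public static string ToSimplified(string stringSource)
        {
            // Pre-size the output buffer to the source length; the API writes the converted characters into it.
            String stringTarget = new String(' ', stringSource.Length);
            // The return value is the number of characters written, or 0 on failure (not checked here).
            int iReturn = LCMapString(LOCALE_SYSTEM_DEFAULT, LCMAP_SIMPLIFIED_CHINESE, stringSource, stringSource.Length, stringTarget, stringSource.Length);
            return stringTarget;
        }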

The ToSimplified function takes the source string to be converted as its input parameter and returns the converted string. In this case, the source string is Traditional Chinese text, and the output string is the Simplified Chinese text converted from it.
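The ToSimplified function above calls LCMapString and uses two constants without showing where they come from. Here is a minimal sketch of one way to declare them, assuming the Unicode entry point LCMapStringW and the flag values from the Windows SDK headers. The class name ChineseConverter and the Map helper are hypothetical, and this variant writes into a char[] buffer instead of a pre-sized string, so it is an alternative to the snippet above rather than the declaration it was written against.

```csharp
using System;
using System.ComponentModel;
using System.Runtime.InteropServices;

// Hypothetical helper class; the name and layout are illustrative only.
static class ChineseConverter
{
    // Locale and flag values as defined in the Windows SDK headers.
    const uint LOCALE_SYSTEM_DEFAULT = 0x0800;
    const uint LCMAP_SIMPLIFIED_CHINESE = 0x02000000;
    const uint LCMAP_TRADITIONAL_CHINESE = 0x04000000;

    // Bind explicitly to the Unicode entry point exported by Kernel32.dll.
    [DllImport("kernel32.dll", EntryPoint = "LCMapStringW",
               CharSet = CharSet.Unicode, SetLastError = true)]
    static extern int LCMapString(uint locale, uint dwMapFlags,
                                  string lpSrcStr, int cchSrc,
                                  [Out] char[] lpDestStr, int cchDest);

    // Assumption: the output needs no more characters than the input.
    static string Map(string source, uint flags)
    {
        // LCMapString does not accept a zero-length source, so return early.
        if (string.IsNullOrEmpty(source)) return source;

        var buffer = new char[source.Length];
        int written = LCMapString(LOCALE_SYSTEM_DEFAULT, flags,
                                  source, source.Length, buffer, buffer.Length);
        if (written == 0)
            throw new Win32Exception(Marshal.GetLastWin32Error());
        return new string(buffer, 0, written);
    }

    public static string ToSimplified(string source)
    {
        return Map(source, LCMAP_SIMPLIFIED_CHINESE);
    }

    public static string ToTraditional(string source)
    {
        return Map(source, LCMAP_TRADITIONAL_CHINESE);
    }
}
```

Checking the return value and raising a Win32Exception makes a failed call visible instead of silently returning a buffer of spaces. The code only runs on Windows, since it depends on Kernel32.dll.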

If we change the flag value (the second parameter) of the LCMapString API from LCMAP_SIMPLIFIED_CHINESE to LCMAP_TRADITIONAL_CHINESE, we can build another function that converts a source string from Simplified Chinese to Traditional Chinese! It is a convenient API, isn’t it?

        public static string ToTraditional(string stringSource)
        {
            String stringTarget = new String(' ', stringSource.Length);
            int iReturn = LCMapString(LOCALE_SYSTEM_DEFAULT, LCMAP_TRADITIONAL_CHINESE, stringSource, stringSource.Length, stringTarget, stringSource.Length);
            return stringTarget;
        }

However, the API does have a limitation: it converts a Chinese string character by character, without taking the surrounding context into account. As a result, it does produce some conversion mistakes. How does that happen?
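As a small preview, here is a toy sketch of why a context-free, one-to-one lookup can go wrong. The mapping table is hand-written and hypothetical. It is not the table LCMapString uses, and the class name CharByCharDemo is made up for illustration. The Simplified character 发 corresponds to two different Traditional characters, 發 (to emit) and 髮 (hair), so a table that stores only one of them cannot be right for every word.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical demo class; it does not call any Windows API.
static class CharByCharDemo
{
    // Toy one-to-one table: each Simplified character gets a single
    // Traditional counterpart, chosen without looking at its neighbours.
    static readonly Dictionary<char, char> Table = new Dictionary<char, char>
    {
        { '头', '頭' },
        { '现', '現' },
        { '发', '發' },   // 发 may also need to become 髮, depending on the word
    };

    public static string ToTraditional(string source)
    {
        var result = new StringBuilder(source.Length);
        foreach (char c in source)
        {
            char mapped;
            // Characters missing from the table are copied through unchanged.
            result.Append(Table.TryGetValue(c, out mapped) ? mapped : c);
        }
        return result.ToString();
    }
}
```

With this table, CharByCharDemo.ToTraditional("发现") returns "發現", which is the correct Traditional form of "discover". CharByCharDemo.ToTraditional("头发") returns "頭發", whereas the expected Traditional word for "hair" is "頭髮". Choosing the right character requires knowing which word it belongs to.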

I will explain more on this in my next entry! Stay tuned!