CHM Localization and Unicode issues – dbcsFix.exe

In an earlier thread we discussed the localization issues in CHM. With the September Sandcastle release we have addressed the Unicode issues for CHMs built in East Asian languages. Here’s the problem:


1.    For localized CHMs the sources need to be in ANSI if the language’s characters don’t all map to Western-1252.  

2.    If you compile ANSI sources, the HTML Help compiler assumes that the HTML codepage is your current system codepage.  If your system is set to EN-US (like mine), the resulting CHM contains incorrect characters unless you change your system settings (which requires a reboot).
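To make the second point concrete, here is a minimal Python sketch of what goes wrong when bytes written in one codepage are read under another. The sample text and the choice of Big5 are illustrative assumptions, not something taken from Sandcastle or the HTML Help compiler.

```python
# Illustrative only: the sample string and the Big5 codepage are assumptions.
text = "\u8aaa\u660e"  # two Traditional Chinese characters

# The bytes as they would sit in an ANSI (Big5) source file.
ansi_bytes = text.encode("big5")

# A reader that assumes the EN-US system codepage (Windows-1252)
# interprets each byte as a separate Western character.
wrong = ansi_bytes.decode("cp1252", errors="replace")

# Decoding with the codepage the file was actually written in round-trips.
right = ansi_bytes.decode("big5")

print(repr(wrong), repr(right))
```

The same bytes yield four unrelated Western characters in one case and the original two characters in the other, which is why the compiler has to run under the matching codepage.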


To resolve this, one must:

1.    Note the codepage in the source HTML (a META tag).

2.    Re-encode all files in ANSI, using the appropriate code page.

3.    Trick the OS into reporting that its current codepage is different from what it really is.

4.    Compile.
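As a rough illustration of the first step, the declared codepage can be read from the META tag with a small regular expression. This is a sketch in Python, not how Sandcastle does it; the function name `declared_charset` and the 4096-byte search window are assumptions made for the example.

```python
import re

# Matches charset=... inside a <meta> tag, in either the http-equiv
# form or the shorter <meta charset="..."> form.
META_CHARSET = re.compile(
    r"""<meta[^>]*?charset\s*=\s*["']?([A-Za-z0-9_\-]+)""",
    re.IGNORECASE,
)

def declared_charset(html_bytes, default=None):
    """Return the charset named in the first META tag, or default."""
    # Charset names are ASCII, so a lossless latin-1 decode of the
    # start of the file is enough to search it safely.
    head = html_bytes[:4096].decode("latin-1")
    match = META_CHARSET.search(head)
    return match.group(1).lower() if match else default

sample = b'<head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"></head>'
print(declared_charset(sample))
```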


Here’s our solution:

1.    Have ChmBuilder write the codepage as UTF-8 into the HTML files as it generates them (they actually are UTF-8 at this point).

2.    Re-encode the files using dbcsFix.exe. DBCS stands for Double Byte Character Set, and we use this program to convert UTF-8 to ANSI. While doing so, substitute the actual codepage (e.g., big5) for what was initially written (UTF-8).

3.    Wrap the call to HHC.exe in a call to MS APPLocale or SbAppLocale.exe, passing in the appropriate LCID.


dbcsFix.exe Details:

dbcsFix.exe attempts to work around limitations in the CHM compiler regarding character encodings and representations. Specifically:

1.    Replaces some characters with ASCII equivalents, as follows:

Char name                      utf8 (hex)      ASCII equivalent
Non-breaking space                             " " (for all languages except Japanese)
Non-breaking hyphen
En dash
Left curly single quote
Right curly single quote
Left curly double quote
Right curly double quote
Horizontal ellipsis

After this step, no further work is done when LCID == 1033.


2.    Replaces some characters with named entities, as follows:

Char name                      utf8 (hex)      named entity
Registered trademark
Em dash







3.    Replaces the default "CHARSET=UTF-8" setting in the HTML generated by ChmBuilder with "CHARSET=" + the proper value for the specified LCID, as determined by the application's .config file.


4.    Re-encodes all input HTML from its current encoding (UTF-8, as output by ChmBuilder) to the correct encoding for the specified LCID.
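The four steps can be sketched in Python as follows. This is not the dbcsFix.exe source, only an approximation under stated assumptions: the replacement strings, the entity names, the `LCID_CHARSETS` table (the real tool reads its mapping from a .config file), the function name `fix_html`, and the choice to emit numeric character references for anything the target codepage cannot represent are all illustrative.

```python
import re

# ASSUMPTION: a hypothetical LCID -> (Python codec, HTML charset) table.
# The real tool takes this mapping from its .config file.
LCID_CHARSETS = {
    1028: ("big5", "big5"),        # Chinese (Traditional, Taiwan)
    1041: ("cp932", "shift_jis"),  # Japanese
    2052: ("gbk", "gb2312"),       # Chinese (Simplified, PRC)
}
JAPANESE_LCID = 1041

# ASSUMPTION: plausible ASCII stand-ins; the exact substitutions made by
# the real tool are not shown here.
ASCII_EQUIVALENTS = {
    "\u00a0": " ",    # non-breaking space (left alone for Japanese)
    "\u2011": "-",    # non-breaking hyphen
    "\u2013": "-",    # en dash
    "\u2018": "'",    # left curly single quote
    "\u2019": "'",    # right curly single quote
    "\u201c": '"',    # left curly double quote
    "\u201d": '"',    # right curly double quote
    "\u2026": "...",  # horizontal ellipsis
}

# ASSUMPTION: standard HTML entity names for the two listed characters.
NAMED_ENTITIES = {
    "\u00ae": "&reg;",    # registered trademark
    "\u2014": "&mdash;",  # em dash
}

def fix_html(utf8_bytes, lcid=1033):
    """Apply the four steps to one UTF-8 HTML file and return the new bytes."""
    text = utf8_bytes.decode("utf-8")

    # Step 1: replace typographic characters with ASCII equivalents.
    for char, ascii_text in ASCII_EQUIVALENTS.items():
        if char == "\u00a0" and lcid == JAPANESE_LCID:
            continue
        text = text.replace(char, ascii_text)
    if lcid == 1033:
        return text.encode("utf-8")  # no further work for EN-US

    # Step 2: replace selected characters with named entities.
    for char, entity in NAMED_ENTITIES.items():
        text = text.replace(char, entity)

    # Step 3: swap the charset declaration (KeyError for an unknown LCID).
    codec, charset = LCID_CHARSETS[lcid]
    text = re.sub(r"charset=utf-8", "CHARSET=" + charset, text,
                  flags=re.IGNORECASE)

    # Step 4: re-encode; unmappable characters become numeric references.
    return text.encode(codec, errors="xmlcharrefreplace")
```

Doing the substitutions before the re-encode matters: characters such as the em dash may have no representation in the target codepage, so they need to be turned into ASCII or entities while the text is still Unicode.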



dbcsFix.exe [-d=Directory] [-l=LCID]

-d is the directory containing CHM input files (e.g., HHP file). For example, 'C:\DocProject\Output\Chm'. Default is the current directory.

-l is the language code ID in decimal. For example, '1033'. Default is '1033' (for EN-US). Usage is also available with -?


After processing the inputs with dbcsFix.exe, the CHM compiler must be run under a system locale that matches the LCID passed to this tool. You can do this by changing your system settings via the Control Panel, or by wrapping the call in MS APPLocale or SbAppLocale.exe. With SbAppLocale.exe, the call should be similar to:

SbAppLocale.exe $(LCID) "%PROGRAMFILES%\HTML Help Workshop\hhc.exe" Path\Project.HHp
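If you drive the build from a script, the two calls can be assembled along these lines. This Python sketch is only one way to do it: the function names, the default hhc.exe location, and the assumption that both tools are on the PATH are mine, and the exit code of the wrapped compiler is returned as-is rather than interpreted.

```python
import ntpath
import subprocess

def build_commands(chm_dir, lcid, project_file,
                   hhc=r"C:\Program Files\HTML Help Workshop\hhc.exe"):
    """Return the dbcsFix and wrapped hhc command lines as argument lists."""
    # Same LCID for both calls, so the compiler runs under the locale
    # the files were re-encoded for.
    fix_cmd = ["dbcsFix.exe", "-d=" + chm_dir, "-l=" + str(lcid)]
    compile_cmd = ["SbAppLocale.exe", str(lcid), hhc,
                   ntpath.join(chm_dir, project_file)]
    return fix_cmd, compile_cmd

def build_chm(chm_dir, lcid, project_file):
    """Run both steps in order (Windows only; tools assumed to be on PATH)."""
    fix_cmd, compile_cmd = build_commands(chm_dir, lcid, project_file)
    subprocess.run(fix_cmd, check=True)
    # The compiler's exit code is handed back to the caller unchanged.
    return subprocess.run(compile_cmd).returncode
```

For example, `build_commands(r"C:\DocProject\Output\Chm", 1028, "Project.hhp")` yields the argument lists for a Traditional Chinese build without running anything.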



Here are some useful links about Unicode and general encoding issues:

Sincere thanks to my colleagues John Carl and Justin Russell for developing dbcsFix.exe. Cheers.



Comments (6)

  1. Dave Sexton says:

    Thanks for the info on dbcsFix.  And BTW, nice job guys on the dnrTV video – I learned some stuff from it 🙂

  2. Sandcastle says:

    I am excited to announce the availability of September 2007 version for Sandcastle. The latest version

  3. Rob Chandler says:

    Please note that MS AppLocale has moved to this location…/AppLocale.aspx

  4. Rob Chandler says:

    Oh I take that back sorry. That page also has a bad link. So where has MS AppLocale gone? Maybe deleted because of DoJ — Like other stuff that quietly disappears because MS don't have the rights to make the source available?

  5. MS AppLocale didn't get removed – just sucked up into larger systems. 🙂 You can download it from here:…/details.aspx
