Building CHM using CHMBuilder

Please see my blog post about the Sandcastle September 2007 release. With this release, we plan to ship a new tool called CHMBuilder for generating CHM files with Sandcastle.

What is CHMBuilder?

CHMBuilder is an executable that will ship under the Production Tools folder of Sandcastle. The HXS generation process in Sandcastle works much better than the CHM generation process. There are several reasons for this, but the central problem is that CHMs do not get the data needed to build the TOC and indexes from the topic HTML files themselves. We have transforms under the Production transforms folder that collect this data, but they are very slow (see https://blogs.msdn.com/sandcastle/archive/2007/02/28/sandcastle-performance-improvements-in-february-ctp.aspx) and are not localized.

Given HXS-ready Sandcastle output, CHMBuilder produces HHP, HHC, and HHK files, and transforms the HXS-ready topic files into CHM-ready topic files. With CHMBuilder, users no longer need separate BuildAssembler runs to generate CHMs and HXSs. This also solves an internal scenario for our team, where we build CHM and HXS files for many products.

CHMBuilder Usage:

ChmBuilder /html:htmlDirectory /toc:tocFile /project:projectName /lcid:languageId /out:outputDirectory

/html:htmlDirectory

Specify an HTML directory.

/project:projectName

Specify a project name.

/toc:tocFile

Specify a TOC file.

/lcid:languageId

Specify a language id. If unspecified, 1033 is used. See chmbuilder.config for the list of languages.

/out:outputDirectory

Specify an output directory. If unspecified, CHM is used.

On input, htmlDirectory is a directory containing HTML topic files with XML data islands that hold the HXS metadata. The tocFile is the manifold TOC file, produced using either the ReflectionToToc.xsl or the DsTocToToc.xsl transform.

Upon completion, the outputDirectory contains the files projectName.HHP, projectName.HHC, and projectName.HHK, plus a directory with the same name as htmlDirectory. That directory contains topic files with the same names and contents as the input topic files, but stripped of HXS-specific elements.
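As a quick illustration of the usage line above, here is a minimal Python sketch that assembles the ChmBuilder arguments. The helper name chmbuilder_args and the sample paths and project name are made up for the example; the defaults for /lcid and /out mirror the ones described above.

```python
def chmbuilder_args(html_dir, toc_file, project, lcid=1033, out_dir="CHM",
                    exe="ChmBuilder"):
    """Build the argument list for the ChmBuilder usage line.

    The executable name and all paths are placeholders for illustration.
    """
    return [
        exe,
        f"/html:{html_dir}",
        f"/toc:{toc_file}",
        f"/project:{project}",
        f"/lcid:{lcid}",
        f"/out:{out_dir}",
    ]


# Hypothetical inputs: an html folder and a TOC file from a Sandcastle build.
args = chmbuilder_args(r"Output\html", "toc.xml", "MyProject")
print(" ".join(args))

# To actually launch the tool (it must be on the PATH), something like:
#   import subprocess
#   subprocess.run(args, check=True)
```

Leaving out lcid and out_dir gives the same result as omitting /lcid and /out on the command line, at least as far as the defaults documented above go.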

CHMBuilder Details:

1. The generated HHP file references the generated HHC and HHK files and the output topic directory. The HHP file is built from a template read from the config file. Three placeholders are replaced for each project: {projectName}, {defaultTopic} and {language} (written as {0}, {1} and {2} in the sample config).

2. The HHK entries are produced using the Term attributes on all <MSHelp:Keyword Index="K" /> elements within the XML data island of all the topics.

3. The HHC file is produced using the structure defined in the tocFile, taking titles from the input topic files. If an <MSHelp:TOCTitle /> element is present, its title is used. Otherwise, the value in the HTML <title> element is used.

4. The output topic files do not contain the XML data island. All <MSHelp:link>linkText</MSHelp:link> elements are transformed into <span class="nolink">linkText</span>. Any other elements in the MSHelp namespace are removed. The topic documents are otherwise identical.

5. The input topic files are read with an XmlReader, and the output topic files are produced with an XmlWriter, for speed and low memory overhead.

6. The tool also successfully transforms topic files that do not appear in the TOC.

7. The tool supports an htmlDirectory that has a subdirectory structure. Processing recurses through the subdirectories and reproduces the same structure in the output directory.

8. The tool supports localization of the TOC and index. The HHK file is written using the encoding given by the "codepage" attribute in the config file. In the HHP file, the Language option is set to the "name" attribute from the config file.

9. The tool supports an indented second-level index. If a K keyword contains a comma, only the part after the comma appears in the index, and that index entry is indented.

10. It also converts some escaped special characters in index entries, e.g. %3c becomes '<'.
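To make items 2, 9 and 10 a little more concrete, here is a rough Python sketch of how index entries could be derived from K keywords. This is not the CHMBuilder source. The namespace URI, the sample data island with its wrapper element, and the (level, text) tuple shape are all assumptions made for illustration, and the real tool may treat the part before the comma differently.

```python
import xml.etree.ElementTree as ET
from urllib.parse import unquote

# Assumed namespace URI for the MSHelp prefix.
MSHELP_NS = "http://msdn.microsoft.com/mshelp"

# Made-up data island; the <island> wrapper is only here to keep it well formed.
SAMPLE_ISLAND = """<island xmlns:MSHelp="http://msdn.microsoft.com/mshelp">
  <MSHelp:Keyword Index="K" Term="Widget class" />
  <MSHelp:Keyword Index="K" Term="Widget class, methods" />
  <MSHelp:Keyword Index="K" Term="List%3cT%3e class" />
  <MSHelp:Keyword Index="F" Term="Example.Widget" />
</island>"""


def index_entries(data_island_xml):
    """Return (level, text) pairs for every K keyword in a data island."""
    root = ET.fromstring(data_island_xml)
    entries = []
    for keyword in root.iter(f"{{{MSHELP_NS}}}Keyword"):
        if keyword.get("Index") != "K":
            continue  # only K keywords feed the HHK index
        term = keyword.get("Term", "")
        # Split on the first comma before decoding, so an escaped comma
        # is not mistaken for a level separator.
        head, comma, tail = term.partition(",")
        if comma:
            # Second-level entry: only the text after the comma is shown.
            entries.append((2, unquote(tail.strip())))
        else:
            entries.append((1, unquote(term)))
    return entries


entries = index_entries(SAMPLE_ISLAND)
for level, text in entries:
    print("    " * (level - 1) + text)
```

The F keyword is skipped, the comma term becomes an indented entry, and %3c and %3e are decoded to angle brackets.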

CHMBuilder Config file

<?xml version="1.0" encoding="utf-8" ?>
<configuration>
  <!-- The languages section here is to support TOC and Index localization. -->
  <!-- We can add more languages here. -->
  <languages>
    <language id="1033" codepage="65001" name="0x409 English (United States)" />
    <language id="2052" codepage="936" name="0x804 Chinese (PRC)" />
  </languages>
  <!-- {0}:projectName, {1}:defaultTopic, {2}:Language -->
  <hhpTemplate>
    <line>[OPTIONS]</line>
    <line>Compatibility=1.1 or later</line>
    <line>Compiled file={0}.chm</line>
    <line>Contents file={0}.hhc</line>
    <line>Index file={0}.hhk</line>
    <line>Default Topic={1}</line>
    <line>Full-text search=Yes</line>
    <line>Language={2}</line>
    <line>[FILES]</line>
    <line>icons\*.gif</line>
    <line>art\*.gif</line>
    <line>media\*.gif</line>
    <line>scripts\*.js</line>
    <line>styles\*.css</line>
    <line>html\*.htm</line>
    <line>[INFOTYPES]</line>
  </hhpTemplate>
</configuration>
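For readers who want to see how the template and the language table fit together, here is a small Python sketch that reads a trimmed, well-formed copy of the config and fills in the three placeholders. The function name render_hhp, the project name and the default topic path are invented for the example, and the tool itself may do this differently.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the config, assuming well-formed <language> and <line> elements.
CONFIG = """<configuration>
  <languages>
    <language id="1033" codepage="65001" name="0x409 English (United States)" />
    <language id="2052" codepage="936" name="0x804 Chinese (PRC)" />
  </languages>
  <hhpTemplate>
    <line>[OPTIONS]</line>
    <line>Compiled file={0}.chm</line>
    <line>Contents file={0}.hhc</line>
    <line>Index file={0}.hhk</line>
    <line>Default Topic={1}</line>
    <line>Language={2}</line>
  </hhpTemplate>
</configuration>"""


def render_hhp(config_xml, project_name, default_topic, lcid="1033"):
    """Fill the hhpTemplate lines for one project and one language id."""
    root = ET.fromstring(config_xml)
    languages = {lang.get("id"): lang for lang in root.iter("language")}
    language = languages[lcid]  # raises KeyError for an unlisted language id
    template = root.find("hhpTemplate")
    lines = [(line.text or "").strip() for line in template.iter("line")]
    # {0} = projectName, {1} = defaultTopic, {2} = language name
    return "\n".join(
        line.format(project_name, default_topic, language.get("name"))
        for line in lines
    )


hhp_text = render_hhp(CONFIG, "MyProject", r"html\intro.htm", lcid="2052")
print(hhp_text)
```

The codepage attribute is not used in this sketch; per item 8 it would decide the encoding used when writing the HHK file.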

Cheers.

Anand..