Converting Office documents to Open XML

XML-based file formats have been getting a lot of attention lately, but the vast majority of business documents worldwide are still in the binary Microsoft Office formats: DOC, XLS and PPT. And that will still be true for quite a while, because there is such a huge volume of Office documents in use that they can't all be converted overnight. In this post, I'm going to cover some of the options for converting those binary documents to Open XML.

Converting Individual Documents

For an individual document, it's pretty simple. If you have Office binary documents, then you're probably using Office already. So when you upgrade to the 2007 version, you can simply open a document -- the binary formats are still fully supported -- and then save it in the new Open XML formats, which are now the default.

For users who haven't upgraded to the 2007 version of Office yet, but need to share documents with others who have, the solution is the Compatibility Pack. This free download adds read/write support for the Open XML formats to Office 2000, Office XP, or Office 2003.

Bulk Conversions

If you have more than a few documents to convert, you'll want to use a more automated approach. The solution in this case is the Office File Converter (OFC), which is included in the Office Migration Planning Manager (OMPM), another free download.

OMPM does a lot more than convert file formats. For example, it includes tools for scanning documents and generating reports, and it's designed for manual use or for deployment via Systems Management Server (SMS) in large-scale automated migrations. But for this post I'm going to focus on just the file-conversion functionality of OMPM. For more information on its other features, see the documentation that comes with the download.

After you install OMPM, you'll have a command-line utility named OFC.EXE. This is the Office File Converter. The basic concept is that you edit an ofc.ini file that contains settings for what you'd like to convert and how to handle various details, and then you run the OFC executable. OFC takes just one command-line parameter, which is an optional alternate location for the ofc.ini file -- by default it looks for the file in the current folder.

An ofc.ini file is provided as a starting point, so you can edit copies of that for the various things you want to do. Here's a look at what's in the provided ofc.ini file:

[Run]
; folder where the log files will be written
LogDestinationPath=C:\OMPMLogs

; free-form text description
Description=none

; folder containing the File List exported from the Reporting tool
; (do not include if specifying FoldersToConvert)
;FileListFolder=dataexport

; Internal number
ToolId=1

[ConversionOptions]
; Set to 1 to have Word leverage the "full upgrade on open" flag
FullUpgradeOnOpen=1
; Do not CAB (compress into a folder) the log files
DoNotCab=0
; Set to 1 the converter will not include macros in the converted files
MacroControl=1

[FoldersToConvert]
; folders to convert - the Converter will attempt to convert all files in the specified folders
fldr1=C:\Documents and Settings\Administrator\My Documents\Beta 2 ORK Docs

[ConversionInfo]
; specifies the way the destination folder structure will be created.
; SourcePathTemplate = *\ - You indicate by the number of *\*\..., the number of directories
;    from the source path that you want to omit from the final Destination Path.
; DestinationPathTemplate = c:\output\*1\*2 - Folders in the Source Path are included in the
;    DestinationPath by specifying the number of the folder - *1=the first folder, *2=the second etc.
;    The remaining source path is then added
; e.g.
;
; SourceTemplate = *\*\*
;
; DestTemplate = \\server\share\*3\*1
;
; Will produce \\server\share\subdir\remoteserver\subsubdir\filename from 
; \\remoteserver\sharexxx\subdir\subsubdir\filename

SourcePathTemplate=*\*\*\*\
DestinationPathTemplate=*1\*2\*3\*4\Converted

As you can see, the settings are fairly self-explanatory, with some comments provided as well. Note that there are two different ways to use the tool: you can either provide a list of files to convert (as generated by another tool that comes with OMPM), or you can specify a set of folders to convert, and then OFC will convert every file in each of those folders.
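
The path templates are probably the least obvious part of the file, so here's a small Python sketch of how I read the example in the [ConversionInfo] comments. This is only an illustration of that one example, not OFC's actual logic: the function name expand_destination is hypothetical, and how OFC treats drive letters or other edge cases is not covered here.

```python
import re


def expand_destination(source_path, source_template, dest_template):
    """Illustrative reading of the ofc.ini path templates (not OFC's own code)."""
    # Split the source path into components, ignoring the leading "\\" of a UNC path.
    parts = [p for p in source_path.split("\\") if p]
    # Assumption: each "*" in the source template captures one leading component.
    captured_count = source_template.count("*")
    captured = parts[:captured_count]
    remainder = parts[captured_count:]
    # Replace *1, *2, ... in the destination template with the captured components.
    dest = re.sub(r"\*(\d+)", lambda m: captured[int(m.group(1)) - 1], dest_template)
    # The rest of the source path is appended unchanged.
    return "\\".join([dest.rstrip("\\")] + remainder)


# The example from the comments in the provided ofc.ini file.
print(expand_destination(
    r"\\remoteserver\sharexxx\subdir\subsubdir\filename",
    r"*\*\*",
    r"\\server\share\*3\*1",
))
# prints \\server\share\subdir\remoteserver\subsubdir\filename
```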

Programmatic Conversion

One common request I've heard from developers goes something like this: "I'm writing code that reads Open XML documents, but I'd also like to support the binary formats -- how can I convert a binary document to Open XML on the fly?"

The OFC utility can help you out here as well. It's just an EXE, so you can invoke it programmatically in various ways, depending on your development environment. For example, in .NET development, you can use the Process class to launch OFC.EXE and pass it the path to your ofc.ini file as its single command-line argument. There are a couple of strategies you may want to consider.
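
The same idea works from any language that can start a process. Here's a minimal sketch in Python using the standard subprocess module. The helper names build_ofc_command and run_ofc are hypothetical, you would supply the real path to OFC.EXE from your own OMPM install, and I'm not assuming anything about what OFC's exit code means.

```python
import subprocess


def build_ofc_command(ofc_exe, ini_path=None):
    """Build the argument list for OFC.EXE.

    The only argument OFC takes is the optional location of ofc.ini;
    without it, OFC looks for ofc.ini in the current folder.
    """
    command = [str(ofc_exe)]
    if ini_path is not None:
        command.append(str(ini_path))
    return command


def run_ofc(ofc_exe, ini_path=None):
    """Launch OFC, wait for it to finish, and return the process exit code."""
    completed = subprocess.run(build_ofc_command(ofc_exe, ini_path))
    return completed.returncode
```

Keeping the command construction separate from the launch makes it easy to log or unit test the exact command line before anything is actually run.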

The first strategy would be to have standardized locations for your conversion in-box, conversion out-box, and the location of the ofc.ini file. Then you would copy a document into the in-box, invoke OFC.EXE (with a command-line reference to your standardized ofc.ini location), and look for your converted document in the out-box folder.
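
As a rough sketch of that in-box/out-box flow in Python: the function name convert_via_folders and the run_converter callback are hypothetical, and I'm assuming your ofc.ini is already set up so that the in-box is a folder to convert and the DestinationPathTemplate points somewhere under the out-box.

```python
import shutil
from pathlib import Path


def convert_via_folders(document, inbox, outbox, run_converter):
    """Copy one document into the in-box, run the converter, return new out-box files.

    run_converter is any callable that launches OFC.EXE with the
    standardized ofc.ini (for example, a wrapper around subprocess.run).
    """
    inbox, outbox = Path(inbox), Path(outbox)
    # Remember what is already in the out-box so only new files are reported.
    before = {p for p in outbox.rglob("*") if p.is_file()}
    staged = inbox / Path(document).name
    shutil.copy2(document, staged)
    try:
        run_converter()
    finally:
        # Empty the in-box so the next run does not convert this file again.
        staged.unlink()
    after = {p for p in outbox.rglob("*") if p.is_file()}
    return sorted(after - before)
```

Comparing the out-box before and after the run avoids guessing the converted file's name or subfolder, both of which depend on your template settings.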

A second option would be to dynamically generate an ofc.ini file that contains the appropriate settings for what you want converted, then invoke OFC.EXE with that ofc.ini file. This is a little more flexible, but you have to manage the creation and deletion of the ofc.ini file, and you may also want to manage the creation and deletion of a file list like the one generated by the OMPM scanning tool.
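
Here's a minimal Python sketch of generating and cleaning up a temporary ofc.ini. The section and key names are copied from the provided ofc.ini shown earlier; the helper names render_ofc_ini and with_temporary_ini are hypothetical, and numbering additional folders as fldr2, fldr3 and so on is my assumption based on the fldr1 key in that file.

```python
import os
import tempfile


def render_ofc_ini(folders, log_path, source_template, dest_template):
    """Return ofc.ini text using the keys from the provided sample file."""
    lines = [
        "[Run]",
        f"LogDestinationPath={log_path}",
        "Description=generated",
        "ToolId=1",
        "",
        "[ConversionOptions]",
        "FullUpgradeOnOpen=1",
        "DoNotCab=0",
        "MacroControl=1",
        "",
        "[FoldersToConvert]",
    ]
    # Assumption: extra folders follow the fldr1 pattern as fldr2, fldr3, ...
    lines += [f"fldr{i}={folder}" for i, folder in enumerate(folders, start=1)]
    lines += [
        "",
        "[ConversionInfo]",
        f"SourcePathTemplate={source_template}",
        f"DestinationPathTemplate={dest_template}",
        "",
    ]
    return "\n".join(lines)


def with_temporary_ini(ini_text, action):
    """Write ini_text to a temp file, call action(path), then delete the file."""
    fd, path = tempfile.mkstemp(suffix=".ini")
    try:
        with os.fdopen(fd, "w") as handle:
            handle.write(ini_text)
        return action(path)
    finally:
        os.remove(path)
```

The action callback is where you would launch OFC.EXE with the generated file's path; the finally block makes sure the temporary ofc.ini is removed even if the conversion fails.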