Java and MS-Word

Java and MS-Word - followup

Earlier this month, I posted some references to Java->WordML interop material. This is a followup.

I proved to myself that it is pretty easy and straightforward to use Java to dynamically create MS-Word documents that conform to the WordProcessingML schema. Anyone can do this, using the schema documentation and an XML-aware Java application platform.

To use this approach, a developer really needs a working installation of Word 2003 for the development or design stage: first to design the document and generate the initial XML, and then to verify that what you are producing is a valid WordProcessingML document.

How did I do it?

You all know that Microsoft Word (and other Office applications) can load and save XML, and you know the schema is published by Microsoft.

The XML phreaks out there, maybe they like to wake up in the morning, drink 7 cups of starbucks' best, look at a schema, and start coding angle brackets. Not me. Given an XML schema of reasonable complexity, I have little hope of independently generating an XML document that conforms to that schema, within my lifetime. So what I did was use MS-Word as the designer. I just wrote a document. Anybody can do that. I designed the document exactly as I wanted it. Then File... Save As.... XML. Boom, I have a template document that conforms to WordProcessingML.

From that starting point, I took two paths. The first was to just place within that template document keywords or fields to be replaced programmatically at runtime, with a simple text replacement library. In Java, the java.lang.String class has a replaceAll() method that accepts a regular expression and a replacement string. Easy. I just inserted a set of "fields" that look like ##NAME##. These are not MS-Word "fields", just plain old text, within the XML document, of a well-known format. You can use any format you like. $$NAME$$ if you want, or whatever (just remember that $ is special in regular expressions and in replaceAll() replacement strings, so it needs escaping).

The Java application then populates a Hashtable of name/value pairs, then mechanically replaces all the fields in the doc whose names are present in the Hashtable, with the value of that key. Simple. Find ##FOO## in the doc, and replace that with Hashtable.get("FOO"). The Hashtable can be populated by any means - I inserted the current time of day as one of the name/value pairs, and I also populated the list with data from a SQL query. It could also be populated from a webservices call. Whatever. It's just a Hashtable.

After replacing the "fields", the result was a legal WordProcessingML document, dynamically-generated from data. Load that doc into MS-Word, print it, whatever. Easy.
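Here is a minimal sketch of that first path, not the code from the actual example. The class name TemplateFill, the escapeXml helper and the sample field names are all made up for illustration. It uses a Map (a Hashtable would work just as well) and modern Java syntax rather than 1.4-era code. It also does two things the description above glosses over: it escapes XML special characters in the values so the output stays well-formed, and it quotes the replacement text so that a "$" or backslash in the data is taken literally.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TemplateFill {
    // Matches ##NAME## style placeholders; the field name is captured in group 1.
    private static final Pattern FIELD = Pattern.compile("##([A-Za-z0-9_]+)##");

    // Escape the characters that would break XML text content.
    static String escapeXml(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    // Replace every ##KEY## whose KEY is present in the map; leave unknown fields as-is.
    static String fill(String template, Map<String, String> values) {
        Matcher m = FIELD.matcher(template);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String value = values.get(m.group(1));
            String replacement = (value == null) ? m.group() : escapeXml(value);
            // quoteReplacement keeps '$' and '\' in the data from being treated specially
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        // Stand-in for a fragment of the XML saved from Word (hypothetical content).
        String template = "<w:p><w:r><w:t>Hello ##NAME##, total: ##TOTAL##</w:t></w:r></w:p>";
        Map<String, String> values = new LinkedHashMap<>();
        values.put("NAME", "Smith & Sons");
        values.put("TOTAL", "$100");
        System.out.println(fill(template, values));
        // prints <w:p><w:r><w:t>Hello Smith &amp; Sons, total: $100</w:t></w:r></w:p>
    }
}
```

In a real run the template string would be the whole file saved from Word, read from disk, and the map would be filled from whatever data source you like.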

The second path I took was more XML-ish. My data source was an XML document. All data, including current time of day, and anything you might retrieve from a database, gets formatted into an XML document. You choose the schema. This doc could be obtained via a webservices call, from a database query (SQL Server and other databases can return XML documents in response to queries) or just formed in memory. I took the latter approach. Anything will do.
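For the "formed in memory" case, a rough sketch using the standard DOM and JAXP classes might look like the following. The element names (order, customer, generated) and the class name DataDocument are invented for this illustration; as noted above, the schema is yours to choose. The setTextContent call needs a newer Java than 1.4.

```java
import java.io.StringWriter;
import java.util.Date;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

class DataDocument {
    // Build a small data document in memory; the schema here is purely hypothetical.
    static Document build(String customer, String generatedAt) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
            Element root = doc.createElement("order");
            doc.appendChild(root);
            Element c = doc.createElement("customer");
            c.setTextContent(customer);
            root.appendChild(c);
            Element g = doc.createElement("generated");
            g.setTextContent(generatedAt);
            root.appendChild(g);
            return doc;
        } catch (Exception e) {
            throw new IllegalStateException("could not build data document", e);
        }
    }

    // Serialize the DOM to a string with an identity transform (handy for debugging).
    static String serialize(Document doc) {
        try {
            Transformer identity = TransformerFactory.newInstance().newTransformer();
            identity.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            identity.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("could not serialize data document", e);
        }
    }

    public static void main(String[] args) {
        Document doc = build("Smith & Sons", new Date().toString());
        System.out.println(serialize(doc));
    }
}
```

Because the DOM API handles escaping, a value such as "Smith & Sons" ends up as valid XML text without any extra work.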

I then de-constructed the template XML document, and formed it into an XSL transform that could accept the XML data document, and again, produce a WordProcessingML document. Then it is a simple matter of applying the XSL transform programmatically, at runtime. This requires at least Java 1.4, which bundles the JAXP transformation API (javax.xml.transform), and which you all should be using anyway because it is more current with security fixes. Also you should take this route only if you are comfortable with XSL. It is hairy for some people.
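Applying the transform at runtime can be sketched roughly as below, using the standard javax.xml.transform classes. This is an illustration under stated assumptions, not the stylesheet from the actual example. The stylesheet is a toy that emits a single paragraph, the input schema (order/customer) is hypothetical, and the class name WordMlTransform is invented. A stylesheet derived from a real Word-saved template would carry far more markup.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class WordMlTransform {
    // Toy stylesheet: wraps the customer name in a one-paragraph WordML 2003 document.
    static final String XSL =
        "<xsl:stylesheet version=\"1.0\""
        + " xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\""
        + " xmlns:w=\"http://schemas.microsoft.com/office/word/2003/wordml\">"
        + "<xsl:output method=\"xml\" omit-xml-declaration=\"yes\"/>"
        + "<xsl:template match=\"/order\">"
        + "<w:wordDocument><w:body><w:p><w:r><w:t>"
        + "<xsl:value-of select=\"customer\"/>"
        + "</w:t></w:r></w:p></w:body></w:wordDocument>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    // Apply the stylesheet to the XML data document and return the result as a string.
    static String transform(String dataXml) {
        try {
            Transformer t = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(XSL)));
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader(dataXml)), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("XSL transform failed", e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical data document; in practice this comes from a query, a service, or memory.
        String data = "<order><customer>Acme Corp</customer></order>";
        System.out.println(transform(data));
    }
}
```

In a web application the StreamResult would wrap the response output stream instead of a StringWriter, and the compiled stylesheet could be cached as a javax.xml.transform.Templates object rather than re-parsed on every request.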

Either path - the template version or the XSL transform - produces the same result: a valid WordProcessingML document. Either works for standalone applications or in web applications.

In Action

Those of you who are familiar with XML technologies won't be surprised to learn that it just works. But even so, the ability to dynamically generate a rich Word document, with images, text formatting, tables, and so on, all from Java, may open up some possibilities for you. Check it out for yourself. Here's a working example that uses a JSP to dynamically generate a document file. You should have MS-Word installed on your PC if you want to see the result.

Next up

I didn't try the XSL-FO route or the RenderX stylesheet I mentioned in my previous post. I also did not try to slurp documents with a custom schema into Word. And I didn't transmit the XML documents over webservices. I may explore some of these things in the future. Anyone have any other ideas?

Let me know what you think!

Here's the example, including links to source code.

Enjoy.
-Dino