Java and MS-Word


Java and MS-Word – followup


Earlier this month, I posted some references to some Java->WordML interop material. This is a followup.


I proved to myself that it is pretty easy and straightforward to use Java to dynamically create MS-Word documents that conform to the WordProcessingML schema. Anyone can do this, using the schema documentation and an XML-aware Java application platform.


To use this approach, you really need a working installation of Word 2003 at the development or design stage: first to design the document and generate the initial XML, and then to verify that what you are producing is a valid WordProcessingML document.


How did I do it?


You all know that Microsoft Word (and other Office applications) can load and save XML, and you know the schema is published by Microsoft.


The XML phreaks out there, maybe they like to wake up in the morning, drink 7 cups of Starbucks’ best, look at a schema, and start coding angle brackets. Not me. Given an XML schema of reasonable complexity, I have little hope of independently generating an XML document that conforms to that schema, within my lifetime. So what I did was use MS-Word as the designer. I just wrote a document. Anybody can do that. I designed the document exactly as I wanted it. Then File… Save As… XML. Boom, I have a template document that conforms to WordProcessingML.


From that starting point, I took 2 paths. The first was to just place within that template document keywords or fields to be replaced programmatically at runtime, with a simple text replacement library. In Java, the java.lang.String class has a replaceAll() method that accepts regular expressions and inserts replacement text. Easy. I just inserted a set of “fields” that look like ##NAME##. These are not MS-Word “fields”, just plain old text, within the XML document, of a well-known format. You can use any format you like. $$NAME$$ if you want, or whatever (just remember that $ is a regex metacharacter, so it has to be escaped in the pattern you hand to replaceAll()).


The Java application then populates a Hashtable of name/value pairs, then mechanically replaces all the fields in the doc whose names are present in the Hashtable, with the value of that key. Simple. Find ##FOO## in the doc, and replace that with Hashtable.get(“FOO”). The Hashtable can be populated by any means – I inserted the current time of day as one of the name/value pairs, and I also populated the list with data from a SQL query. It could also be populated from a webservices call. Whatever. It’s just a Hashtable.


After replacing the “fields”, the result was a legal WordProcessingML document, dynamically-generated from data. Load that doc into MS-Word, print it, whatever. Easy.
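For anyone who wants to see the shape of the template path, here is a minimal sketch, not the code from my example. The names TemplateFiller, fill and escapeXml are made up for illustration, and the template string is a tiny stand-in for the much larger file that Word saves. Two deliberate differences from the description above: it uses a single java.util.regex.Matcher pass instead of one replaceAll() call per key, and it XML-escapes each value before inserting it, since the values land inside an XML document. A Hashtable can be passed as-is because it implements Map.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: fills ##NAME## placeholders in a WordprocessingML template. */
final class TemplateFiller {

    // Matches placeholders such as ##NAME## or ##ORDER_ID## and captures the name.
    private static final Pattern FIELD = Pattern.compile("##([A-Za-z0-9_]+)##");

    /** Escapes the five characters that have special meaning in XML. */
    static String escapeXml(String s) {
        StringBuilder out = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '&':  out.append("&amp;");  break;
                case '<':  out.append("&lt;");   break;
                case '>':  out.append("&gt;");   break;
                case '"':  out.append("&quot;"); break;
                case '\'': out.append("&apos;"); break;
                default:   out.append(c);
            }
        }
        return out.toString();
    }

    /** Replaces every known ##KEY## with its escaped value; unknown fields stay as they are. */
    static String fill(String template, Map<String, String> values) {
        Matcher m = FIELD.matcher(template);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String value = values.get(m.group(1));
            String replacement = (value == null) ? m.group() : escapeXml(value);
            // quoteReplacement keeps '$' and '\' in the value from being read as group references.
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        // Simplified fragment; a real template would be the whole file saved from Word.
        String template =
            "<w:p><w:r><w:t>Dear ##NAME##, your order ##ORDER_ID## shipped.</w:t></w:r></w:p>";

        Map<String, String> values = new LinkedHashMap<String, String>();
        values.put("NAME", "Smith & Sons");
        values.put("ORDER_ID", "1234");

        System.out.println(fill(template, values));
    }
}
```

In a real application you would read the whole template file into a String first (it does not need to be processed line by line), call fill(), and write the result out with the same encoding the template declares.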


The second path I took was more XML-ish. My data source was an XML document. All data, including current time of day, and anything you might retrieve from a database, gets formatted into an XML document. You choose the schema. This doc could be obtained via a webservices call, from a database query (SQL Server and other databases can return XML documents in response to queries) or just formed in memory. I took the latter approach. Anything will do.


I then deconstructed the template XML document and formed it into an XSL transform that accepts the XML data document and, again, produces a WordProcessingML document. Then it is a simple matter of applying the XSL transform programmatically, at runtime. This requires at least Java 1.4, the first release to bundle the javax.xml.transform APIs, and one you all should be using anyway because it is more current with security fixes. Also, you should take this route only if you are comfortable with XSL. It is hairy for some people.
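Applying a transform at runtime could look roughly like the sketch below. It is an illustration under assumptions, not the stylesheet from my example: the class name WordMlTransform, the <order> data schema and the tiny stylesheet are all invented here, and a stylesheet derived from a real Word template would carry far more markup (styles, fonts, document properties). The only APIs used are the standard javax.xml.transform classes, and the w: prefix is bound to the Word 2003 WordprocessingML namespace.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

/** Hypothetical example: turns a small XML data document into WordprocessingML via XSLT. */
final class WordMlTransform {

    // Toy stylesheet: one paragraph for the customer, then one paragraph per item.
    static final String STYLESHEET =
          "<xsl:stylesheet version=\"1.0\""
        + "    xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\""
        + "    xmlns:w=\"http://schemas.microsoft.com/office/word/2003/wordml\">"
        + "  <xsl:output method=\"xml\" encoding=\"UTF-8\"/>"
        + "  <xsl:template match=\"/order\">"
        + "    <w:wordDocument>"
        + "      <w:body>"
        + "        <w:p><w:r><w:t>Order for <xsl:value-of select=\"customer\"/></w:t></w:r></w:p>"
        + "        <xsl:for-each select=\"item\">"
        + "          <w:p><w:r><w:t><xsl:value-of select=\".\"/></w:t></w:r></w:p>"
        + "        </xsl:for-each>"
        + "      </w:body>"
        + "    </w:wordDocument>"
        + "  </xsl:template>"
        + "</xsl:stylesheet>";

    /** Applies the given stylesheet to the given data document and returns the result as text. */
    static String transform(String stylesheetXml, String dataXml) {
        try {
            TransformerFactory factory = TransformerFactory.newInstance();
            Transformer transformer =
                factory.newTransformer(new StreamSource(new StringReader(stylesheetXml)));
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(dataXml)),
                                  new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            // Wrapped so callers are not forced to handle a checked exception.
            throw new IllegalStateException("XSL transform failed", e);
        }
    }

    public static void main(String[] args) {
        // Made-up data schema; in practice this could come from a query or a web service.
        String data =
              "<order>"
            + "<customer>Smith &amp; Sons</customer>"
            + "<item>Widget</item>"
            + "<item>Gadget</item>"
            + "</order>";

        System.out.println(transform(STYLESHEET, data));
    }
}
```

One nice side effect of this route is that the XSLT serializer escapes characters such as & and < in the data for you, so there is no separate escaping step as there is with plain text replacement.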


Either path – the template version or the XSL transform – produces the same result: a valid WordProcessingML document. Either works for standalone applications or in web applications.


In Action


Those of you who are familiar with XML technologies won’t be surprised to learn that it just works. But even so, the ability to dynamically generate a rich Word document, with images, text formatting, tables, and so on, all from Java, may open up some possibilities for you. Check it out for yourself. Here’s a working example that uses a JSP to dynamically generate a document file. You should have MS-Word installed on your PC if you want to see the result.


Next up


I didn’t try the XSL-FO route or the RenderX stylesheet I mentioned in my previous post. I also did not try to slurp documents with a custom schema into Word. And I didn’t transmit the XML documents over webservices. I may explore some of these things in the future. Anyone have any other ideas?


Let me know what you think!


Here’s the example, including links to source code.


Enjoy.
-Dino

Comments (30)

  1. Dino Chiesa of Microsoft shows how to dynamically generate WordML documents using Java and XSLT. Yep, that’s not a typo, Microsoft, WordML and Java. XML serves as peacemaker again. And he even provides a working JSP demo. Cool….

  2. If you are taking the "Replace All" approach, such as in CreateOrderConfViaTemplate.java, the value you insert into the XML should be XML-encoded.

    For example, the following characters (spelled out) must be escaped:

    "less-than"

    "greater-than"

    "apostrophe"

    "double-quote"

    "ampersand"

  3. Dino says:

    Good point Martin. I’ve updated the examples. Thanks.

  4. Gunther V says:

    I need to convert a generated WordML document to a .doc file. Does somebody know how to do this? I would prefer a Java solution, but a .NET solution is OK too.

  5. DotNetInterop says:

    @Gunther,

    to do that you could just automate MS-Word in .NET, open the WordML file, then SaveAs.

    There are examples of how to automate office in the .NET SDK install.

  6. rash says:

    Can we achieve mail merge functionality of word with xml data with this approach?

  7. Ian Brandt says:

    @Gunther, Dino,

    A Java WordprocessingML to Doc converter sure would be nice though. I’m a Mac user. I paid half a grand for Office Pro, but Word 2004 doesn’t do XML. I have to buy yet another copy of Word, 2003, and run it in Virtual PC, and I can’t script the conversion from the OS X side. Where’s the inter-op in that? In the future I really hope to see full support of WordprocessingML in all versions of Word so that someday we can actually distribute documents in that format, but until then a portable wordml2doc converter would be a good thing for all.

  8. Mamun Chowdury says:

    Hi, I was trying to view and download the example you mentioned for generating a Word file in JSP. Unfortunately the link was not working. Would it be possible to email me the example with source code?

    Thanks in advance,

    Mamun

  9. DotNetInterop says:

    @Mamun,

    Sorry, the quality of service on that machine is a little low. It was sitting on an old laptop that had some power problems. I’ve since migrated it to a newer machine. The link ought to work now?

    http://dinoch.dyndns.org:7070/WordML/

  10. A while back, the OpenXmlDeveloper.org website offered an example of how to create a WordProcessingML

  11. In the past I’ve posted some articles [ 1 , 2 ] about generating Office 2003 documents from a server-side

  12. dionazani says:

    You can use Rtf Writer2 to write rtf and open in Word or OpenOffice (Writer) …

  13. Mathieu says:

    Hi, I was trying to view and download the example you mentioned for generating a Word file in JSP. Unfortunately the link was not working. Would it be possible to email me the example with source code?

    Thanks in advance,

    Mathieu

  14. Subbu says:

    Hi,

    I am looking for java code/utility to check if a given MS Word document has track changes ON or not.  

    Any help is appreciated..

  15. BOng says:

    Your source code links ain’t working 15/05/2009

  16. DotNetInterop says:

    Yes, my server is down and cannot get up!  Sorry!

  17. ghouse says:

    If you have any example on it please send it to my mail id.

    Thanks

    Ghouse

  18. Zdravko says:

    Please send me the example source code to zrosko@yahoo.com

  19. Alex says:

    Please send the example source code to facp77@live.com.mx

  20. Wa says:

    I am doing some similar job and need help. please send the example to wa0805@hotmail.com if u can. thanks

  21. yasharth mishra says:

    Hi, I am trying to do a similar job, but need help with tables and images… How do I use data in the XML file to populate a Word table? And similarly, how do I get Word to load an image from a link…

    anyone has a working example??

  22. James says:

    Hi, I am trying to do something very similar, would you mind sending me the example? james-snell@hotmail.co.uk …thanks! James

  23. Pavan says:

    Hi,

    I’m trying to make something similar to this. Can you mail your source code to paonethestar@gmail.com so I can take it as a starting point?

    Thanks,

    Pavan

  24. Raluca says:

    It’s a shame that the source code is no longer available. Could you please send it to me at raluca.stanculescu@gmail.com

  25. Juliano says:

    Dino, can you mail me your code?

    tks

  26. narayanf1 says:

    Dino, please send the source code to narayanf1@gmail.com.

    would be nice if you could share it on some website, since we don't expect you to email whenever someone asks you here 😛

  27. sraju says:

    Good solution Dino,

    Can we convert a .DOT (a word template file) to a .DOC (ms word document file) programmatically by filling some values at given places.

    Ex: my template would have an Attribute Display Name: Attribute Value, the program should fill these values when I pass some array of values or name, pair, etc…

    Let me know if such thing is achievable mostly in Java, C or C++.

    Post your solution to rnvssudheer@hotmail.com.

    Thanks,

    Sudheer

  28. Mohamed says:

    Hey, this is brilliant! I can't download the source code so I'm doing some fill-in-the-blanks. I just need to have a peek at a working piece of code. Please email it to me at mvariyawa@yahoo.co.uk

    I coded up to the Hashtable, now I'm just trying to figure out how to replace the "fields" I created. How did you read…what did you read…did you read one line at a time from the xml template?

    Please assist.

    Thanks

    Mohamed