Java and Office documents via XML, XSL-FO


A couple of interesting items related to interop with Java and Office docs.


1st, an MSDN article on saving MS-Word into XSL-FO. Oleg commented on it very favorably. In essence, you can use MS-Word as an XSL-FO designer. That’s a huge step forward.


2nd, an old, but new to me, article on using Java to generate WordProcessingML.  I picked this up from John R. Durant, which I found while surfing my new favorite XML-related page… which is…


3rd, TopXml’s blog aggregator.  High signal-to-noise ratio there.


So, what does it all mean?  Well, item 2 stands on its own.  That’s a nice capability.  In any Java app, you can generate a document that conforms to the published XML schema for MS-Office docs, produce Office docs (reports, memos, whatever), and then ship them via a webservice to a client, where they can be consumed – printed, viewed, whatever. In this scenario, there is no use of Office on the server side.  It’s just XML, so it could be done on any modern platform.  [ Do mainframes speak XML?  Can I write a CICS TP that generates an XML document?  Hmm, I don’t think I would want to do that. . . ]
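To make item 2 concrete, here is a minimal sketch of what "generate WordML from Java" can look like, using only the StAX writer that ships with the JDK. It is not the code from the article I linked; the class name WordMlSketch and the method buildDocument are made up for illustration. The namespace and the mso-application processing instruction are the ones used by the Office 2003 WordprocessingML format.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

// Hypothetical helper class, for illustration only.
class WordMlSketch {
    // Namespace of the Office 2003 WordprocessingML schema.
    static final String W_NS = "http://schemas.microsoft.com/office/word/2003/wordml";

    // Builds a bare-bones WordprocessingML document with one paragraph per argument.
    static String buildDocument(String... paragraphs) throws XMLStreamException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        XMLStreamWriter xml =
            XMLOutputFactory.newInstance().createXMLStreamWriter(out, "UTF-8");

        xml.writeStartDocument("UTF-8", "1.0");
        // Tells Windows to open the .xml file with Word rather than a generic XML viewer.
        xml.writeProcessingInstruction("mso-application", "progid=\"Word.Document\"");

        xml.writeStartElement("w", "wordDocument", W_NS);
        xml.writeNamespace("w", W_NS);
        xml.writeStartElement("w", "body", W_NS);
        for (String text : paragraphs) {
            xml.writeStartElement("w", "p", W_NS);  // paragraph
            xml.writeStartElement("w", "r", W_NS);  // run
            xml.writeStartElement("w", "t", W_NS);  // text
            xml.writeCharacters(text);              // escapes <, > and & for us
            xml.writeEndElement();
            xml.writeEndElement();
            xml.writeEndElement();
        }
        xml.writeEndElement(); // w:body
        xml.writeEndElement(); // w:wordDocument
        xml.writeEndDocument();
        xml.close();

        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws XMLStreamException {
        System.out.println(buildDocument("Quarterly report", "Generated without Office on the server."));
    }
}
```

Save the output with an .xml extension and Word 2003 or later should open it as a document. Real reports would add styles, tables and so on, but the shape of the code stays the same: it is just XML being written out.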


2ndly, combining items 1 and 2 means that, if I for some reason don’t want to use WordML, I could run the output through the RenderX XSLT stylesheet mentioned in item 1, and generate an XSL-FO doc.


There is a license for the WordProcessingML stuff, but it is available free of charge.  I don’t know the license for the RenderX stylesheet, but it is available for download from the MSDN article in question.  Cool possibilities. . .


Ok, sure you could have been using Apache FOP as well, but … it is really a pain to design XSL-FO docs manually, or programmatically starting from nothing.  This combo allows you to use Word as the visual forms designer during development, then at runtime, use any XML-aware platform (like Java) to fill in blanks in the XML template doc, and transform to XSL-FO.  This is a big step forward.
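Here is a rough sketch of that runtime half in Java, using only the DOM and XSLT APIs in the JDK. Lots of assumptions here: the template string stands in for a doc designed in Word and saved as XML, the {{name}} placeholder convention is my own invention, and the tiny inline stylesheet is a toy stand-in for the RenderX stylesheet (which handles far more than paragraphs). The class and method names are hypothetical too.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.HashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper class, for illustration only.
class WordMlToFoSketch {
    static final String W_NS = "http://schemas.microsoft.com/office/word/2003/wordml";

    // Assumed template: in practice this would be a file saved from Word.
    static final String TEMPLATE =
        "<w:wordDocument xmlns:w='" + W_NS + "'><w:body>"
      + "<w:p><w:r><w:t>Dear {{customer}},</w:t></w:r></w:p>"
      + "<w:p><w:r><w:t>Your balance is {{balance}}.</w:t></w:r></w:p>"
      + "</w:body></w:wordDocument>";

    // Toy stylesheet (NOT the RenderX one): each w:p becomes an fo:block.
    static final String TO_FO =
        "<xsl:stylesheet version='1.0'"
      + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'"
      + " xmlns:fo='http://www.w3.org/1999/XSL/Format'"
      + " xmlns:w='" + W_NS + "' exclude-result-prefixes='w'>"
      + "<xsl:template match='/'>"
      + "<fo:root><fo:layout-master-set>"
      + "<fo:simple-page-master master-name='page'><fo:region-body/></fo:simple-page-master>"
      + "</fo:layout-master-set>"
      + "<fo:page-sequence master-reference='page'>"
      + "<fo:flow flow-name='xsl-region-body'><xsl:apply-templates select='//w:p'/></fo:flow>"
      + "</fo:page-sequence></fo:root>"
      + "</xsl:template>"
      + "<xsl:template match='w:p'><fo:block><xsl:value-of select='.'/></fo:block></xsl:template>"
      + "</xsl:stylesheet>";

    static String fillAndTransform(Map<String, String> values) throws Exception {
        // Step 1: parse the template with namespaces switched on.
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setNamespaceAware(true);
        Document doc = dbf.newDocumentBuilder()
                          .parse(new InputSource(new StringReader(TEMPLATE)));

        // Step 2: fill in the blanks inside every w:t text element.
        NodeList texts = doc.getElementsByTagNameNS(W_NS, "t");
        for (int i = 0; i < texts.getLength(); i++) {
            String s = texts.item(i).getTextContent();
            for (Map.Entry<String, String> e : values.entrySet()) {
                s = s.replace("{{" + e.getKey() + "}}", e.getValue());
            }
            texts.item(i).setTextContent(s);
        }

        // Step 3: run the filled-in document through the stylesheet to get XSL-FO.
        Transformer t = TransformerFactory.newInstance()
                                          .newTransformer(new StreamSource(new StringReader(TO_FO)));
        StringWriter out = new StringWriter();
        t.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        Map<String, String> values = new HashMap<>();
        values.put("customer", "Alice");
        values.put("balance", "42.00");
        System.out.println(fillAndTransform(values));
    }
}
```

The resulting XSL-FO can then be handed to a formatter such as Apache FOP to produce the final output. To use a real stylesheet, point the StreamSource at that file instead of the inline string.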


 

Comments (8)

  1. Is it really possible to use any Java application to dynamically generate MS-Word files, complete with graphics, tables, text styles, fonts, and more? Yes, quite possible. And Would you believe? – it’s easy too!

  2. In the past I’ve posted some articles [ 1 , 2 ] about generating Office 2003 documents from a server-side

  3. Is it really possible to use any Java application to dynamically generate MS-Word files, complete with graphics, tables, text styles, fonts, and more? Yes, quite possible. And Would you believe? – it’s easy too!

  4. Tapan says:

    Firstly great article.

    Is there any way I can verify that this is an MS Word document (.docx)? What would be the name of the .xml file I should look into to get this information? (I assume this information would be a schema element defined in one of the .xml files.) I need to verify it is in the correct format.

    Thanks

    Tapan

  5. DotNetInterop says:

    Tapan – there are two different formats being discussed here.  First, the .docx file format is a zip file, with a particular, well-defined internal structure.  The .xml file I spoke of in this posting is different – it is the WordML format which pre-dated .docx by at least 2 years.

    This particular post talks about how to format an .xml WordML document.  It does not talk about how to produce a .docx from Java.  That is also possible, and is something I considered writing some example code for, but it is not something covered here.

    Now, your question has to do with querying and validating a .docx file, which is another thing entirely.

    I’d suggest you look elsewhere for that. There is a System.IO.Packaging namespace in the .NET base class library as of .NET 3.0 – it will help you if you are using .NET.  If you are using Java, you will have to roll your own, I think.

  6. Pavan says:

    Hi,

     I want to know how I can convert WordXML to XSL-FO using my own XSLT script. I mean, is there any parser which can convert WordXML to XSL-FO using a custom XSLT?

    – Pavan

  7. Anonymous says:

    I would like to know if one can import existing XSL-FO stylesheets into Microsoft Word and make design changes without affecting the data set binding?