Open XML made easier for Java developers with Apache POI

[05/18- Update:
Apache POI project is highlighted in today's Document Interoperability Inititice (DII) event that just happened in London ]

When developers are tasked to deal with document file formats it might be challenging to do the right thing if you don’t have a good experience with a particular format, and need to crack it open and understand all the details.

For Java developers and Microsoft Office file formats there’s a very interesting solution with the Apache POI project, which provides a Java API to access Microsoft Office formats. Last year Microsoft and Sourcesence announced that they would collaborate to add support of the Open XML file format to the Apache POI project, and the resulting Open XML support has been integrated as part of POI 3.5 beta 5.

The end result: Good news for Java developers who need to manipulate the Office Open XML files (.XLSX, .DOCX, .PPTX), because it really makes it easier for them to do the job!

To illustrate the point, let me walk you through a demo scenario that uses Apache POI Java Libraries and actually combines it with the PHPExcel project (for PHP developers) and the Open XML Format SDK 2.0 (for .NET developers). My goal is just to give you a sense of the type of scenarios you can easily develop using multiple languages and multiple platforms.

We will make that demo available with more explanation in an article on http://openxmldeveloper.org/. Before we get into the demo itself I want to thank Julien Chable and Maarten Balliauw for their help in building this demo.

For now, let me walk you through the scenario. For the sake of our demonstration we are going to show how raw data can be consumed by a Java web application using the Apache POI, to create an .XLSX file from scratch. How that file can then be accessed and modified by a PHP application (with PHPExcel). And finally how the resulting file can be digitally signed and finalized via the .NET framework using the Open XML Format SDK.

Here’s the data flow:

Dataflow

Step 1 of the scenario starts in the Java Web applications:

step1JavaApp

Once the “Create Spreadsheet” button is pressed, it creates the files:

step1JavaAppExcel

And does some processing to inject the initial XML data and formatting. The result looks like this:

step1JavaAppResult

Most of the Java code required to do this fits in this code snippet:

step1JavaAppSnippet

Step 2, moving to the PHP application, the UI is similar:

step2PHPApp

This step adds cell protection, renames the .XLSX file, changes cell formatting, and inserts additional content formatting. The result looks like this:

step2PHPAppExcel

And the code to accomplish it looks like this:

step2PHPAppSnippet

Step 3, finally, from the ASP.NET web applications using the Open XML Format SDK:

step3OXMLSDKAppExcel

Where the code for adding the digital signature looks like this:

step3OXMLSDKAppSnippet

Easy, don’t you think?
Stay tuned, as I said earlier, we will follow up on http://openxmldeveloper.org/ with a more detailed article.

Additional background on PHPExcel and the Open XML SDK:

The PHPExcel project is an open source project available on Codeplex. It consists of a set of classes for PHP that enables PHP applications to read and write to various file formats. These formats include HTML, PDF, and the relevant one for our demonstration…Excel 2007’s .XLSX format. This class set supports features such as setting spreadsheet meta data (author, title, description ...), multiple worksheets, different fonts and font styles, cell borders, fills, gradients, and adding images to spreadsheets. In parallel to this project, there is also the sister project PHPPowerPoint, which is intended to operate along similar lines as the PHPExcel application but with a focus on the .PPTX file formats. Both of these projects are built around the OpenXML standard, and the PHP framework. Read this nice article: Use PHP to create Open XML Spreadsheet reports

The Open XML Format SDK provides methods for .NET developers to access and manipulate XML content, including XML data contained in OXML document formatted files. It provides strongly typed part classes to manipulate Open XML documents. The SDK also uses the .NET Framework Language-Integrated Query (LINQ) technology to provide strongly typed object access to the XML content inside the parts of Open XML documents. The April 2009 CTP release also adds support for the validation of Open XML documents.
Read Brian Jones' blog to go deep on Open XML SDK. 

Jean-Christophe Cimetiere  - Sr. Technical Evangelist