Open-source API for SpreadsheetML documents

More people are starting to build and share open-source APIs around Open XML development.  There are now several of these projects on Codeplex and SourceForge, all sharing the same basic goal: making Open XML development as simple (and thereby productive) as possible.

For .NET developers, the System.IO.Packaging API handles the low-level heavy lifting chores of Open XML development: dealing with OPC packages, relationships, and content types. But once you have those details under control, you're going to have to write a lot of XML markup to create a useful document, especially one with rich formatting or one that takes advantage of new concepts like structured document tags (content controls), the custom XML data store, or mapping content to a custom schema.

The logical approach to writing all that XML markup is to create classes that abstract away the messy details so you only have to deal with them once. And that's exactly what these API projects are all about.

A great example is the new ExcelPackage project on Codeplex. It was created by Dr. John Tunnicliffe, and he wrote a nice article about it for the OpenXmlDeveloper web site: "Using the Packaging API and Office Open XML to create Excel 2007 files on the server."

John's approach mimics some of the syntax and concepts that have been used in the Excel object model to allow for simple document creation. For example, here's some code that uses the ExcelPackage classes:

using OfficeOpenXml; // namespace for the ExcelPackage assembly … FileInfo newFile = new FileInfo(@"C:\mynewfile.xlsx"); using (ExcelPackage xlPackage = new ExcelPackage(newFile)) { … } ExcelWorksheet worksheet = xlPackage.Workbook.Worksheets.Add("My Worksheet"); worksheet.Cell(1, 1).Value = "My Data #1"; worksheet.Cell(1, 2).Value = "My Data #2"; worksheet.Cell(2, 1).Value = "My Data #3"; worksheet.Cell(2, 2).Value = "My Data #4"; worksheet.Cell(3, 1).Value = "My Data #5"; worksheet.Cell(3, 2).Value = "My Data #6";

Note how similar that is to OLE automation code like this for creating Excel documents:

ExcelInstance = new Excel.ApplicationClass(); workbook = ExcelInstance.Workbooks.Add(missing); worksheet = (Excel.Worksheet)workbook.Sheets["Sheet1"]; worksheet.Name = "My Worksheet"; worksheet.Cells[1, 1] = "My Data #1"; worksheet.Cells[1, 2] = "My Data #2"; worksheet.Cells[2, 1] = "My Data #3"; worksheet.Cells[2, 2] = "My Data #4"; worksheet.Cells[3, 1] = "My Data #5"; worksheet.Cells[3, 2] = "My Data #6"; workbook.SaveAs(@"C:\mynewfile.xlsx", ... );

Pretty cool. The ExcelPackage class makes it possible to "think like a spreadsheet" when creating the document, focusing on which cells contain which values, and hiding all the messy details of the underlying XML that is being generated. The two examples are almost identical in syntax. There's one key difference, though, and it's huge.

The latter example requires Excel to be installed on the machine where it's running. This makes the code less scalable, more expensive and complicated to distribute, and difficult to run in a server environment (since server-side automation of the Office clients is not an officially supported scenario). But the first example, using ExcelPackage, only requires the System.IO.Packaging API that's available for free in .NET 3.0. And it will run just fine in a server, in unattended mode as part of a back-end process, on any machine regardless of whether Excel is installed or not.

So if you're looking to get started on an application that create Open XML spreadsheets, take a good look at ExcelPackage. It already handles the basics of values, ranges, and formulas, and since it's open-source you can extend the classes yourself as needed, and share those enhancements with the community as well. Why re-invent the wheel?