Using LINQ to XML to process Open XML: The SpreadsheetML Class

This post describes the SpreadsheetML class (derives from the OpenXmlDocument class), which presents one approach for using LINQ to XML with Open XML spreadsheets.

This blog is inactive.
New blog: EricWhite.com/blog

Blog TOCFeatures of this version of the LtxOpenXml namespace:

  • The SpreadsheetML class contains three nested classes, Sheet, Row, and Cell, which represent some convenient abstractions for a SpreadsheetML document. These nested classes are written in the spirit of dynamic types, with the most terse syntax possible. We don't care that all properties are read/write; all of this code is written in a functional style, where no variables are mutated.
  • The SpreadsheetML class contains the Sheets method, which iterates through the sheets in the workbook
  • Each Sheet contains a Rows method, which iterates through the rows in the sheet
  • Each Row contains a Cells method, which iterates through the cells in the row.
  • Each Sheet, Row, and Cell contains its parent, initialized in the constructor. Storing the parent in each object is necessary to get at information in the SpreadsheetML object (for info such as the XML namespace) from within the object initializers for the Sheet, Row, or Cell.
  • This class shows using LINQ to retrieve the shared string if the datatype of the cell is "s". There is a certain amount of indirection within the Open XML specification which is very necessary to construct documents that perform well when they contain millions of cells, with a lot of repeated formulas or strings. LINQ is ideally suited to create the connections within the markup. A future project is to enhance the query to follow the indirection associated with the formatting of cells.
  • These classes are useful as presented, but are not complete. My main point is that I am presenting an approach that could be fleshed out to provide access to all necessary parts of the underlying document. I am not yet sure if this is the best approach; I'm just trying out this approach to see how easy it can be to code against Open XML documents. I have another approach in mind, and will be blogging about it in the future. In any case, the LtxOpenXml namespace is under 500 lines of code, and provides some interesting functionality.

The following example shows a simple use of the SpreadsheetML class.

Console.WriteLine("Printing Cells of Spreadsheet");
Console.WriteLine("=============================");
Console.WriteLine();
using (SpreadsheetML doc = new SpreadsheetML(filename))
{
foreach (var sh1 in doc.Sheets())
{
Console.WriteLine("Sheet");
Console.WriteLine("=====");
Console.WriteLine("Name:{0}", sh1.Name);
Console.WriteLine("Id:{0}", sh1.Id);
Console.WriteLine("SheetId:{0}", sh1.SheetId);
foreach (var row in sh1.Rows())
{
Console.WriteLine(" RowId:{0}", row.RowId);
Console.WriteLine(" Spans:{0}", row.Spans);
foreach (var cell in row.Cells())
{
Console.WriteLine(" Column:{0}", cell.Column);
if (cell.Type != null)
Console.WriteLine(" Type:{0}", cell.Type);
if (cell.Value != null)
Console.WriteLine(" Value:{0}", cell.Value);
if (cell.Formula != null)
Console.WriteLine(" Formula:>{0}<", cell.Formula);
if (cell.SharedString != null)
Console.WriteLine(" SharedString:>{0}<", cell.SharedString);
}
}
Console.WriteLine();
}
}

The LtxOpenXml namespace also contains an updated version of the WordprocessingML class. For consistency, I modified the Paragraph class to hold its parent (An instance of the WordprocessingML class.)

The OpenXmlDocument, WordprocessingML, and SpreadsheetML Classes contains all of the new classes.

This Example contains a small use of each of the classes.

For some links to help topics that explain how to read and write the LINQ queries that are in the OpenXmlDocument, WordprocessingML, and SpreadsheetML classes, see Writing LINQ to XML Queries using the OpenXmlDocument class.