Working with OpenXML documents

The default format for saving Excel, Word and PowerPoint documents has luckily changed since Office 2007 to the Open XML formats.

There is good documentation available for these formats on:

MSDN: Introduction to the Office 2007 file formats:

https://msdn.microsoft.com/en-us/library/aa338205.aspx

MSDN – sample code: Editing Excel cell values from .NET code:

https://msdn.microsoft.com/en-us/library/bb332058.aspx

ECMA: ECMA 376 Standard:

https://www.ecma-international.org/publications/standards/Ecma-376.htm

The following tool makes it very easy to view and modify Open XML documents.

The basic syntax is:

OpenDoc.exe <mode> <Input file> <Part> <XPath> [Value] [Attribute]

What the tools enables one to do is to select specific nodes from a part of an Open XML document and display its value and attribute, modify the value or attributes and to delete a node.

The easiest way to work with it is the batch mode:

OpenDoc.exe –b <Path to batch file>

A batch file is any text file that contains processing commands one per line.

Here is a sample batch file:

-sn Book1.xlsx /xl/worksheets/sheet1.xml //n:sheetData/n:row/n:c[@r='D1']/n:v

-m Book1.xlsx /xl/worksheets/sheet1.xml //n:sheetData/n:row/n:c[@r='D1']/n:v 20

-sn Book1.xlsx /xl/worksheets/sheet1.xml //n:sheetData/n:row/n:c[@r='D1']/n:v

-r Book1.xlsx /xl/worksheets/sheet1.xml //n:sheetData/n:row/n:c[@r='C1']

-sa Book1.xlsx /xl/workbook.xml //n:sheets/n:sheet[@name='Sheet2']

-m Book1.xlsx /xl/workbook.xml //n:sheets/n:sheet[@name='Sheet2'] MySheet name

 

What the six sample commands do is as follows:

- Display the XML of the node corresponding to address D1 from the XML part of the first worksheet from Book1.xlsx

- Modifies the value of this node to 20 (the cell D1 will show 20 now in Excel)

- Displays the same node again with the updated value

- Remove the XML node for cell C1

- Shows the attributes of sheet entry with the attribute ‘name’ equals ‘Sheet2’

- Modifies the value of attribute ‘name’ of the worksheet entry for Sheet2 with the value ‘MySheet’. If we open the workbook in Excel now, the name of the second sheet will be the one we’ve just assigned.

Command reference:

Command

Meaning

-sn; -showNode

Displays the XML content of the selected node

-sa; -showAtt

Displays the attributes of the selected node

-m; -modify

Modifies the Inner XML or the attribute value of the selection

-r; -remove

Removes the selected XML node

 

The XPath argument must evaluate to a legal XPath expression.

Example: //n:sheets/n:sheet[@name='Sheet2']

The reason why “n:” is used for qualifying every node name is linked to the XML namespaces that all the internal XML files inside an Open XML document are declared in. The namespace will be automatically resolved by the tool.

The download from the attachment contains the EXE, the sample XLSX and the sample batch file.

In order to run the program, the .NET Framework 3.5 is required.

OpenDoc.zip