Simple SpreadsheetML file Part 1 of 3

I posted a bunch of "Intro to SpreadsheetML" posts about a year or so ago, but those were all based on the Office XP spreadsheetML format. I think an updated series based on the Open XML standard is long overdue. I'll start off just building a simple table, and in future posts show more about formatting, formulas, and maybe even some charts.

Today, we'll start by creating a simple table. Then we'll add a little bit of number formatting and some formulas. In the end, we'll have the following table (where the tax & total columns are automatically calculated based on the Sub Total column):

Sub Total

Tax

Total

$ 14.95

$ 1.20

$ 16.15

$ 19.95

$ 1.60

$ 21.55

$ 4.95

$ 0.40

$ 5.35

We'll take this in 3 separate blog posts:

  • Part 1 - Create the simple table without formatting or calculations
  • Part 2 - Add functions to calculate "Tax" and "Total"
  • Part 3 - Add formatting so the data shows up as currency

Part 1 - Simple Table

Since we won't do the formatting or formulas initially, our table will look like this:

Sub Total

Tax

Total

14.95

   

19.95

   

4.95

   

As I discussed in my "simple wordprocessingML document" post, the Office Open XML format is comprised of a number of XML files within a ZIP package. The files follow a simple set of conventions called the open packaging conventions (described in Part 2 of the standard). You need to declare the content types of the parts, as well as tell the consuming application where it should start (via the package relationship).

Unlike the WordprocessingML document we created though, a SpreadsheetML file has a bit more structure to it (the same is true for presentations). A SpreadsheetML file is actually a workbook that can contain multiple worksheets. so even your most simple workbooks will have at least 5 files within the ZIP package. So for this example, let's start by creating a folder somewhere and in that folder create the following files:

  • workbook.xml
  • worksheet.xml
  • [Content_Types].xml
  • _rels/.rels
  • _rels/workbook.xml.rels

workbook.xml

The workbook is essentially the container for the various worksheets. The workbook is where you can reference the styles part, shared string tables, and any other pieces of information that apply to the entire Spreadsheet file. In this example, since we're just creating a super basic spreadsheet, the workbook will be very simple:

<?

xml version="1.0" encoding="UTF-8" standalone="yes"?>
<workbook xmlns="https://schemas.openxmlformats.org/spreadsheetml/2006/main" xmlns:r="https://schemas.openxmlformats.org/officeDocument/2006/relationships">
  <sheets>
<sheet name="Brian" sheetId="1" r:id="rId1"/>
</sheets>
</workbook>

The only interesting thing we did here was to create the sheet tag, which then references out worksheet via the r:id attribute. Remember that almost every time you reference another part or even something outside of the file like a hyperlink or a linked image, you will use a relationship. The next thing we need to do is actually create that relationship in the workbook.xml part's relationship file.

_rels/workbook.xml.rels (part 1)

This is pretty basic. We just need to create a relationship that has an id of rId1 so that it will match the reference from the workbook.xml part:

<?

xml version="1.0" encoding="UTF-8" standalone="yes"?>
<Relationships xmlns="https://schemas.openxmlformats.org/package/2006/relationships">
<Relationship Id="rId1" Type="https://schemas.openxmlformats.org/officeDocument/2006/relationships/worksheet" Target="worksheet.xml"/>
</Relationships>

Notice that a relationship has three main attributes. It has an Id attribute who's use will be more obvious in a bit. The Target attribute tells you where to go, and the path is relative to the parent directory of the "_rels" folder that relationship file is in (in this case that's the root directory). The Type attribute describes what kind of relationship it is (ie what kind of stuff is it pointing at).

In this part we have one relationship who's type is "worksheet", and the target points to our worksheet.xml part. Now we need to actually put some content in the worksheet.xml part

worksheet.xml

The worksheet.xml part is going to be pretty simple. The first row in the sheet will have the column titles ("Sub Total", "Tax", and "Total"). The next 3 rows will only have data in the first column as we won't create the calculation functions until later in this example.

The worksheet.xml part should look something like this:

<?

xml version="1.0" encoding="UTF-8" standalone="yes"?>
<worksheet xmlns="https://schemas.openxmlformats.org/spreadsheetml/2006/main" xmlns:r="https://schemas.openxmlformats.org/officeDocument/2006/relationships">
<sheetData>
<row>
<c t="inlineStr">
<is>
<t>Sub Total</t>
</is>
</c>
<c t="inlineStr">
<is>
<t>Tax</t>
</is>
</c>
<c t="inlineStr">
<is>
<t>Total</t>
</is>
</c>
</row>
<row>
<c>
<v>14.95</v>
</c>
    </row>
<row>
<c>
<v>19.95</v>
</c>
</row>
    <row>
<c>
<v>4.95</v>
</c>
</row>
</sheetData>
</worksheet>

If you've played around with the SpreadsheetML that Excel outputs, you'll notice that I've handled strings a bit differently. For faster save and load times, Excel actually uses the shared string table that is an optional feature of the Office Open XML formats. It allows you to write a string just once, and then reference that string from within the grid. So, instead of saying "Sub Total" in that first cell, Excel would have an id that references the entry in the string table for "Sub Total". This is just an optional feature though, and for simplicities sake I've just put my strings inline (using the <is> tag). There isn't really any perf gain if the string is only used once.

Also, you'll notice that unlike a typical table format (like HTML, CALS, etc.) the XML above is representing a spreadsheet. It's a subtle difference when working with simple examples like this, but becomes more obvious as you move into more complex spreadsheets. One noticeable difference right away though is that we don't write any elements down for the empty cells B2:C4. If there isn't any data in a cell, then you just don't write anything. This is a bit of a different model from table formats that are more presentation based.

_rels/.rels

How does a consuming application know where it should start when opening an OpenXML file? The first place you always look is the package relationships file. The package relationship file will always be located in the "_rels" directory, and it's always called ".rels". We need to create an XML file that tells the consumer that "workbook.xml" is the first place you should go, and that this type of document is an Office Open XML document:

<?

xml version="1.0" encoding="UTF-8" standalone="yes"?>
<Relationships xmlns="https://schemas.openxmlformats.org/package/2006/relationships">
<Relationship Id="rId1" Type="https://schemas.openxmlformats.org/officeDocument/2006/relationships/officeDocument" Target="workbook.xml"/>
</Relationships>

[Content_Types].xml

OK, so we've now created the main workbook.xml part and the worksheet.xml part, as well as created a relationship between the two.

Every Office Open XML file must declare the content types used in the ZIP package. That is done with the [Content_Types].xml file. We currently have two parts in this document that we need to declare content types for. The first is the document.xml part; the second is the _rels/.rels part. So, the content types file should look like this:

<?

xml version="1.0" encoding="UTF-8" standalone="yes"?>
<Types xmlns="https://schemas.openxmlformats.org/package/2006/content-types">
<Default Extension="rels" ContentType="application/vnd.openxmlformats-package.relationships+xml"/>
<Override PartName="/workbook.xml" ContentType="application/vnd.openxmlformats-officedocument.spreadsheetml.sheet.main+xml"/>
<Override PartName="/worksheet.xml" ContentType="application/vnd.openxmlformats-officedocument.spreadsheetml.worksheet+xml"/>
</Types>

Here we are saying that anything ending with the .rels extension is of type Package Relationship, and we also declare that the part workbook.xml is of type workbook and worksheet.xml is of type worksheet.

Create Version 1 of our simple SpreadsheetML file

OK, we should now have five files. Three of the files are in the root directory (workbook.xml; worksheet.xml & [Content_Types].xml); and in the "_rels" directory we have the ".rels" file and the "workbook.xml.rels" file. Select the two files and the "_rels" directory and ZIP them up. Make sure that when you zip them up, the two files and the _rels directory are all at the root level.

Open this file in Excel, and you now have a simple file.

*Beta 2 TR Note*

Note that the namespaces are different between B2TR of Excel, and the final version of the Ecma standard. In this post, I have shown what the final version of the Ecma standard would look like. The RTM build of Office will use this same format. If you are on B2TR though, then you'll need to tweak the following namespace in the XML files: https://schemas.openxmlformats.org/spreadsheetml/2006/main

Instead of using that namespace, you'll need to use this namespace https://schemas.openxmlformats.org/spreadsheetml/2006 /7/main to get it working in B2TR. The two parts you'll need to update with this namespace are wordsheet.xml and workbook.xml. Everything else should work fine

Well, that was the first piece. In the next post, we'll add functions to the spreadsheet.

-Brian