Introduction to Excel XML Part 1: creating a simple table

I started this blog as a way of responding to news about the new formats, as well as give some insights into the goals and design decisions behind our XML support in Office. I hadn't really planned on doing any tutorial level posts, but I think it actually might we worthwhile to get more people up to speed on XML in Office. I’m going to start to mix in some more introductory type posts for those not as familiar with XML. Today, I wanted to provide a really simple post for people that aren’t yet familiar with Excel’s XML support. As I start to build up some of these examples, I’ll eventually lead into some previews of what similar files would look like with the new formats.

Excel’s SpreadsheetML has been around for about 4 years now (they actually started development of it about 6 years ago). The SpreadsheetML format is pretty easy to work with, as it was designed to be human readable and editable. For example, the files are “pretty printed”, which we don’t plan to do with the new formats in Office 12. You can use Office XP or 2003 for this example since they both have support for SpreadsheetML in Excel.

Let’s create a really simple table that looks like this (ignore the <TBODY> tags if they show up):

First Name Last Name Phone Number
Brian Jones (425) 123-4567
Chad Rothschiller (425) 123-4567
Shawn Villaron (425) 123-4567
Scott Walker (425) 123-4567
Kevin Boske (425) 123-4567

You can use any text editor to do this. For the sake of this example, let's just use notepad. Create a blank file called "test.xml" and do the following steps.

Step 1: XML declaration and root element

First thing to do in your file is to type the following:

<?xml version="1.0"?>

This just declares that the file is an XML document adhering to the 1.0 version of the XML spec. That should go at the top of all your XML files. Next thing to do is to add the root element for the document. XML files always have one and only one root element that contains the rest of the document. For SpreadsheetML, the root element is <Workbook>. So, after the XML declaration, add that element so that your file now looks like this:

<?xml version="1.0"?>
<Workbook>
</Workbook>

Step 2: Declare the namespace and add a prefix to the root element

Most XML documents out there today have a namespace associated with them. Declaring the namespace of an XML file makes it a lot easier for people parsing your XML to know what type of XML they are dealing with. Even if Office we have a number of different uses of XML. In order to know when you are parsing a Word XML file as opposed to an Excel XML file, one easy way to tell is to look at the namespace. In Office XP when we created the SpreadsheetML schema, we were still using namespaces in the form "urn:schemas-microsoft-com:office". Going forward, we'll use URL namespaces, like we did with WordML in Office 2003 ("https://schemas.microsoft.com/office" for example). By adding the namespace declaration to the spreadsheet, your file should now look like this:

<?xml version="1.0"?>
<Workbook xmlns="urn:schemas-microsoft-com:office:spreadsheet">
</Workbook>

The last thing we'll do for the namespace is to use a prefix, rather than the default. Since the attributes are qualified for the SpreadsheetML schema, we need to do this if we are going to use any attributes. Let's use "ss" as the prefix. We'll add "ss:" in front of all of our elements, and in our namespace declaration, we'll update it to say that the namespace applies to everything with an "ss:" in front of it, instead of just applying to the default XML elements.

<?xml version="1.0"?>
<ss:Workbook xmlns:ss="urn:schemas-microsoft-com:office:spreadsheet">
</ss:Workbook>

Notice that the namespace declaration says xmlns:ss= instead of just xmlns= . That's saying that anything with an "ss:" in front of it applies to the spreadsheet namespace.

Step 3: Adding a worksheet

OK, we now have an empty workbook. The next thing we'll do is to declare the spreadsheet grid within the workbook. As you know, workbooks can have multiple worksheets, but we'll just declare one. In addition, let's declare a table inside the worksheet. The Table is where all the grid data will go:

<?xml version="1.0"?>
<ss:Workbook xmlns:ss="urn:schemas-microsoft-com:office:spreadsheet">
<ss:Worksheet ss:Name="Sheet1">
<ss:Table>
</ss:Table>
</ss:Worksheet>
</ss:Workbook>

Step 4: Adding the header row

The first row in the table we wanted to generate has "First Name", "Last Name" and "Phone Number" in the three columns. Let's add a <Row> tag as well as three <Cell> tags. The actual content of the cell is contained within a <Data> tag, so let's add that as well:

<?xml version="1.0"?>
<ss:Workbook xmlns:ss="urn:schemas-microsoft-com:office:spreadsheet">
<ss:Worksheet ss:Name="Sheet1">
<ss:Table>
<ss:Row>
<ss:Cell>
<ss:Data ss:Type="String">First Name</ss:Data>
</ss:Cell>
<ss:Cell>
<ss:Data ss:Type="String">Last Name</ss:Data>
</ss:Cell>
<ss:Cell>
<ss:Data ss:Type="String">Phone Number</ss:Data>
</ss:Cell>
</ss:Row>
</ss:Table>
</ss:Worksheet>
</ss:Workbook>

You now have a template for the table that you can open directly in Excel. Go ahead and open it in Excel and take a look. Not too exciting, but it's a start.

Step 5: Adjust the column widths

Notice that the widths of the columns are too narrow for the content. Let's add some XML to our file to specify what width we want the columns represented as.

<?xml version="1.0"?>
<ss:Workbook xmlns:ss="urn:schemas-microsoft-com:office:spreadsheet">
<ss:Worksheet ss:Name="Sheet1">
<ss:Table>
<ss:Column ss:Width="80"/>
<ss:Column ss:Width="80"/>
<ss:Column ss:Width="80"/>
<ss:Row>
<ss:Cell>
<ss:Data ss:Type="String">First Name</ss:Data>
</ss:Cell>
<ss:Cell>
<ss:Data ss:Type="String">Last Name</ss:Data>
</ss:Cell>
<ss:Cell>
<ss:Data ss:Type="String">Phone Number</ss:Data>
</ss:Cell>
</ss:Row>
</ss:Table>
</ss:Worksheet>
</ss:Workbook>

Now open the file again in Excel. Notice that the column widths are wider and that the text now fits. There is another attribute you can set on the Column element that tells it to use autofit for the widths. This only works for numbers and dates though. Since our cells are strings, we need to explicitly set the width.

Step 6: Adding the rest of the data

OK, now add those additional rows of data. This should be pretty easy. Just select that first "Row" element and copy it. Then paste it 5 more times so you have 6 total rows. Now go through and update the values of the Rows. If you are familiar with XSLT, you'll see at this point how you could easily generate an XSLT that could be applied to a data set to transform it into SpreadsheetML. Just repeat the row tag for each row in your data set and add the values in each Cells Data tag. After applying all the data, your XML should look like this:

<?xml version="1.0"?>
<ss:Workbook xmlns:ss="urn:schemas-microsoft-com:office:spreadsheet">
<ss:Worksheet ss:Name="Sheet1">
<ss:Table>
<ss:Column ss:Width="80"/>
<ss:Column ss:Width="80"/>
<ss:Column ss:Width="80"/>
<ss:Row>
<ss:Cell>
<ss:Data ss:Type="String">First Name</ss:Data>
</ss:Cell>
<ss:Cell>
<ss:Data ss:Type="String">Last Name</ss:Data>
</ss:Cell>
<ss:Cell>
<ss:Data ss:Type="String">Phone Number</ss:Data>
</ss:Cell>
</ss:Row>
<ss:Row>
<ss:Cell>
<ss:Data ss:Type="String">Brian</ss:Data>
</ss:Cell>
<ss:Cell>
<ss:Data ss:Type="String">Jones</ss:Data>
</ss:Cell>
<ss:Cell>
<ss:Data ss:Type="String">(425) 123-4567</ss:Data>
</ss:Cell>
</ss:Row>
<ss:Row>
<ss:Cell>
<ss:Data ss:Type="String">Chad</ss:Data>
</ss:Cell>
<ss:Cell>
<ss:Data ss:Type="String">Rothschiller</ss:Data>
</ss:Cell>
<ss:Cell>
<ss:Data ss:Type="String">(425) 123-4567</ss:Data>
</ss:Cell>
</ss:Row>
<ss:Row>
<ss:Cell>
<ss:Data ss:Type="String">Shawn</ss:Data>
</ss:Cell>
<ss:Cell>
<ss:Data ss:Type="String">Villaron</ss:Data>
</ss:Cell>
<ss:Cell>
<ss:Data ss:Type="String">(425) 123-4567</ss:Data>
</ss:Cell>
</ss:Row>
<ss:Row>
<ss:Cell>
<ss:Data ss:Type="String">Scott</ss:Data>
</ss:Cell>
<ss:Cell>
<ss:Data ss:Type="String">Walker</ss:Data>
</ss:Cell>
<ss:Cell>
<ss:Data ss:Type="String">(425) 123-4567</ss:Data>
</ss:Cell>
</ss:Row>
<ss:Row>
<ss:Cell>
<ss:Data ss:Type="String">Kevin</ss:Data>
</ss:Cell>
<ss:Cell>
<ss:Data ss:Type="String">Boske</ss:Data>
</ss:Cell>
<ss:Cell>
<ss:Data ss:Type="String">(425) 123-4567</ss:Data>
</ss:Cell>
</ss:Row>
</ss:Table>
</ss:Worksheet>
</ss:Workbook>

You should now have the full table in Excel.

Step 7: Add some formatting to the first row

For the last step, let's format the first row with bold text, so that it's clear that it's our header row. All we need to do here is generate a style that has bold text, and then reference that style with the first row. So, the first thing we'll do is add the following XML in front of the Worksheet tag:

<ss:Styles>
<ss:Style ss:ID="1">
<ss:Font ss:Bold="1"/>
</ss:Style>
</ss:Styles>

That created a style who's ID is "1" and has bold applied to it. Next, update the first Row element to reference the style ID 1. The Row should now look like this:

<ss:Row ss:StyleID="1">

Your XML should now look like this:

<?xml version="1.0"?>
<ss:Workbook xmlns:ss="urn:schemas-microsoft-com:office:spreadsheet">
<ss:Styles>
<ss:Style ss:ID="1">
<ss:Font ss:Bold="1"/>
</ss:Style>
</ss:Styles>
<ss:Worksheet ss:Name="Sheet1">
<ss:Table>
<ss:Column ss:Width="80"/>
<ss:Column ss:Width="80"/>
<ss:Column ss:Width="80"/>
<ss:Row ss:StyleID="1" >
<ss:Cell>
<ss:Data ss:Type="String">First Name</ss:Data>
</ss:Cell>
<ss:Cell>
<ss:Data ss:Type="String">Last Name</ss:Data>
</ss:Cell>
<ss:Cell>
<ss:Data ss:Type="String">Phone Number</ss:Data>
</ss:Cell>
</ss:Row>
<ss:Row>
<ss:Cell>
<ss:Data ss:Type="String">Brian</ss:Data>
</ss:Cell>
<ss:Cell>
<ss:Data ss:Type="String">Jones</ss:Data>
</ss:Cell>
<ss:Cell>
<ss:Data ss:Type="String">(425) 123-4567</ss:Data>
</ss:Cell>
</ss:Row>
<ss:Row>
<ss:Cell>
<ss:Data ss:Type="String">Chad</ss:Data>
</ss:Cell>
<ss:Cell>
<ss:Data ss:Type="String">Rothschiller</ss:Data>
</ss:Cell>
<ss:Cell>
<ss:Data ss:Type="String">(425) 123-4567</ss:Data>
</ss:Cell>
</ss:Row>
<ss:Row>
<ss:Cell>
<ss:Data ss:Type="String">Shawn</ss:Data>
</ss:Cell>
<ss:Cell>
<ss:Data ss:Type="String">Villaron</ss:Data>
</ss:Cell>
<ss:Cell>
<ss:Data ss:Type="String">(425) 123-4567</ss:Data>
</ss:Cell>
</ss:Row>
<ss:Row>
<ss:Cell>
<ss:Data ss:Type="String">Scott</ss:Data>
</ss:Cell>
<ss:Cell>
<ss:Data ss:Type="String">Walker</ss:Data>
</ss:Cell>
<ss:Cell>
<ss:Data ss:Type="String">(425) 123-4567</ss:Data>
</ss:Cell>
</ss:Row>
<ss:Row>
<ss:Cell>
<ss:Data ss:Type="String">Kevin</ss:Data>
</ss:Cell>
<ss:Cell>
<ss:Data ss:Type="String">Boske</ss:Data>
</ss:Cell>
<ss:Cell>
<ss:Data ss:Type="String">(425) 123-4567</ss:Data>
</ss:Cell>
</ss:Row>
</ss:Table>
</ss:Worksheet>
</ss:Workbook>

In closing

Like I said, that was a pretty simple example, but I wanted to start posting some information for folks new to Office XML (or even new to XML in general). I'll write up a similar example with Word, and then I'll start to get into some of the more complex details of the schemas.

The new XML formats for Excel will look different than what I just showed you with SpreadsheetML, but there will also be some similarities. It's good to become a little familiar with the existing schemas, and in a few months I'll start posting a lot more about the new schemas.

-Brian