Code snippets for working with the Open XML formats


This is my first time on an airplane with wifi, so I’m pleased to bring this news to you from somewhere over the Pacific.

For the past year I’ve been promising that one of the big things we’d do this time around, and something we didn’t do much of with the 2003 XML formats, is provide a whole bunch of examples. The openxmldeveloper.org site is going to be the best place for people to share their experiences and code, but it’s also really important that we at Microsoft give examples of how to do various things.

With the 2003 formats, we had every element and attribute documented, but we didn’t do a great job of showing how to actually use the formats. This time around, we want to provide examples that serve as good prescriptive guides for working with the files. We came up with a huge list of what we thought people would like to see, and it was pretty hard to narrow it down.

We now have the first set of examples, and they all work against the Beta 2 version of the file formats (they will also be updated to match the final versions once those are finished in Ecma). You can grab them here: http://www.microsoft.com/downloads/details.aspx?FamilyID=8d46c01f-e3f6-4069-869d-90b8b096b556&displaylang=en

The examples use the WinFX System.IO.Packaging namespace, but they could also be mapped to other tools (just like the Java examples up on OpenXMLDeveloper.org). You’ll probably notice that the examples are some of the more basic ones we could think of, but it made sense to use these as the starting point. We will most likely start building some more complex ones as well that use one or more of the initial examples as building blocks.
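Because an Open XML file is a ZIP package with a `[Content_Types].xml` part at its root, the same basic operations can be done without System.IO.Packaging. The following is a minimal sketch in Python using only the standard library, not one of the published samples: it resolves each part’s content type by checking `Override` entries first and falling back to `Default` entries keyed by file extension. The function name `part_content_types` is hypothetical, and element names are matched without their namespace so the sketch does not depend on the exact namespace URI, which may differ between Beta 2 and the final Ecma version.

```python
import zipfile
import xml.etree.ElementTree as ET


def _local(tag: str) -> str:
    # Drop a "{namespace}" prefix so matching doesn't depend on the namespace URI.
    return tag.rsplit("}", 1)[-1]


def part_content_types(package) -> dict:
    """Map each part name in an Open XML package to its content type.

    `package` is a path or a binary file-like object (.docx, .xlsx, .pptx).
    """
    with zipfile.ZipFile(package) as zf:
        root = ET.fromstring(zf.read("[Content_Types].xml"))
        defaults, overrides = {}, {}
        for el in root:
            kind = _local(el.tag)
            if kind == "Default":
                defaults[el.get("Extension").lower()] = el.get("ContentType")
            elif kind == "Override":
                # Part names compare case-insensitively, so normalise them.
                overrides[el.get("PartName").lower()] = el.get("ContentType")

        result = {}
        for entry in zf.namelist():
            # The content-types item itself is not a part; skip directory entries too.
            if entry == "[Content_Types].xml" or entry.endswith("/"):
                continue
            part = "/" + entry
            filename = entry.rsplit("/", 1)[-1]
            ext = filename.rsplit(".", 1)[-1].lower() if "." in filename else ""
            # An Override for this exact part wins over the extension Default.
            result[part] = overrides.get(part.lower(), defaults.get(ext))
        return result
```

Calling `part_content_types("report.docx")` on a saved document would list parts such as `/word/document.xml` alongside their content types; the file name here is only an illustration.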

There are 40 examples overall, and I’d love to hear what you think. Also, let me know if there are other things you’d like us to add to the list. Kevin Boske, who is on the Office programmability team, was tasked with pulling these together, and I’m really appreciative of the work that he and Ken Getz did on them. Kevin already blogged about this earlier today, and they are also mentioned on OpenXMLDeveloper.org.

-Brian

Comments (8)

  1. slueppken says:

    Hi Brian!

    I’ve got a question concerning the Beta 2 file formats and the final versions: Will Office 2007 RTM be able to read files saved with Beta 2? :)

    Thanks!

    Sven

  2. BrianJones says:

    Hi Sven,

    Yes, Office 2007 RTM will have the ability to read files from Beta 2.

    You should note, though, that RTM will only save in the final version of the format. That means that while RTM will be able to read Beta 2 files, Beta 2 will not be able to read RTM files.

    -Brian

  3. John Robins says:

    Brian,

    I believe the new file format is a very good idea.

    However, it seems more like an XML dump of the legacy MS format rather than a clean and well documented format.

    Going through ECMA for standardization seems like a good idea, but the documentation is very poor. For example, see my post on indexed colors.

    http://openxmldeveloper.org/forums/thread/298.aspx

    In practice, this means it’s not clear how to read a background color in Excel. If something as basic as colors is not documented, how can you expect developers to embrace this format?

    Dates:

    http://openxmldeveloper.org/forums/286/ShowThread.aspx#286

    Dates are all wrong because of … Lotus 1-2-3! Are you serious? Did you know that Lotus 1-2-3 did it this way to be compatible with IBM mainframes? And of course, IBM mainframes needed to be compatible with von Neumann’s first computer….

    Personally, I am not convinced that dates should be stored as an integer. Besides, why bother with 1900 vs 1904 dates? Why not have one clean date format like, say, year-month-day?
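To make the serial-date point concrete: a date cell stores a number of days counted from an epoch, and which epoch applies depends on the workbook’s date system (in the final SpreadsheetML schema, the `date1904` attribute on `workbookPr`). The following minimal sketch assumes the usual conventions: in the 1900 system, serial 1 is 1 January 1900 and serial 60 is the nonexistent 29 February 1900 kept for Lotus 1-2-3 compatibility; in the 1904 system, serial 0 is 1 January 1904. The function name `excel_serial_to_datetime` is hypothetical.

```python
from datetime import datetime, timedelta


def excel_serial_to_datetime(serial: float, date1904: bool = False) -> datetime:
    """Convert a spreadsheet serial date to a datetime (hypothetical helper).

    The integer part counts days from the epoch; the fractional part is the
    time of day.
    """
    if date1904:
        # 1904 date system: serial 0 is 1 January 1904.
        return datetime(1904, 1, 1) + timedelta(days=serial)
    if serial >= 61:
        # Past the phantom 29 February 1900, an epoch of 30 December 1899
        # absorbs the extra day that the 1900 system counts.
        return datetime(1899, 12, 30) + timedelta(days=serial)
    if serial >= 60:
        raise ValueError("serial 60 is the nonexistent date 29 February 1900")
    if serial >= 1:
        # Serials 1-59 cover 1 January 1900 through 28 February 1900.
        return datetime(1899, 12, 31) + timedelta(days=serial)
    raise ValueError("serials below 1 carry only a time of day in the 1900 system")
```

Rejecting serial 60 is a choice made in this sketch. A reader that wants to mirror what the spreadsheet displays would need its own rule for that value.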

    See also this post:

    http://openxmldeveloper.org/forums/thread/219.aspx

    "Office encrypts the entire ZIP file and stores it as a stream in an iStorage".

    An "IStorage"? I thought this was open XML, not legacy COM crap.

    Is this all we are supposed to know in order to open an encrypted Office file?
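The quoted explanation at least lets a reader tell the two cases apart before doing anything else. An unencrypted Open XML file is a ZIP archive, while the encrypted form is wrapped in an OLE compound file, the storage format behind IStorage. The following minimal sketch relies only on the two well-known file signatures. The helper names are hypothetical. A "compound" result alone does not prove encryption, because legacy binary .doc/.xls/.ppt files use the same container. The sketch also does not attempt decryption, which the quoted explanation doesn't describe.

```python
# File signatures (magic numbers) for the two containers discussed above.
ZIP_SIGNATURE = b"PK\x03\x04"                      # ZIP local file header
CFB_SIGNATURE = bytes.fromhex("D0CF11E0A1B11AE1")  # OLE compound file header


def sniff_container(header: bytes) -> str:
    """Classify a file by its first bytes (hypothetical helper).

    Returns "zip" for a plain ZIP package such as an unencrypted .docx,
    "compound" for an OLE compound file (an encrypted Open XML file or a
    legacy binary document), or "unknown" otherwise.
    """
    if header.startswith(ZIP_SIGNATURE):
        return "zip"
    if header.startswith(CFB_SIGNATURE):
        return "compound"
    return "unknown"


def sniff_file(path: str) -> str:
    # Only the first 8 bytes are needed to recognise either signature.
    with open(path, "rb") as f:
        return sniff_container(f.read(8))
```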

    Finally, openxmldeveloper is a big flop.

    – 0 (zero) workspaces

    – The largest forum (wordprocessingML) has … 19 threads!!

    wow. 19!

    Is Microsoft really interested in open formats?

    Or is it simply a way to convince uninformed bureaucrats that it’s not a good idea to switch to open office?

    If Microsoft is really serious about open formats, it really needs to provide precise and accurate information.

    And answer posts on its own site!

  4. Francis says:

    I understand the need for compatibility, but might it not be better to strip away some of the barnacles (e.g. the date format)?

    It seems like there are two options re the date format:

    1. Keep compatibility with Lotus/IBM/etc. by preserving the old format in OpenXML

    2. Implement a more logical format and have Excel convert cells/formulas to the new format when saving in OpenXML

    The draft format could be slimmed down if, instead of creating an OpenXML equivalent of Office 2003 features, the latter were replaced during conversion with OpenXML features.

    This could entail more work in the short term (programming the conversion routines), but it would likely save effort over the long term, and for third parties in the short term as well, by simplifying the format and thus the codebase that needs to be maintained.

  5. Francis says:

    John’s comments remind me of the discussion about legacy border styles and Brian’s response:

    "There are so many features and we need to represent them all in XML, and document it all… We could have done as you suggest and just use resource files for them, but that would have actually been more work in the applications."

    The question is why they all need to be represented in XML. 100% compatibility does not mean 100% parity, feature by feature.

    What could be done instead is NOT specify any such border styles in the ECMA draft and instead, as suggested, use resource files.

    HOWEVER, contrary to what Brian said, these resource files would not need to be made public in the application. Office 2007 could transparently substitute the resource files for the legacy borders when saving an old DOC file in XML. This would achieve the goal of 100% compatibility DOC->XML. Furthermore, it would unclutter the specification. (Third parties would not have to program for all of these border styles but simply render the resources included in the OpenXML document.)

  6. Francis says:

    Just to clarify: I meant Office would save the resource files referenced INSIDE the XML file. Thus all third-party tools would be able to access them without any guidance from the spec.

  7. y says:

    By coincidence, Joel Spolsky just posted some reminiscences of his days as an Excel PM, including a description of the Excel date issue for the year 1900.

    <http://joelonsoftware.com/items/2006/06/16.html>