Give us your input on the Microsoft Office Open XML Schemas

Give us your feedback on the design of the file format. While the number one priority with the new formats will always be that the experience for the average end user doesn’t change, I also care a lot about how useful they are to developers. I would love to hear feedback on what is difficult to do with the existing XML schemas for Word 2003. Maybe there are some areas we can look at changing.

Have you looked at the example file I posted the other week, or the developer whitepaper? Let me know what you’d like to see in the new format as well as what you think is problematic in the Office 2003 schemas. There is a public newsgroup that’s been around for a while that is a great place to have discussions around this. Here is a pointer:

If you have any feedback please comment either on this blog or go to the newsgroup. We’ve talked with a lot of customers about the designs of these formats, and we announced them really early so that we could get as much feedback as possible.


Comments (13)

  1. Jeff Lynch says:

    OK here goes! I realize you asked for feedback on Word’s xml format but I just had to post this comment.

    As you can tell from my url and blog, I’m a BizTalk developer with a need to get data from an Excel spreadsheet into an XML format using BizTalk Server 2004 (or 2006). Trying to do this by saving an Excel 2003 file as xml (using your current schema) and then getting this to parse correctly in BTS2004 is nothing less than a nightmare! My only other alternatives are to buy an expensive ($8,000 – how can an adapter cost more than the product?) adapter or teach an end-user to use the XML tools to attach a schema, map the data and export the data as xml.

    I’d be willing to fly up to Redmond (I may be going to MS for the CS2006/BTS2006 TAP) and buy your team and the BTS team dinner in Seattle just to get you folks "together" on this issue. There are, quite literally, hundreds of BTS developers who receive data from customers in Excel format and need to integrate this data into other systems without human intervention. Manually saving the Excel file as a .csv and using BizTalk to map from a delimited flat-file to xml is not what I’d call "integration".

    The next time the sun shines in Redmond, walk over to the building housing the BTS2004/2006 team. Show them your new schema and an instance document and ask them to get BizTalk to parse it correctly. This may give you an idea of what some of us are up against when we are asked to get data from Excel into some other system.

    Sorry to be negative but I’ve been trying to get this to work for three months with little luck.

    Jeff Lynch

    A BizTalk Enthusiast

  2. EP says:

    You really confused me with your blog title. Was that intentional? Oh, I guess it was intentional, but it did not start with your blog?

    Obviously you know you are following behind Open Office who uses XML file formats. To call MS’ Office Open seems like a deliberate attempt to confuse. Nice move.

    I came here to read about OpenOffice – was misled.

    No worries. MS lawyers can always out litigate a bunch of people that make software for free.

    Just sharing the love these practices cultivate.

    [I use both, for now]

  3. BrianJones says:

    Hey Jeff, I’ll look into that a bit and try to get some more information. I’ll definitely look into what we can do going forward if there aren’t any great alternatives today.

    EP, sorry about the confusion. I added "Microsoft" in the title as well to make it more clear.

    Not sure what you mean by following behind Open Office for our XML formats. I think that the fact that Open Office uses XML for their format is awesome. The more applications that do it the better.

    We’ve been using XML in Office for about 8 years. We first used it in our HTML formats to store data that couldn’t be represented with the HTML standard (metadata; vector markup; etc.). We shipped that in Office 2000 (development started around 1997). The spreadsheetML format was started in 1999 and shipped with Office XP.

    I think that using XML to represent a file format is one of those areas where you aren’t really being all that clever. It’s a pretty straightforward thing.


  4. Stephen says:

    Hi Brian

    The biggest change I’d like to see is the option to include the VBProject part in plain text. It should correspond to the "Lock Project for Viewing" option in the VBE – if that’s not ticked, the plain text is included in the file and when opened, the code is loaded from the text part rather than the binary.

  5. John McNamara says:

    One issue with the current SpreadsheetML format is the required use of R1C1 notation in stored formulas.

    Converting formulas from the A1 style notation that the user (generally) expects to R1C1 notation can be tricky even for mundane cases.

    Clearly there are some design considerations driving the use of R1C1 notation in the XML format but I can’t see any benefits that accrue to third party developers using it.

    Perhaps a Workbook attribute to specify the style of formula notation might be an acceptable workaround for people programmatically generating SpreadsheetML files.
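To give a sense of what that conversion involves, here is a minimal sketch in Python that translates a single A1-style cell reference into R1C1 notation, relative to the cell that holds the formula. The function names (`column_number`, `a1_ref_to_r1c1`) are made up for this illustration and are not part of any Office API. It deliberately handles only one bare reference: ranges, sheet names, string literals, and function names that look like references (such as `LOG10`) would need a real formula tokenizer, which is where the tricky cases come from.

```python
import re

# One A1-style cell reference: optional "$", column letters, optional "$", row digits.
_A1_REF = re.compile(r"(\$?)([A-Za-z]{1,3})(\$?)(\d+)")


def column_number(letters: str) -> int:
    """Convert column letters to a 1-based index (A -> 1, Z -> 26, AA -> 27)."""
    number = 0
    for ch in letters.upper():
        number = number * 26 + (ord(ch) - ord("A") + 1)
    return number


def _part(prefix: str, absolute: bool, target: int, origin: int) -> str:
    """Format the row ("R") or column ("C") half of an R1C1 reference."""
    if absolute:
        return f"{prefix}{target}"
    offset = target - origin
    # A zero offset is written without brackets, e.g. "R" for "same row".
    return prefix if offset == 0 else f"{prefix}[{offset}]"


def a1_ref_to_r1c1(ref: str, formula_row: int, formula_col: int) -> str:
    """Translate one A1 reference, as seen from the (1-based) formula cell."""
    match = _A1_REF.fullmatch(ref)
    if match is None:
        raise ValueError(f"not a single A1-style cell reference: {ref!r}")
    col_abs, col_letters, row_abs, row_digits = match.groups()
    row = _part("R", bool(row_abs), int(row_digits), formula_row)
    col = _part("C", bool(col_abs), column_number(col_letters), formula_col)
    return row + col
```

For example, a formula stored in B1 (row 1, column 2) that refers to `A1` would be written as `RC[-1]`, while `$B$3` is `R3C2` no matter where the formula lives.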


  6. Francesca says:

    Hi Brian;

    I’m sorry for using this space not to comment on your post, but I would like to ask you something about XML files.

    Let me try to explain my problem: I have XML files that use different DTDs, so when I load one of them, how can I be sure that I’m reading the right file? I thought that I could compare the DTD of the XML file being opened with the one I expect to read. To do this I compare the file contents, but that can take a very long time. Is there a way to make this comparison faster?

    Thank you for your attention and your kindness.


  7. Matti says:

    It’s understandable to me that you must create a new schema to support all MS Office features. But support for the OpenDocument format would be great too. OpenDocument is already a standard, and it’s a very useful format for storing masses of office documents. It would also be a great help for developers.

    If it’s going to be implemented, my opinion is that saving in the OpenDocument format should be in the "save as" dialog, not in "export" or somewhere else. Just Save As OpenDocument.

  8. Naishal says:

    Hi, I need to develop a search feature on top of a Document Management System…

    The need is to search hundreds of documents, which can be Word, Excel, PDF, image, anything… If it’s an image I will have to go for OCR… But as far as Office formats are concerned, how can these new XML schemas help me? Do let me know…

  9. BrianJones says:

    Stephen, I’ve heard this request from a few people now to break out the code in an accessible plain text format. I’ll dig into it a bit more and see if there is something we can do. There is more information in the VB project storage than just the code, so at best it would only be a partial breakout. We’ve looked into doing it before and it just wasn’t worth it given the user experience and perf hit. I’ll look into it more though and maybe even get a proposal together to get your opinion on.


    John, I hear you on the notation issue. Excel’s XML format will actually change more significantly than the Word format. I’ll keep your comments in mind, and probably post an example Excel 12 file sometime next month. Thanks for the comment.


    Francesca, is your question about opening the files in a particular application or are you just asking a more general XML question? Is there a particular XML parser you are using? I can probably answer your question, or you could also try this public newsgroup: microsoft.public.xml


    Matti, you’ve probably seen from previous blog posts that other folks have asked for similar support. This is something that I’m really hoping a partner or some other 3rd party will step up and build. At Microsoft, we have tons of partners that we rely on to build support for scenarios on top of our products. It would most likely need to be more of a publish model, or have some logic to report what features/functionality will be lost in the translation. The great thing is that both MS Office and OpenOffice use XML for their formats, both formats are fully documented, and both are available to use royalty-free, so anyone can come along and build a filter that translates between the two. The key for full interoperability of course will be that both pieces of software support the same set of features, otherwise there will inevitably be some loss. The OpenOffice guys already built support for the Word 2003 XML schema, and hopefully the same will happen for the new formats. When there is a larger customer demand then we’ll often look into building support directly into the product, but in this case we haven’t had a ton of customers ask us for this support.


    Naishal, search should be significantly easier as the formats are just ZIP and XML. You can easily do a plain text search, but you could also take into account more application information to try to leverage things like styles and formatting to imply more semantic information about the text.
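As a rough illustration of the plain text search idea, the sketch below uses only the Python standard library to open a ZIP package, parse every part whose name ends in `.xml`, and check whether the text content contains a search term. The helper names (`extract_text`, `search_package`) and the part name in the demo are assumptions made for the example, not names defined by the new formats, and the approach ignores styles and formatting entirely.

```python
import io
import zipfile
import xml.etree.ElementTree as ET


def extract_text(package) -> dict:
    """Return {part name: text content} for every XML part in a ZIP package.

    `package` may be a file path or a binary file-like object.
    """
    texts = {}
    with zipfile.ZipFile(package) as archive:
        for name in archive.namelist():
            if not name.lower().endswith(".xml"):
                continue  # skip images and other binary parts
            try:
                root = ET.fromstring(archive.read(name))
            except ET.ParseError:
                continue  # skip parts that are not well-formed XML
            # Join text nodes with spaces; markup and attributes are dropped.
            texts[name] = " ".join(t.strip() for t in root.itertext() if t.strip())
    return texts


def search_package(package, term: str) -> list:
    """Return the names of XML parts whose text contains `term` (case-insensitive)."""
    needle = term.lower()
    return [name for name, text in extract_text(package).items()
            if needle in text.lower()]


if __name__ == "__main__":
    # Build a tiny in-memory package with one hypothetical XML part.
    demo = io.BytesIO()
    with zipfile.ZipFile(demo, "w") as archive:
        archive.writestr("doc/body.xml", "<doc><p>Quarterly <b>sales</b> report</p></doc>")
    print(search_package(demo, "sales"))
```

A real indexer would probably restrict itself to the parts that hold document text and feed the result into a full-text index rather than scanning every file on each query.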


  10. Sean Ma says:

    Hi, Brian:

    I have a question about the positioning of section properties (page size, orientation, etc.) in the Word XML files. Why are they put at the end of a file? Don’t you need to read the whole XML file to find the page size before displaying the first page? I saved an XML file with 1000 pages from Word 2003. It takes about 20 seconds to load it back into Word 2003, while it takes only 1 second to load its doc counterpart.


  11. BrianJones says:

    Hey Sean, the page size and orientation properties are actually section properties, not document. So, at the end of each section you can find the properties that are associated with that section. It doesn’t really affect the performance because we don’t start laying out the document until we’ve read everything in anyway, since it’s possible for other elements (besides just the section properties) to affect the layout.

    You are right that there are some performance differences, although it really is different for each document. Most documents open in about the same amount of time. Certain types of documents will be slower, but we’ll definitely be better than we were with the 2003 XML formats.
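To make the point about section properties concrete, here is a small Python sketch that streams through a document and reports each section's page size at the moment the parser reaches it. The sample is a hand-written, heavily simplified fragment, not real Word output, and the namespace is assumed to be the one used by the Word 2003 XML format. The function name `section_page_sizes` is invented for the example.

```python
import io
import xml.etree.ElementTree as ET

# Namespace of the Word 2003 XML format (an assumption for this sketch).
W = "http://schemas.microsoft.com/office/word/2003/wordml"

# Simplified two-section document: the first section's properties sit in its
# last paragraph, the final section's properties sit at the end of the body.
SAMPLE = f"""<w:wordDocument xmlns:w="{W}">
  <w:body>
    <w:p><w:r><w:t>First section text</w:t></w:r></w:p>
    <w:p>
      <w:pPr><w:sectPr><w:pgSz w:w="12240" w:h="15840"/></w:sectPr></w:pPr>
    </w:p>
    <w:p><w:r><w:t>Second section text</w:t></w:r></w:p>
    <w:sectPr><w:pgSz w:w="15840" w:h="12240" w:orient="landscape"/></w:sectPr>
  </w:body>
</w:wordDocument>"""


def section_page_sizes(source):
    """Yield (paragraphs_started, width, height, orientation) per section.

    Width and height are in twentieths of a point. `paragraphs_started` shows
    how much of the document was read before the page size became known.
    """
    paragraphs = 0
    for event, elem in ET.iterparse(source, events=("start", "end")):
        if event == "start" and elem.tag == f"{{{W}}}p":
            paragraphs += 1
        elif event == "end" and elem.tag == f"{{{W}}}sectPr":
            size = elem.find(f"{{{W}}}pgSz")
            if size is not None:
                yield (paragraphs,
                       int(size.get(f"{{{W}}}w")),
                       int(size.get(f"{{{W}}}h")),
                       size.get(f"{{{W}}}orient", "portrait"))


if __name__ == "__main__":
    for info in section_page_sizes(io.BytesIO(SAMPLE.encode("utf-8"))):
        print(info)
```

A streaming reader built this way only learns a section's page size after it has passed all of that section's content, which is why the position of the properties matters for anyone who wants to show the first page early.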


  12. Sean Ma says:

    Brian, thank you very much for your quick response.

    Actually, we have a viewer that displays Word files in DOC and RTF format. At the request of our customers, we load the first page without parsing the whole file so that the first page shows up immediately regardless of file size. This approach also reduces memory consumption when a user does not want to read through the whole file. Now we are considering upgrading our viewer to support Word files in XML format. We want to keep this feature for the XML Word files. Is it possible?


  13. Brian,

    My concern in going forward from Office 11 to 12 is the conversion of custom *.dot files into a schema or DTD. I haven’t seen any discussion of that capability being forthcoming. I’m not looking forward to having to recreate all of our templates – we finally got them stable after the quirks of list numbering and the manipulation of margins needed to make vertical borders on the text work.