New default XML formats in the next version of Office


I’m Brian Jones, a program manager on the Word team. I’ve been at Microsoft for about 6 years, and have been working on XML support in Word and across Office for a good percentage of that time. I thought I’d set up this blog to talk with people about what we’re doing in the next version of Office around XML. When we first started talking about Office 2003 and the features we were going to provide around XML, there were a lot of misinterpretations. It was frustrating not having an easy way to answer questions, provide insight, and clear up any misunderstanding. I didn’t want to make the same mistake again, so I told everyone that I wanted to start blogging as soon as we announced the new “Microsoft Office Open XML Formats” (still getting used to the official name). The PR folks said they thought it would be ok, and they even decided to post some links to this site from the different marketing materials being released which is pretty cool.


I’ve been waiting a long time for this day, and it’s awesome that I’m able to talk about this so early in the product cycle. I made a post last week talking about Office 2003 XML, but that was just more of a test to see how this whole blog thing works. The real reason for setting up this blog was to talk about the new default XML formats in the next version of Office (although I’m sure I’ll spend a good amount of time talking about 2003 as well).


I’m hoping that people will have tons of comments and questions because I’m eager to spend time discussing this topic (I already do with the people I work with so why not branch out a bit). I’d like to find out what kinds of questions people have, and what kind of additional information or tools you’d like to see. The whole point of these new formats is for them to be open to anyone to work with, so I want to make sure we make it as easy as possible.


If you haven’t already read the press release, it’s probably worthwhile since it gives a good overview of everything that’s happening. It is a press release though, so you’ll have to deal with it coming more from a marketing angle. You should be able to find it up on the presspass site: http://www.microsoft.com/presspass


I didn’t want to make this first post too long, but I do want to go into some of the things I think are the most important to understand about these new formats. I’ll definitely spend more time in future posts digging deeper on these different topics, as well as going into the goals behind the formats.


Open XML Formats Overview


To summarize really quickly what’s going on, there will be new XML formats for Word, Excel, and PowerPoint in the next version of Office, and they will be the default for each. Without getting too technical, here are some basic points I think are important:



  1. Open Format: These formats use XML and ZIP, and they will be fully documented. Anyone will be able to get the full specs on the formats and there will be a royalty-free license for anyone who wants to work with the files.
  2. Compressed: Files saved in these new XML formats are less than 50% of the size of the equivalent file saved in the binary formats. This is because we take all of the XML parts that make up any given file, and then we ZIP them. We chose ZIP because it’s already widely in use today and we wanted these files to be easy to work with. (ZIP is a great container format. Of course I’m not the only one who thinks so… a number of other applications use ZIP for their files too.)
  3. Robust: Between the usage of XML, ZIP, and good documentation the files get a lot more robust. By compartmentalizing our files into multiple parts within the ZIP, it becomes a lot less likely that an entire file will be corrupted (instead of just individual parts). The files are also a lot easier to work with, so it’s less likely that people working on the files outside of Office will cause corruptions.
  4. Backward compatible: There will be updates to Office 2000, XP, and 2003 that will allow those versions to read and write this new format. You don’t have to use the new version of Office to take advantage of these formats. (I think this is really cool. I was a big proponent of doing this work.)
  5. Binary Format support: You can still use the current binary formats with the new version of Office. In fact, people can easily change to use the binary formats as the default if that’s what they’d rather do.
  6. New Extensions: The new formats will use new extensions (.docx, .pptx, .xlsx) so you can tell what format the files you are dealing with are, but to the average end user they’ll still just behave like any other Office file. Double click & it opens in the right application.

I’ll definitely go into a lot more detail on these different points in future posts. Just to summarize though, I’m really happy with these new formats so far. Microsoft will build a lot of functionality around these formats for years to come, but I also hope other people outside of Microsoft will take advantage of them, since anyone that wants to can. You can look inside the files, make modifications, generate new files, add content, remove content, or any other number of things that people would want to do with an Office file.
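
To make the "look inside the files" idea a bit more concrete, here is a minimal Python sketch that uses only the standard zipfile module. The schemas aren't final yet, so the part names below (document.xml, styles.xml) are placeholders made up for illustration and not the real package layout. The sketch builds its own small ZIP-of-XML package in memory, then lists the parts and reads them back.

```python
import io
import zipfile


def build_sample_package() -> bytes:
    """Create a small ZIP-of-XML package in memory.

    The part names and XML content are hypothetical stand-ins,
    not the actual Office layout.
    """
    parts = {
        "document.xml": "<document><p>Hello, world</p></document>",
        "styles.xml": "<styles><style name='Normal'/></styles>",
    }
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        for name, xml in parts.items():
            zf.writestr(name, xml)
    return buf.getvalue()


def list_parts(package: bytes) -> list[str]:
    """Return the names of the parts stored in the package."""
    with zipfile.ZipFile(io.BytesIO(package)) as zf:
        return zf.namelist()


def read_part(package: bytes, name: str) -> str:
    """Decompress a single part and return it as text."""
    with zipfile.ZipFile(io.BytesIO(package)) as zf:
        return zf.read(name).decode("utf-8")


package = build_sample_package()
for part_name in list_parts(package):
    print(part_name, "->", read_part(package, part_name))
```

Any tool that can open a ZIP file and parse XML should be able to do the same thing; nothing here depends on Office being installed.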


If you want some more information in a more official form, there are two whitepapers available. Here’s a brief overview of each one:


Whitepapers


The Microsoft Office Open XML Formats: New File Formats for “Office 12”


http://download.microsoft.com/download/c/2/9/c2935f83-1a10-4e4a-a137-c1db829637f5/Office12NewFileFormatsWP.doc


This first whitepaper is a general overview of the file format, and is targeted at multiple audiences. It starts off with an introduction about what’s going on and also briefly touches on the history of the current binary formats and how we got to where we are today.


The Microsoft Office Open XML Formats: Preview for Developers


http://download.microsoft.com/download/c/2/9/c2935f83-1a10-4e4a-a137-c1db829637f5/Office12FileFormatDevPreviewWP.doc


This paper talks more about the architecture of the formats and is targeted at developers. This paper has a similar introduction to the first (but from a slightly different angle). The last 7 or so pages of the paper go into solutions and what people can do with these files. It’s a great way to start thinking about the possibilities, and what types of things you can probably expect to see built on top of the format.




OK, that’s enough for now. Sorry this was such a long post, but I didn’t have time to make it shorter (I think that was Twain or Pascal?). I’m going to get some sleep, and then see what things people are curious to know more about. Talk to you all tomorrow.


-Brian


    Comments (140)

    1. PatriotB says:

      Kind of ironic… just as Avalon comes along and makes using structured storage (the basis of Office file formats) the standard, Office moves away from it…

    2. Brian, awesome stuff!

      Check out the video of you over on Channel 9:

      http://channel9.msdn.com/ShowPost.aspx?PostID=73329

    3. This is definitely big news. Personally I am an Office 2004 user, but I will be very glad when I can have document portability between word processors.

    4. Brian, This is great news!

      Heck, I’m just happy that PowerPoint is going to have an XML file format, let alone that these formats will be backward compatible! These are great new additions for us Office devs. It’s like Christmas :^)

    5. beza1e1 says:

      This is cool for interoperability, but we need OpenDocument support as well. Will we be able to save in that format, or is docx compatible with it?

    6. Open Office already stores files in XML format rather than a proprietary format like Microsoft. I’d be interested to know how this new XML solution Microsoft is adopting compares (or betters) the one available in Open Office – is Microsoft just playing catch up (at least in this area)?

    7. Erwin Tenhumberg says:

      See also:

      OASIS OpenDocument XML Format: http://www.oasis-open.org/committees/tc_home.php?wg_abbrev=office

      Recommendations by the EU: http://europa.eu.int/idabc/en/document/3439

    8. Nektar says:

      Word, Excel, Powerpoint. Why not Publisher? Onenote?

      I also know that with the previous open XML formats in Office 2003 there were some issues with patents, and so no other program had implemented reading these formats. Given that the XML formats in 2003 actually failed to be adopted by other applications, how are you addressing this issue in the new Office?

    9. Rosyna says:

      When y’all say "open" do you mean that it won’t encode COM objects or any other kind of object in the XML as a binary format that is not documented?

    10. In the 2003 XML documents, while the core schemas were more-or-less documented, auxiliary ones such as VML did not fare nearly as well. Hopefully the documentation for these new formats will address all of the constituent schemas?

      Also, I would like to know exactly how deeply the XML-ization is being applied. For example, will MSGraph and Equation Editor objects also be represented as XML, or continue to exist as undocumented binary blobs?

      Thanks,

      Chris

    11. Microsoft has announced that the next version of Microsoft Office will use, BY DEFAULT, new OPEN XML…

    12. On the blog of Brian Jones (program manager on the Word team at Microsoft) there is a very interesting…

    13. Skeptic says:

      "Robust: Between the usage of XML, ZIP, and good documentation the files get a lot more robust. By compartmentalizing our files into multiple parts within the ZIP, it becomes a lot less likely that an entire file will be corrupted (instead of just individual parts). The files are also a lot easier to work with, so it’s less likely that people working on the files outside of Office will cause corruptions."

      Ho-ho-ho. What a piece of marketing crap. Corrupting a ZIP archive is much easier and more likely to happen than corrupting a single large XML file. Just a single bit corruption of an archive usually results in large portions of it being unreadable.

    14. settantta says:

      MS is a bit slow, aren’t they! OpenOffice.org/StarOffice have been using precisely this file format for the last 5 years (that’s right folks – they had a zipped XML file format in 2000!)

      About time MS caught up with international standards though. Remains to be seen how they’ll go about crippling the interoperability of these formats – if it is possible, they’ll surely find a way, since this type of competition seems to be something they are unable to handle without cheating.

    15. Politeknik says:

      XML is open. XSLT is well-defined. ZIP is standard.

      I hope Microsoft isn’t stupid enough to patent the file formats.

    16. HG says:

      Hi there

      Great news. Some time ago I read up on the 2003 schemas and I found them great… for simple documents. There were problems with embedded objects, images, VBA code… Will these issues be fixed in the new version?

      I am thinking of embedding images (and other content), embedding VBA code (event handlers), etc.

      Main goal is to write a server-side "report" using the desired schema and stream it back to the desired application.

      Regards

      Henrik

    17. Anil says:

      "Sorry this was such a long post, but I didn’t have time to make it shorter (I think that was Twain or Pascal?)"

      That was Pascal.

    18. Big news from the Office team yesterday! It looks as if XML is going across the board in Office 12. So much so that they are changing the file extensions to .docx, .pptx, .xlsx. WOW! This is major and very…

    19. I think we all have been expecting this for a while since Microsoft introduced the XML format in Office 2003. Now they’re going full on and going to use XML (with zip compression) as their default Office format. It’s certainly…

    20. Quentin Pouplard says:

      Hi,

      Sounds like good news, but how will it be supported in previous Office versions? I’d really like to be able to generate Excel or Word documents from my favourite webapp… but I need them to be readable by a lot of people… (even inside an organization, we’re sometimes stuck with the 97 version!). Will we be forced to configure our new Office 200? to serialize docs in the old format, to be sure that the docs are still readable?

      Any comment on this?

    21. Dev Notes says:

      Brian Jones, Program Manager for MS Word team announced the new file formats in the next Office System…

    22. Good news, and (trusting that the file formats are as easy and open as you indicate – haven’t seen them yet) I’m really looking forward to it!

      When are you going to apply this to Visio? Next week, right?

    23. Also, I haven’t been able to find a complete sample. I know the schemas aren’t settled yet and so will be changing (perhaps substantially), but can we get one complete example to look at? I want to see just how much of an improvement over WordML you’ve made.

    24. BrianJones says:

      Wow, thanks for the comments. I’ll try to get to each question as soon as I can.

      Cori – Currently the plan is just for Word, Excel, and PPT to take this approach with the ZIP container. As you know though, Visio has had an XML format for a couple releases now. It’s a single XML file.

      I’ll see what I can do about getting an example file up. I’ll definitely show more at tech-ed next week if you’re going to that. I was wondering if you could give some more details on what kinds of improvements you’d like to see made on the existing WordML implementation. Are there any specifics you could give on what you find difficult?

    25. tecosystems says:

      First, the good news. As many of you have seen by now, at 12:01 AM ET last night, Microsoft announced rather extensive – and pretty much open – support for XML based formats for its next gen Office iteration, due…

    26. James Jones says:

      Is there anything about the licensing terms that would preclude GPL software from making full use of documents in the new format?

    27. You mention compatibility with Office 2000, XP, 2003… What about the Windows Mobile versions? Currently, PocketWord and PocketExcel can read .doc and .xls files (though I think they are translated by ActiveSync). Any idea how these new formats will work with the Pocket* applications?

    28. EnronHaliburton2004 says:

      Bah! Most of the posts here are about how this new XML format will open the MS Office formats.

      This is wrong.

      XML is composed of two pieces: XML, which is open; and an XML Schema, which will be closed and proprietary. An XML document without an XML Schema is not usable by other applications.

      So don’t go around pretending that Microsoft’s "XML" will let people open Word documents with other applications. It won’t. They are still maintaining their closed formats and will continue to lock-in the customers.

    29. BrianJones says:

      EnronHaliburton2004 – You’re right that the schemas are very important for being able to interpret the XML. You can get the schemas from the Office 2003 XML up here: http://www.microsoft.com/downloads/details.aspx?FamilyId=FE118952-3547-420A-A412-00A2662442D9&displaylang=en

      The schemas are fully documented so you can read through and find out for yourself how to interpret the XML. In addition there is a royalty-free license available, which means that you don’t have to pay anything to Microsoft for its use.

      We’ll of course be doing the same thing with these new formats, although they are still under development so we don’t have the schemas ready yet. I really want to go a lot further than we did in 2003 and provide some really good best practices documentation as well as tons of examples.

    30. BrianJones says:

      Skeptic – In response to the post a bit further up, I wanted to be a little more clear on why I said the files are more robust.

      While I’ll agree that ZIP isn’t as robust as text, it’s still extremely robust relatively speaking (compared to compound doc for instance). A single bit corruption will usually result in the one part that was corrupted being unreadable. So, in order to make the files robust, we break them up into multiple pieces (files) within the ZIP. For example, with a PowerPoint file, each slide is a separate XML file inside the ZIP. This is done both for robustness as well as making it easier for people (developers) to work with the files.

      There are a number of different corruptions that can occur. Some are from user error, while others have to do more with faulty hard drives, transmission errors, etc (essentially bit rot). Let’s just talk about the bit rot example for now. While there is a central directory at the end of the entire ZIP package, that central directory isn’t required to open the ZIP. The files inside the ZIP are all written out serially, and there is a header before each file. So, even without the central directory, we could still rebuild it just by scanning the file looking for each header. In addition, each file is compressed separately. That means that if one file gets corrupted, you can still open all the other files just fine. I can go into much more detail if you guys are interested in this topic.
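
      As a rough illustration of the header-scanning idea (a sketch under stated assumptions, not how Office itself performs recovery), the Python below walks a ZIP byte stream looking for local file headers and decompresses each part on its own. The helper names (make_package, drop_central_directory, recover_parts) and the slide part names are invented for the example. It assumes the sizes are recorded in each local header rather than in a trailing data descriptor, and that the archive has no comment.

```python
import io
import struct
import zipfile
import zlib

LOCAL_SIG = b"PK\x03\x04"
# signature, version, flags, method, time, date, crc32, compressed size,
# uncompressed size, name length, extra length (30 bytes, little-endian)
LOCAL_HEADER = struct.Struct("<4sHHHHHIIIHH")


def make_package(parts: dict[str, bytes]) -> bytes:
    """Build a ZIP in memory with one separately compressed entry per part."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        for name, payload in parts.items():
            zf.writestr(name, payload)
    return buf.getvalue()


def drop_central_directory(package: bytes) -> bytes:
    """Simulate losing the central directory by cutting the file where it starts.

    Assumes no archive comment, so the end-of-central-directory record is the
    final 22 bytes and the directory offset sits 6 bytes before the end.
    """
    (cd_offset,) = struct.unpack_from("<I", package, len(package) - 6)
    return package[:cd_offset]


def recover_parts(data: bytes) -> dict[str, bytes]:
    """Scan for local file headers and return every part that still checks out."""
    recovered = {}
    pos = data.find(LOCAL_SIG)
    while pos != -1 and pos + LOCAL_HEADER.size <= len(data):
        (_, _, flags, method, _, _, crc,
         csize, _, nlen, xlen) = LOCAL_HEADER.unpack_from(data, pos)
        name_start = pos + LOCAL_HEADER.size
        body_start = name_start + nlen + xlen
        raw = data[body_start:body_start + csize]
        resume = pos + 4  # if this entry is bad, keep scanning just past its signature
        try:
            if flags & 0x08:
                raise ValueError("sizes are in a trailing data descriptor")
            if method == zipfile.ZIP_STORED:
                payload = raw
            elif method == zipfile.ZIP_DEFLATED:
                payload = zlib.decompress(raw, wbits=-15)  # raw deflate stream
            else:
                raise ValueError("unsupported compression method")
            if zlib.crc32(payload) != crc:
                raise ValueError("CRC mismatch")
            name = data[name_start:name_start + nlen].decode("utf-8")
            recovered[name] = payload
            resume = body_start + csize  # entry is good, jump past its data
        except (ValueError, zlib.error):
            pass  # skip the damaged entry and look for the next header
        pos = data.find(LOCAL_SIG, resume)
    return recovered


slides = {f"slides/slide{i}.xml": f"<slide>Slide {i}</slide>".encode()
          for i in (1, 2, 3)}
damaged = drop_central_directory(make_package(slides))
print(sorted(recover_parts(damaged)))
```

      Because each entry is decompressed and CRC-checked independently, a damaged entry is simply skipped while its neighbours are still returned.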

    31. Mike Jones says:

      You publish the spec for these formats and they are based on XML. That’s good. You also promise to give out royalty-free licenses. That’s good, too.

      But that doesn’t make those formats "open". The very fact that people need to obtain royalty-free licenses from Microsoft demonstrates that the formats are still proprietary (i.e., owned by Microsoft). If Microsoft didn’t own them, people wouldn’t have to obtain a license from Microsoft in order to implement them.

      So, when you say that the formats are "open", you are misleading people. The formats are not "open", they are proprietary (albeit documented).

    32. BrianJones says:

      Joshua – I’ve talked with the team that makes the PocketWord and PocketExcel applications and they are looking at these new formats. I can’t really comment for them though on what support they are planning to provide (sorry), but I know they were excited about the fact that the formats were XML.

    33. Anonymous says:

      "The very fact that people need to obtain royalty-free licenses from Microsoft demonstrates that the formats are still proprietary (i.e., owned by Microsoft). If Microsoft didn’t own them, people didn’t have to obtain a license from Microsoft in order to implement them."

      Actually, most software and standards that are labelled "open" have royalty-free licenses. Look at the legal verbiage that standards bodies use and you will see that there is implicit ownership and a license agreement for all of their deliverables.

      Partially because of the loud voice of the open source community, Microsoft has now chosen to open up a portion of its intellectual property. Instead of complaining that it’s not enough, we should encourage the folks in Redmond.

      Who knows… with enough positive feedback, they might do this more often 🙂

    34. SMC says:

      Like many of the other posters here (who have, as I leave this post, not been answered at all), I’d be interested to know where and why the presumably "special" Microsoft approach to this differs from the existing OASIS document standards (and for that matter, specifically how and why it differs from the formats used by StarOffice/OpenOffice, particularly the 2.0 beta versions out right now.).

      Any comments on this? Is this move by MS genuinely intended to allow interoperation or is this (as many cynics [some would say ‘realists’] feel) simply an attempt to "head off" the already published-and-in-use standard of the XML-in-a-zip in favor of something that Microsoft can control?

    35. Robert Jacobson says:

      A humble feature request:

      If I read the whitepaper correctly, another application could (of course) edit the underlying XML and thereby make changes to the contents of the document, which would then be reflected when the file is open in Word, Excel, etc.

      It would be a Really Good Thing if an add-in running within Word or Excel could also dynamically access the XML in an open Word or Excel document, edit the XML, and post the changes back to the underlying document.

      Assume, for example, that you develop an add-in that makes particular changes to the text or formatting of a Word document. Currently, you need to use the bolloxed-up Word Object Model (which is really painful from within C#, at least, and sometimes not very performant). E.g., you’d use the Find object to search for certain text and then use the various parameters of the Selection or Range objects to change the text or formatting.

      It would be very nice to have, as an alternative, the ability simply to run the Office XML file through an XSLT stylesheet. Then you wouldn’t have to use the Word object model at all to accomplish these changes.

      If I understand correctly, as currently intended an add-in could save and close the file, make the changes using XSLT, and then reopen the modified file. But this is kludgy and would look strange to the user (why did the window just close and reopen?)

      It would be preferable if an add-in could dynamically edit the underlying XML (either of the whole document or just of a single paragraph, for example) while the document was open, with those changes automagically being reflected in the open document.

    36. BrianJones says:

      Robert – That’s actually a great use case. If you play around with the Word 2003 OM, you’ll see you can already do something similar today. There is an xml property off the range object. You can select the entire file as a range (or just do document.xml I believe) and request the XML for that file. You can then muck with it all you want, and when you are done, use the insertXML method (you’d probably want to select the whole file again to overwrite what’s already there).
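
      Here is a sketch of what the "muck with it" step might look like, assuming the WordML string has already been pulled out of Word through the range's xml property (the automation step itself isn't shown, since it needs a running copy of Word). The sample document and the replace_text helper are made up for illustration; the namespace is the Word 2003 WordML one, and real output from Word will contain far more markup than this.

```python
import xml.etree.ElementTree as ET

# Word 2003 WordML namespace; registering it keeps the "w:" prefix on output.
W_NS = "http://schemas.microsoft.com/office/word/2003/wordml"
ET.register_namespace("w", W_NS)

# Hypothetical, heavily trimmed stand-in for the XML Word would hand back.
SAMPLE = (
    f'<w:wordDocument xmlns:w="{W_NS}"><w:body>'
    "<w:p><w:r><w:t>Draft report</w:t></w:r></w:p>"
    "<w:p><w:r><w:t>Prepared by the Word team</w:t></w:r></w:p>"
    "</w:body></w:wordDocument>"
)


def replace_text(wordml: str, old: str, new: str) -> str:
    """Replace text inside <w:t> elements and leave all other markup alone.

    Text that Word has split across several runs is not handled here.
    """
    root = ET.fromstring(wordml)
    for text_node in root.iter(f"{{{W_NS}}}t"):
        if text_node.text and old in text_node.text:
            text_node.text = text_node.text.replace(old, new)
    return ET.tostring(root, encoding="unicode")


updated = replace_text(SAMPLE, "Draft", "Final")
print(updated)
```

      The resulting string is what you would then hand back to Word with insertXML.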

      To everyone else I haven’t replied to yet, sorry. I’m keeping a list of the questions and will try my best to address them. It’s been a really busy day.

    37. Robert Jacobson says:

      Brian,

      The Range.xml (or Document.xml) feature sounds great. I haven’t done much with XML for Office 2003 because my understanding is that it isn’t 100% compatible for roundtripping purposes (preserving all formatting, etc.).

      Do you know whether the Range.XML feature will be updated with Office 12, so the XML it emits and accepts will be 100% compatible with the new file format? It would be unfortunate if it continued to use the current format and schema.

    38. Wesley Parish says:

      First time I read this I started laughing. After all, it is well past April Fools’ Day. And Microsoft can’t claim this at least to be any sort of innovation – I’ve been using OpenOffice.org for at least four years now, and what has been described and defined as the docx, etc, file formats, read like what I already know about the sxw, etc, file formats, albeit with a different name.

      It is that that I want to ask about – OpenOffice.org’s file formats are quite robust, and are rapidly turning into the standard. Will Microsoft Office follow that already market-driven standard? Now that OpenOffice.org has been accepted in such large markets as Brazil and China?

      Because if Microsoft fails to follow this market standard and diverges by even a small amount in any undisclosed manner, that cannot be accounted for by technical reasons any sane technician could accept, I fail to see how my current employer, a non-profit community organization, can afford to use any future Microsoft Office.

      Thanks

    39. Another question–now that the document summary information is presumably stored as an XML stream inside the zip, instead of as a Structured Storage stream, won’t this effectively hide the info from current versions of Windows? -Chris

    40. Annu says:

      How does this affect current solutions based on the Office 2003 WordML format?

    41. Sean Clarke says:

      Mixed response – in one way a good thing, but then MS sour the milk by not reusing or even extending an existing standard, I suppose if they did then they couldn’t claim "innovation".

      It is a real shame that the OASIS file format was not employed. If it was deficient, then I am sure MS could have worked with the community and added extensions etc., but MS has never really done that, has it?

      Please don’t bandy the term "open" about – using it in this context is misleading; "published" would be a better description. MS owning the standard allows them to change it and break 3rd party compatibility on a whim.

      If you want to use "open" and really be seen as "open" by any user with a bit of savvy then hand ownership of the schema to either the community or get ISO recognition.

      All in all, a missed opportunity.

    42. Mike Jones says:

      "Actually, most software and standards that are labelled "open" have royalty-free licenses. Look at the legal verbiage that standards bodies use and you will see that there is implicit ownership and a license agreement for all of their deliverables."

      Yes, you can indeed have open standards that include patented technology under royalty-free licenses. Such licenses are not just royalty-free, however, they include additional guarantees to users (e.g., they may be transferable or they may be available through an independent standards body). Microsoft’s licenses do not have such guarantees.

      "Partially because of the loud voice of the open source community, Microsoft has now chosen to open up a portion of its intellectual property."

      Come on, be serious. Microsoft didn’t document these formats because of some new-found fondness for open source or because someone asked nicely. Microsoft reacted to strong pressure from customers and did the absolutely minimal necessary thing. They are trying to keep the formats proprietary while giving the appearance of being open.

      "Instead of complaining that it’s not enough, we should encourage the folks in Redmond."

      Do you seriously want us to believe that Gates and Ballmer, tough take-no-prisoner executives with a history of illegal business practices, respond to friendly conversation or reasoning from open source advocates? Microsoft management has stated clearly that they intend to kill open source through patents, and that is exactly what they are attempting to do with this.

      The only thing that "encourages" Microsoft to do anything is the threat of losing market share. Microsoft will have to be dragged kicking and screaming towards open systems, but it will be dragged. Patent shenanigans and PR bullshit like "open XML formats" will not make a difference; in the long run, they’ll just backfire, because they’ll invigorate open source developers and create more competition for Microsoft.

    44. Lobsterman says:

      Yes, I know that Microsoft likes doing things its own way, but the open OASIS standard was announced not long ago. Why can’t you guys get together, each with a few compromises, and agree on one standard format for documents? For once, think about the good of the customer instead of just having to be different. You could start an open standards consortium like the W3C and finally standardize things.

    45. What about IRM documents and non-windows platforms? We’ll leave off Linux for now. What about MacOffice 12 and IRM’d documents? That needs to interoperate too. If it doesn’t, this is a half done initiative. The IRM question needs to have the correct answer, and that answer is NOT "Use Windows", *especially* not to someone running MacOffice.

    46. Hi Brian

      Grats on the new file formats for Office. I just have a small question: are the new formats for Office the same as the format that was codenamed "Metro" in the WinHEC Longhorn presentation?

    47. Ignace Lamine says:

      One thing I didn’t like about XML in Office 2003 was that when I have a clean, structured XML file, created from scratch, and I do a simple edit in the Word editor, like changing text, the whole XML file is rewritten with Word’s typical XML stuff and the structure is gone. I think some sort of text-change-only mode would be useful, where all formatting is preserved and after resaving only data (text) inside <w:t> tags would be allowed to change.

    48. Voidless says:

      Good news, everyone.

      So all we’ll need is just the right XSLT stylesheet to transform MS XML Office format to OpenDocument. I think this really moves Microsoft towards compatibility with the rest of the world 🙂

      But seriously – I think that if MS XML Office format and OpenDocument format will be losslessly transformable just via XSLT transformation, the world is about to be happy.

      I can imagine big Unix/Linux backend servers that store thousands and thousands of documents in OpenDocument format, utilizing fulltext search, categorizing, excerpt creation over them while still being able to offer these documents to end-users in a format MS Office will be able to open with no harm.

    49. PatriotB says:

      Chris– you’re right about the properties now being unavailable to Windows. They will have to provide a property handler for the shell and indexing services to use in order to access the properties, and since this will involve opening the zip, etc, it will probably be much slower than currently. I’m surprised to see a move away from structured storage/compound documents, since MS has invested a lot into that technology. Avalon will be using compound documents for file storage, so it’s not like the technology is bad or out of date.

    50. BrianJones says:

      Chris and Patriot – You are right that there will be work done in the shell to support property promotion with these new formats. ZIP allows for random access, and since compression is unique to each part (and optional) it shouldn’t be too hard to get to the properties. We’ll also build iFilters for searching the actual contents of the documents.

      Office isn’t the only application moving towards ZIP, though. Patriot, I believe your reference to future support for structured storage may be a bit outdated, but maybe you are referring to something I’m not aware of (please let me know). The new Metro format from the Windows team is going to use the same ZIP packaging that Office is using: http://www.microsoft.com/whdc/device/print/metro.mspx
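
As a rough illustration of the random-access point in the comment above: a ZIP archive keeps a directory of its parts, and each part is compressed (or not) on its own, so a reader can pull out one small part, such as the document properties, without decompressing the rest. The sketch below uses Python's standard `zipfile` module. The part names `props/core.xml` and `content/document.xml` are hypothetical placeholders, not the actual layout of the new Office formats.

```python
import io
import zipfile


def build_sample_package() -> bytes:
    """Build a small in-memory ZIP standing in for a document package.

    The part names are hypothetical and chosen for illustration only.
    """
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        # A small properties part, stored without compression.
        zf.writestr(
            "props/core.xml",
            "<properties><title>Quarterly report</title></properties>",
            compress_type=zipfile.ZIP_STORED,
        )
        # A large content part, compressed independently of the other part.
        zf.writestr(
            "content/document.xml",
            "<doc>" + "x" * 100_000 + "</doc>",
            compress_type=zipfile.ZIP_DEFLATED,
        )
    return buf.getvalue()


def read_part(package: bytes, part_name: str) -> bytes:
    """Return the bytes of a single named part from the package."""
    with zipfile.ZipFile(io.BytesIO(package)) as zf:
        # Only the requested part is decompressed; the others are skipped.
        return zf.read(part_name)


if __name__ == "__main__":
    package = build_sample_package()
    print(read_part(package, "props/core.xml").decode("utf-8"))
```

A shell property handler or indexer could follow the same pattern: look up the properties part by name and read only that entry.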

    51. Hi Brian,

      According to the white paper, the VBA Project will always be stored as a binary file. While that is great for production files, where we wouldn’t want people to see the code, it’s horrible for development. I would love to be able to store one of these files in ClearCase / SourceSafe etc and see the VBProject in clear text – which would then allow me to perform diffs, merges etc on the code.

      Also, will you be adding the ability for us to store VSTO assemblies in the document, so we could deploy them within the document file rather than having to host them on a web site?

      Regards

      Stephen Bullen

    52. BrianJones says:

      Stephen,

      Actually, the default formats don’t store a VB project. You need to use the macro-enabled (that’s just what we are currently calling it internally) version of the format if you want to store the VB project. The VB project will be stored as a separate binary part in that case.

      The format is fairly extensible, but we aren’t building support for embedding the VSTO assemblies and being smart about handling them when they are opened. While this isn’t really my area, it is something I have an opinion on. I think the direction for distribution of code and solutions is really moving away from code within the document. The documents really just have the contents, meta-data, maybe even customer-XML blobs that help identify what their content type is and what solution they are a part of. The code then lives separately, which makes it much easier to manage and makes the documents much more portable.

      -Brian

    53. SMC says:

      I also have to wonder if any confusion that may be caused by everyone referring to this as the "MS Office Open XML Format" and its similarity to the "OpenOffice (http://www.openoffice.org) XML Format" is an intentional marketing ploy…

    54. SMC says:

      (try this again without HTML…)

      I also have to wonder if any confusion that may be caused by everyone referring to this as the "MS *Office Open* XML Format" and its similarity to the "*OpenOffice* XML Format"[1] is an intentional marketing ploy…

      [1] http://www.openoffice.org

    55. PatriotB says:

      Brian — Here’s a link to a document about Avalon using compound files. Of course this may all be pre-Metro since Metro was only announced a month or so ago…

      http://winfx.msdn.microsoft.com/library/default.asp?url=/library/en-us/wcp_conceptual/winfx/docservices/overviews/edocs_ovw_compoundfiles.asp

    56. BrianJones says:

      Hey Patriot, you’re right, that was pre-Metro. They are now using ZIP in the same way we do in Office.

    57. DJ says:

      Mike Jones, what exactly do you want before it can be called open? Letting anyone who wants to change it? That would be a nightmare, for obvious reasons. Seems to me ‘open’ means ‘not a binary format (or other format for that matter) that has no mechanisms for interpreting it except through the app itself’. Does OpenOffice call their formats ‘open’? If so, what is the difference between their open format and Microsoft’s?

    58. Felix says:

      Do you plan to use something like MathML for equations?

    59. Kevin says:

      1. Great news – good job Brian!

      2. What tools/features I’ve been waiting years for:

      I’m a web developer in my 6th year of experience, and I have been fiddling with content management since I started. My dream for my content management systems is that authors could use MS Word and Excel to publish their content. With an Office version that could save as HTML, I thought we were on the right track.

      But unfortunately the file size was extremely large, and with its half-XML format (WordML) and images that were stored in a subfolder (1 original and 1 scaled), I gave up on making a Word plugin.

      With Office 2003 I have spent many hours testing how I could use the native XML format that it can save a document in.

      But somehow the native format requires a lot of schema development. I even tried, but with no promising results.

      What I’ve been waiting for, and maybe still will be from the MS Office 12 point of view, is a server-side Office Document Content Framework and a client-side Office Content Builder (these are of course all fictitious names). The Office Document Content Framework on the server side should be able to parse all parts (tags or whatever) made by the author in the Office Content Builder, which might even run directly in Internet Explorer.

      The Office Content Builder has access to all documents on the client, so you can easily open documents in the Content Builder and save them to a server that can handle the document with the Content Framework, which then parses each element and applies the styles used for presentation directly from the server environment. That could be a corporate website, marketing compliance material such as a press site, or documentation for a product.

      The core problem I face every day with the Office suite is that no 2 documents made within the same organisation look alike, often not even those from the same person (myself included). I develop e-business, branding solutions and associative business web solutions, extranets, intranets, etc., which aim not only to be functional but also to be promotional and to have a branding effect on the people who use our customers’ solutions.

      That makes Office my worst enemy, largely because all of our customers use Word and Excel and have to adopt new behaviors when making content for their web solutions.

      My solution to this day has been to develop a Java application (as the Office Document Framework) and a Java applet (as the Office Content Builder), but I’m not a skilled Java programmer, and the time needed to develop such applications is far more than I have budgeted for at work.

      I have to think about how our customers get the most value out of our solution for the lowest TCO possible. That is what brings our customers back for more and gets them value out of their investments.

      Now with this announcement of Office 12 supporting XML as its default format, and the way the document structure is laid out, I again feel some sort of: YES, let me get hands-on and I’ll make my KILLER APP.

      – Thanks for listening

      Kevin

    60. Pascal says:

      What do you know, Microsoft goes and steals ideas from OpenOffice. Another one of Microsoft’s "Embrace and Extend" strategies, eh? Tsk tsk. I’ll take OOo, thanks.

    61. Ofer Goren says:

      If I read the undercurrent correctly, I should now be able to use a repository of values in various (Word) documents. Altering a value in the repository (only there…?), should also affect all other instances where that value is embedded (linked).

      My question is:

      Is there an intuitive MS Office method for performing the XML-type embedding and linking (Drag & Drop… Into what…?)

      -Ofer

    62. BrianJones says:

      Pascal – Thanks for your post. You are right that we are not using new proprietary technologies for these formats. We decided to use XML and ZIP since they are already so widely in use today by many different applications, OpenOffice included.

      Ofer – The formats themselves do not introduce new functionality for linking documents to data sources. Since they are open though, you easily get access to all of the application’s functionality. Word already has support today for custom XML, which allows people to mark up the documents with their XML, and build additional solutions on top of that. Unfortunately, I currently can’t talk about what else is coming in Office 12 around this, but I’m really excited about the chance to dig into this more when that time comes. I’m sure you’ll notice that XML is a big deal to us, and we continue to look at ways to innovate here.

    63. whatever says:

      If the XML format is the default in the next version of Office, how can you protect or lock a Word document?

    64. BrianJones says:

      Whatever – The protection of the documents will be handled in the same way it’s handled for the current binary documents. You can use encryption or IRM, or, if you just want to validate the document, you can use signatures.

    65. Glen Turner says:

      Are the MIME types for the new XML formats the same as for the current Office formats?

    66. Kirchrainer says:

      Thanks, Brian, for your first-hand information about the forthcoming Office file formats. Could you please address the points brought up earlier by other readers concerning the OASIS OpenDocument standard? It’s the only type of question you haven’t answered so far.

    67. BrianJones says:

      Glen – The MIME types will be different for the new formats.

      Kirchrainer – I actually have a separate post addressing the OpenDocument format: http://blogs.msdn.com/brian_jones/archive/2005/06/13/428655.aspx

      There are actually a ton of great replies already. I haven’t had a chance to reply with my own comments yet, but I hope to later on today.

    68. Ken Brubaker says:

      You can model your application’s file format after Microsoft’s new Office file format.

    69. mitch says:

      Brian,

      Thanks for taking feedback. Don’t take this personally … but Microsoft XML (from Word) only makes my life miserable. I am going to start pushing OpenOffice at my company. I’m sick of Microsoft’s convoluted XML. When I save as XML, I can’t even open it in an XML editor (I use XML Spy).

      Please provide a "simple XML" for export that just uses style sheet names as elements, and I’ll finally be able to use Microsoft’s XML without creating a tool/process to clean it up.

      Given how the XML format is supposed to be used with XSL and as a clean way to share information between applications, why does Microsoft make that possible when exporting XML?

    70. BrianJones says:

      Mitch, thanks for your post. I’d really like to know what changes can be made to make things easier for you. Do you think you could give me more information on what you’re having trouble with?

      Does XML Spy fail to load the XML, or does it hang? Some XML editors have a really hard time opening XML files that don’t have line breaks. We don’t pretty print our files because it helps the save performance, but as a result there is the problem it sounds like you’re having. Let me know if that’s what’s wrong and I’ll provide some suggestions.

      Could you give me a bit more detail on what you mean by exporting stylesheet names? Do you just mean you want the XML format to not have as much information in it? Or are you actually looking for a different structure? What kind of cleanup are you trying to do? All the XML in the file is used to preserve the document. If we didn’t use all those tags, then you would start to lose some functionality.

      I don’t understand your last question. Do you mean why don’t we output both an XML file and an associated XSLT when we save? Or do you mean why do we allow you to save through an XSLT?

      -Brian
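
One possible workaround for editors that struggle with XML saved without line breaks is to re-indent a copy of the file before opening it. This is only a sketch using Python's standard `xml.dom.minidom`, not something Office provides; the function name `pretty_print` is made up for illustration. The added whitespace is for reading only, so the output should be treated as a viewing copy and not saved back over the original document.

```python
from xml.dom import minidom


def pretty_print(xml_text: str) -> str:
    """Return an indented copy of an XML string for easier reading.

    Indentation inserts extra whitespace between elements, so the result
    is meant for inspection only, not for round-tripping into the application.
    """
    return minidom.parseString(xml_text).toprettyxml(indent="  ")


if __name__ == "__main__":
    # A single-line document, similar in shape to XML saved without line breaks.
    one_line = "<document><heading1>My Heading</heading1><body>My paragraph.</body></document>"
    print(pretty_print(one_line))
```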

    71. mitch says:

      Brian,

      Thanks for the quick response. I found out why the XML file (exported from Word) wouldn’t open in XML Spy: Word had locked the file. Once I closed the document in Word, it opened. Word still had it open as myfile.xml after I had saved it as XML, even though I had originally opened it as myfile.doc. Whether that is the best behavior is another discussion.

      About the XML format. When I have a Word file with styles like Heading1 and Body, it would be awesome if Word would save something similar to this:

      <document>

      <heading1>My Heading</heading1>

      <body>My paragraph.</body>

      </document>

      Only the styles of the text should be saved, as XML elements, plus a few (very few) extra elements that would be necessary, like <document> because you must have a root element, etc.

      To answer your last question, I wasn’t very clear. What I was trying to say is this: why does Microsoft save documents in an XML format that only works well for sharing documents among Microsoft apps? And, no, dear readers, I am not naive … Seriously, I would love Microsoft Word (for my clients) if I could get clean, simple XML from it. Microsoft should provide its customers (which currently include me, but I’m seriously looking at OpenOffice now) with a way to export XML that could really be useful as data for other applications, such as transforming the XML into HTML or importing it into a database. Unfortunately, the XML exported from Word is cluttered with presentation XML specific to Microsoft Word. It is the same situation with the HTML exported from Word: a <p> should be a <p> tag, not <p style="a million style rules">.

      Provide both: Microsoft XML and simple XML. It would be easy to do. Leave out ALL the XML about presentation. And simplify the remaining XML elements (only used for content) to be named after the Word styles.

      A style called Heading1 in Word would be exported as an element called <Heading1> in the XML.
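
The kind of cleanup described in this comment can be sketched outside of Word. The example below is a minimal illustration in Python, assuming input in the Word 2003 XML (WordprocessingML) vocabulary, where a paragraph is `w:p`, its style is named in `w:pPr/w:pStyle`, and its text sits in `w:t` runs. The function name `simplify`, the fallback element name `Normal` for unstyled paragraphs, and the `document` root are assumptions made for this sketch. It drops all presentation markup, along with anything that is not paragraph text (tables are flattened, images and fields are lost).

```python
import re
import xml.etree.ElementTree as ET

# Namespace of the Word 2003 XML vocabulary.
W = "http://schemas.microsoft.com/office/word/2003/wordml"
NS = {"w": W}


def simplify(wordml: str) -> str:
    """Reduce a WordprocessingML string to one element per paragraph,
    named after the paragraph's style."""
    root = ET.fromstring(wordml)
    out = ET.Element("document")  # hypothetical root element name
    for para in root.iterfind(".//w:body//w:p", NS):
        style = para.find("w:pPr/w:pStyle", NS)
        # Assumption: unstyled paragraphs are reported as "Normal".
        name = style.get(f"{{{W}}}val", "Normal") if style is not None else "Normal"
        # Style names may contain characters that are not valid in element names.
        tag = re.sub(r"[^A-Za-z0-9_]", "_", name)
        if not (tag[:1].isalpha() or tag[:1] == "_"):
            tag = "_" + tag
        # Join the text of every run in the paragraph.
        text = "".join(t.text or "" for t in para.iterfind(".//w:t", NS))
        ET.SubElement(out, tag).text = text
    return ET.tostring(out, encoding="unicode")


SAMPLE = (
    f'<w:wordDocument xmlns:w="{W}"><w:body>'
    '<w:p><w:pPr><w:pStyle w:val="Heading1"/></w:pPr>'
    "<w:r><w:t>My Heading</w:t></w:r></w:p>"
    "<w:p><w:r><w:t>My paragraph.</w:t></w:r></w:p>"
    "</w:body></w:wordDocument>"
)

if __name__ == "__main__":
    print(simplify(SAMPLE))
```

The same mapping could be written as an XSLT stylesheet and applied on save. Either way it is one-directional: the simplified output cannot be turned back into the original document.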

    72. Dylan Pierce says:

      It’s sad that a corporation that was once an industry leader has now been reduced to grabbing other people’s ideas as quickly as it can in an effort to not get left behind.

      I took my company off Microsoft as soon as we finished verifying that OpenOffice and the OpenDocument format were adequate for all our business needs. Microsoft will have to do a lot more than show up late at the party saying, "Me too! I got XML too! I got open licenses too!"

      I don’t know where you hand-picked the other respondents to this blog, but I know among the tech-savvy people I deal with, the reaction is not one of, "Yay for Microsoft!" Rather, it’s sort of a sad head-shaking at what basically amounts to gramma trying to dress like she’s still sixteen. We’re not convinced.

    73. BrianJones says:

      Thanks for the comments Dylan. I’m sorry you don’t view this as great news. You should know that this move is far from being reactionary though. Just look at the history. The use of XML as a file format is definitely not a new idea. As I’ve pointed out in other blog posts, we first started using XML in Office back in 1997 when we started to work on Office 2000.

      The simple idea of using XML as a file format shouldn’t be seen as an innovation in itself. There are a ton of other software applications out there that also use XML in their formats. The reason we did this work to make it the default format is that XML is so widely in use today, and there are tons of tools available for working with it. I guess we could have gone and invented some other technology, but that would have defeated the goal of using common industry-standard technologies to represent our formats (ZIP and XML).

      I’ve had a couple recent posts showing how to leverage SpreadsheetML for Excel. Work for that specific format started back in 1999, and it’s now shipped with 2 versions (XP and 2003). Try it out if you get a chance.

      -Brian

    74. Some of you who have worked with Office 2003 xml files may have noticed that while we use the ".xml"…

    75. If you read Part 1 of the Word XML Introduction, you saw the basics behind a Word document, as well as…

    76. Sue Little says:

      Can you tell me how a Business Systems Developer (no web, no XML but lots of VBA) can determine how storing a document in XML might benefit a small business currently sharing much of its data across Word, Excel and Access through VBA?

      Interoperability is of paramount importance to us but I really can’t see how XML would help. Most examples I have seen require .Net libraries to read XML data via VBA into different applications. Will the necessary libraries be built into Office 12? I have little time for training as it is, so I am trying to determine whether this is a technology I should embrace (or steadfastly ignore). The learning curve appears a little steep from where I am standing.

      I could save my documents in an XML zipped format now (if I wanted to) so please tell me how I can USE this new technology. For example I have opened an XML version of an Access query in Word 2003 but it seemed much less useful than reading the data in via VBA (I couldn’t do anything with it) so please what is the benefit? What am I missing?

      For example, currently we have VBA code which allows our secretaries to enter a scheme reference and see a list of addresses relevant to a client. They then select the appropriate address for the letter (home, bank, accountant, etc.). Data is read from an Access database and inserted in the correct format so only the body of the text has to be inserted. Could you tell me how that can be achieved through XML? Or perhaps you could tell us how a mail merge might work?

      Of all the technologies Microsoft has introduced, this has baffled me the most. I don’t currently have a problem with your software. Data storage is getting cheaper so zipping files seems a retrograde step and splitting files into multiple files just seems to add to the everyday overheads. More files to potentially become corrupted? More files for users to "lose"? But that is OK because they are smaller (and therefore less important?).

      Do you have any users on your committees?

    77. Sue Little says:

      From Chris Pratley’s blog – "There’s a thing on the client called the "schema-library" that associates XML namespaces of your choice with XSL files, solutions, etc. This means once you’re set up in the schema-library, you can dump blobs of XML to Word (via e-mail attachments, or code), and Word will check the XML you provide – find the associated files to deal with it locally, and transform that XML using a presentation that can also retain the XML markup you supplied. Note this important difference – this is not converting one schema into another like a file converter (although it can be used that way) – it is generating presentation to wrap around the actual customer data, which is retained in the resulting file."

      This is more like it but how do I locate the schema library? I think Microsoft is going to have to start lessons soon. I want to learn how to utilise this technology but I can’t afford to wait until the product is launched.

      Personally (after taking a look at InfoPath) I believe our processes are too complex for XML but I can’t tell from the information I have. Also, I can’t ask for time and money to train for something that might be of little or no benefit to the company. There must be hundreds out there like me (sole IT person, small, IT-intensive company) who need practical help in understanding what can and can’t be done in XML.

    78. BrianJones says:

      Sue, a good starting point would be to play around with the labs that I mentioned in this post: http://blogs.msdn.com/brian_jones/archive/2005/07/08/436880.aspx

      -Brian

    79. A while back, I read that Microsoft is switching to XML-based document formats in the next release…

    82. David says:

      Mark Twain said: "A successful book is not made of what is in it, but what is left out of it." Excellent reference, though :-).

    83. Microsoft, which recently announced the new name of its future OS, continues to unveil the progress of its projects. And the latest one is called Office 12. This new version of the Office suite appears to be a crucial step forward for applications…

    84. Massachusetts’s Information Technology Division has released Microsoft’s formal 15-page reply to the state’s controversial draft policy on information standards. That policy would mandate that the Open Document Format be used for all "office" documents (ie, word processing, spreadsheet, and presentation documents). Because OpenDocument is incompatible with Microsoft’s Office applications, the…

    87. Tarun says:

      Dear Brian,

      I went through your blog and was highly impressed by the new format. I think you can really help me since you have been working on Office formats for Microsoft for the last 5 years.

      Brian, I work with a document management company and we are looking to incorporate and handle Microsoft’s Office formats in our viewer.

      I am trying to find out where I can get the technical specifications for all of Microsoft’s Office formats.

      Currently our viewer lacks support for Office formats. If you could guide me on this, I would really appreciate the gesture.

      You can reach me at tarunklal@gmail.com

      Kindly let me know. We need all possible specifications so that we can handle and display the Office formats successfully.

      Thanks

      Tarun

    88. sardanapalo says:

      …it was time to have XML support.

      let’s hope it works!

    89. Direct download: SPPD-2005-06-08 9.1 MB

      Intro

      Editorial

      SharePointTag and webcasts, downloads…

    90. Brian Catlin says:

      Just an FYI regarding your attribution of the quote regarding your not having the time to make it shorter (last paragraph).  While it does sound like something Mark Twain would write, it comes from the great composer Franz Liszt in a letter he wrote (to J. W. von Wasielewski) on January 9th, 1857.

      http://www.globusz.com/ebooks/Liszt/00000183.htm

    91. Smith says:

      This is definitely big news. Personally I am an Office 2004 user, but I will be very glad when I can have document portability between word processors. But if you need any help from us, just log in to: http://www.ideas4mysmallbusiness.com

    93. Word 2007 saves with the .docx extension by default. The .docx icon looks similar to the previous icon, and .doc files are in the Word 97-2003 compatible format, called 2003…

    95. Microsoft has announced that the next version of Office (unofficially "Office 12") will deliver support

    96. If you read Part 1 of the Word XML Introduction, you saw the basics behind a Word document, as well as

    97. Microsoft has been and continues to be fully committed to opening its document formats for Word, Excel

    98. Direct download: SPPD-2005-06-08 9.1 MB Intro Editorial SharePointTag and webcasts Downloads at www