First presentation on the new XML formats

Yesterday I gave my first presentation on the new file formats since we made our announcement last week. It was a lot of fun. I focused mostly on explaining what’s really going on with the new formats and what kind of impact they will have. I also showed a few demos using an early version of the patches that will be available for the older versions of Office. I was using Office 2003 with the patch applied, which allowed me to open and save files in the new format, so I could show the following:

  1. Updating a diagram in a spec: I showed an example of taking a technical spec with an old diagram, and outside of Word I swapped it out for a more up-to-date one. The main purpose of this wasn’t to show that an end user would do that to their files, but instead to show that people could easily build solutions that push relevant pieces of content into files.
  2. Removing comments: Most people who manage collections of documents or deal with publishing documents have seen the problem that can occur with extra information in their files. I took an example of a whitepaper with a bunch of comments in it. Often, an end user will just turn the comment view off, and not realize that when they save the file and post it up on the web, everyone else can still see those comments. If it turns out that an end user doesn’t know to delete the comments, it’s still easy enough to build an automated step in the publishing process that strips those comments out. In my demo I just unzipped the file, deleted the part called “comments.xml”, and showed that when you then open the file back up all the comments are gone.
  3. Document corruption: I took a rich Word document and opened it up in a hex editor. I scrolled down to a random spot and just started zeroing out a bunch of bits. I then tried to open the file with a ZIP tool and showed that it was corrupted and couldn’t be opened. I opened it in Word, though, and it opened just fine. Even all the formatting information was preserved, so most likely the only thing corrupted was some of the metadata or some other piece of information that didn’t affect the display (obviously a much improved experience over the current binary formats).
  4. Footer & Header update: I took a nice looking whitepaper with a rich header and footer that was synced to the document title and author name. I then opened another whitepaper that had a really lame header and footer. I showed how in an automated process, it was easy to quickly take the header and footer used in one file, and apply it to the other file. This was an example of how easy it will be to update a collection of documents to match a specific corporate standard.
  5. Bulk style change: I used the System.IO.Packaging API in the WinFX SDK to go over a collection of 100 whitepapers that all had a basic style associated with them, and updated the styles to match a more colorful collection of styles. It took just a couple of seconds to update all 100 documents.
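
Several of these demos come down to the same operation: open the ZIP container, leave a part out or swap in new bytes for it, and write the container back. Here is a rough sketch of that idea using Python's standard zipfile module rather than the System.IO.Packaging API I used on stage. The function name rewrite_package and the part names are only illustrative assumptions, and a real package may also have relationship or content-type entries that need updating when a part is removed.

```python
import io
import zipfile


def rewrite_package(package_bytes, drop=(), replace=None):
    """Copy a ZIP package, dropping some parts and swapping the bytes of others.

    drop:    base names of parts to leave out (for example "comments.xml")
    replace: mapping of full part name -> new bytes for that part

    Hypothetical helper for illustration; it does not touch any
    relationship or content-type bookkeeping inside the package.
    """
    replace = replace or {}
    output = io.BytesIO()
    with zipfile.ZipFile(io.BytesIO(package_bytes)) as source, \
         zipfile.ZipFile(output, "w", zipfile.ZIP_DEFLATED) as target:
        for item in source.infolist():
            base_name = item.filename.rsplit("/", 1)[-1]
            if base_name in drop:
                continue  # comments demo: skip the part entirely
            # diagram, header/footer and style demos: swap in new bytes
            data = replace.get(item.filename, source.read(item.filename))
            target.writestr(item, data)
    return output.getvalue()
```

A bulk update is then just a loop over the files in a folder, calling rewrite_package on each one with the replacement part read from a "template" document.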

The demos actually turned out really great, especially considering that we still have a long way to go in the development cycle. There was a great mix of people at the presentation, which shows how big of a deal this is and how it has a huge impact on a broad range of areas. There were people concerned about deployment and user impact, and others curious about what type of development tools would be provided.

I’m going to make sure I spend time discussing all the different aspects of the format over the next year. There are a ton of really cool, exciting implications for 3rd party developers, but there are also a lot of great things we are doing to try to make deployment as easy and painless as possible, given that this is such a huge shift.

I’ll start to assign some more specific categories to my posts, and I’ll try to answer everyone’s questions as we go forward. If you have suggestions for categories you’d like to see discussed, let me know! Thanks to those of you who’ve sent some great questions to me in e-mail too. I’ll try to reply as soon as I can, and most likely I’ll start using the blog to post my reply so everyone else can follow along.


Comments (6)

  1. tsadok says:

    Another use of the new format will (hopefully) be the ability to easily list the differences between two versions of a Word document.

    The current "Track Changes" feature is quite useful, but falls short in several situations.

  2. Jason Porter says:


    Can you comment on the Visio product support for this?



  3. BrianJones says:

    Hey Jason, Visio already has an XML format that you can use today. They are not planning on moving that format into the ZIP container though for the next version. The only applications using the ZIP container will be Word, PowerPoint, and Excel.


  4. Ronald L says:


    You said the schemas will be published shortly. Are they already available?


  5. BrianJones says:

    Hey Ronald, we just published a sneak preview of the schemas last week. Here’s more info:

  6. Anton says:

    I want mp3 player. What will advise?