Office ’12’ File Format Presentation available online (PDC)


If you’re interested in seeing the presentation I gave at PDC back in September, it’s now available online: http://microsoft.sitestream.com/PDC05/OFF/OFF304.htm#nopreload=1&autostart=1


It’s about an hour and 15 minutes long, and it’s pretty much a developer-level introduction to the Office ’12’ file formats. I show a number of demos and cover the following agenda:



  1. Demo: Example Office XML File

  2. The role we see XML playing in Office documents

  3. Demo: How to create an Office ’12’ document from scratch

  4. What are the components that make up a file?

  5. Demo: Modifying an Excel spreadsheet

  6. XML Data Store (I’m really excited about this; I’ll have more info on this in future posts)

  7. Developing against the formats

  8. Demo: Example server side solutions that act on the formats
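Agenda items 3 and 4 (creating a document from scratch, and the components that make up a file) come down to the fact that a package is a ZIP archive of XML parts. The following is a minimal Python sketch, not the code from the demo. It assumes the part names, namespace URIs and content types that Office Open XML later standardized in ECMA-376, and these may differ from the pre-release Office ’12’ builds shown at PDC. The helper names make_docx and list_parts are hypothetical.

```python
import io
import zipfile
from xml.sax.saxutils import escape

# Namespace URIs and content types follow the later ECMA-376 (Office Open XML)
# conventions; the pre-release Office '12' builds shown at PDC may differ.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
REL_NS = "http://schemas.openxmlformats.org/package/2006/relationships"
CT_NS = "http://schemas.openxmlformats.org/package/2006/content-types"
OFFICE_DOC_REL = ("http://schemas.openxmlformats.org/officeDocument/2006/"
                  "relationships/officeDocument")
MAIN_CT = ("application/vnd.openxmlformats-officedocument."
           "wordprocessingml.document.main+xml")

# [Content_Types].xml maps every part in the package to a content type.
CONTENT_TYPES = f"""<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<Types xmlns="{CT_NS}">
  <Default Extension="rels" ContentType="application/vnd.openxmlformats-package.relationships+xml"/>
  <Default Extension="xml" ContentType="application/xml"/>
  <Override PartName="/word/document.xml" ContentType="{MAIN_CT}"/>
</Types>"""

# _rels/.rels tells a consumer which part is the main document.
ROOT_RELS = f"""<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<Relationships xmlns="{REL_NS}">
  <Relationship Id="rId1" Type="{OFFICE_DOC_REL}" Target="word/document.xml"/>
</Relationships>"""


def make_docx(text: str) -> bytes:
    """Build a minimal one-paragraph WordprocessingML package in memory (hypothetical helper)."""
    document = f"""<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<w:document xmlns:w="{W_NS}">
  <w:body><w:p><w:r><w:t>{escape(text)}</w:t></w:r></w:p></w:body>
</w:document>"""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        zf.writestr("[Content_Types].xml", CONTENT_TYPES)
        zf.writestr("_rels/.rels", ROOT_RELS)
        zf.writestr("word/document.xml", document)
    return buf.getvalue()


def list_parts(package: bytes) -> list[str]:
    """Return the names of the parts (ZIP entries) that make up a package."""
    with zipfile.ZipFile(io.BytesIO(package)) as zf:
        return zf.namelist()


if __name__ == "__main__":
    print(list_parts(make_docx("Hello from a hand-built package")))
```

For this package, list_parts returns the three parts in the order they were written: [Content_Types].xml, _rels/.rels and word/document.xml. You can save the returned bytes to a file with a .docx extension to open it. Real documents carry more parts (styles, settings, document properties), but these three are enough to show the structure.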

If you get a chance, please take a look. It really should help you understand more about the formats and how you can work with them.


-Brian

Comments (29)

  1. Kris says:

    Cannot view the presentation. Service Unavailable!

  2. Mike Dunn says:

    You probably just need to change # to ? in the URL.

  3. Maverick says:

    The correct URL is as shown in the post, but the linked URL is incorrect. Try http://microsoft.sitestream.com/PDC05/OFF/OFF304.htm#nopreload=1&autostart=1, note the placement of the # and &.
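Mike’s and Maverick’s suggestions depend on the difference between # and ? in a URL. Text after ? is a query string that the browser sends to the server. Text after # is a fragment that stays in the browser, and only script on the page can read it. The thread doesn’t say which form the sitestream player expected. This small Python sketch uses only the standard urllib.parse module to show where the nopreload and autostart options end up in each case. The function name where_options_land is hypothetical.

```python
from urllib.parse import parse_qs, urlsplit

# The link as it appears in the post.
LINK = "http://microsoft.sitestream.com/PDC05/OFF/OFF304.htm#nopreload=1&autostart=1"


def where_options_land(url: str) -> dict:
    """Split a URL and show whether its options sit in the query or the fragment."""
    parts = urlsplit(url)
    return {
        "path": parts.path,
        "query": parse_qs(parts.query),  # sent to the server
        "fragment": parts.fragment,      # kept by the browser, never sent
    }


if __name__ == "__main__":
    print(where_options_land(LINK))
    print(where_options_land(LINK.replace("#", "?", 1)))
```

With #, the query is empty and both options sit in the fragment. With ?, they become server-visible parameters.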

  4. Deck says:

    Time after time I think, hey, maybe Microsoft will finally get things right, but no. Your company, no offense, seems to have a large number of complete dumbasses.

    Example 1: Aiming to monopolize the music business by forcing Media Player onto other companies’ products, then blaming it on a low-level flunky when everyone knows it has to be a managerial decision.

    Example 2: Saying "we’ll pull out of South Korea if you don’t stop investigating us for anticompetition" just when you want to convince Massachusetts that you won’t do the same there.

    No wonder the people who actually get things done (Lucovsky, the wiki dude, start.com guy, Gagne…) can’t wait to leave.

    I think every MS employee needs a mandatory weekly cluestick encounter.

  5. Hazz says:

    I’ve been reading your blog for some time now, so it’s good to finally meet you (sort of). Great presentation (I read somewhere that the majority of the population would rather have teeth pulled than get up in front of an audience and present – however you present well and seem very comfortable), it really helped me understand the new formats. The questions at the end were very good for delineating the boundaries of the capabilities of the new formats. BTW the links all worked fine for me!

    I only program at the very basic hobbyist level; I’m a Business Analyst by trade. A couple of years ago my company (approx. 130 information workers) switched from open source office, database and networking software to MSO, MS SQL Server, etc. Why? Buying software for all of those users was not cheap, and the conversion has caused some pain. So why? Because the productivity of the users (and the Business Analysts) has improved enormously! The money is more than worth it, otherwise companies would not use MS products, duh! I am sure that other people and other companies have had other experiences, but I can only speak to what I know. I’m a little tired of hearing the whole "the entire world should move to open source" spiel, my experience (with only my hobbyist level of expertise) of open source has been – unreliable, buggy, less features for user productivity, and only uber-expert IT programmers can resolve problems, the average user gets left in the dark waiting for the uber-programmers to fix things. And BTW there is a skills shortage in IT, we could not actually lay hands on any uber-programmers so we had to settle for some merely "good" programmers on staff, who over a period of years failed to see the bigger picture, failed to mitigate business risk, failed to self-manage and led our open source infrastructure into a pretty bad cul-de-sac.

    Anyway I’m not here to defend Microsoft, or flame the detractors (who have made some excellent points along the way), but this is not the forum for the open-source bandwagon. The ultra open source license, is effectively enforcing a limited license anyway, cause no-one can use the ‘open-source’ code unless they 100% subscribe to the open source paradigm (like anything you build on it also has to be 100% open source, so it’s soooo open it’s actually closed to all businesses, closed to anyone trying to make a living (earn money) out of programming, and closed to all but uber-programmers). And BTW I thought that the whole communism thing has shown that monetary self interest (capitalism) gives better economic, productivity and efficiency than ‘the common good’ rhetoric (as good as it sounds in principle) so what is so evil about a company wanting to make money by providing software people can choose to buy or not (by the way I’m not naive about the strong arm tactics companies can and do use – I work for one, but ultimately the consumer has the power of choice and can vote with their wallet). But I’m not here to talk politics or marketing philosophy either.

    OK, so, I fail to see how Microsoft could be more open about its new file formats, or how its royalty free license could be any better (at least for businesses using MS products who want to add solutions on top – anyway). The new XML file formats give quantum leap benefits over the previous binary formats, allow for “good” programmers and even hobbyists to easily create some amazing business solutions on top of the XML formats and are huge for reducing business risk around file corruption, data persistence, etc.

    I don’t think anyone else in my company is switched on to the new 12 formats yet, but that is part of the role of a Business Analyst: to be aware of new technologies, how they will affect the business and how we might be able to leverage them. Awareness of any ways to reduce business risk is vital too. Rah, rah, rah… BAs are beautiful etc. …

    Anyway, back on topic (and down off the soap box :-): are the cfChunks the basis for any native Word 12 (or Excel 12) master document/subdocument functionality? It’s easy to see how developed solutions could use them for this (thanks for your excellent blog post of July 20), but will Office 12 do so natively? Further, what if a cfChunk in a Word 12 file was actually a chunk of SpreadsheetML? Would this effectively be OLE of some sort?

    You briefly mentioned complex numbering (in the code snippet examples of using the XML) in your PDC presentation. Do the new XML file formats resolve the issues previous versions of Word have had with multi-level lists and linked styles? (It seems there are still some issues in 2003, but they may be more around user-expected behavior when styles are linked to outline numbering, and hotfixes may have already resolved these too; I don’t know. I very rarely have problems with multi-level numbering because I follow one simple rule: never link default styles to numbering. Instead I create new custom styles, not based on the default heading styles and used only for numbering, so I can use Heading1 and my new Heading1-Numbered etc. with relative impunity. I have also almost never had problems with the bullet and numbering toolbar buttons. But I digress, again!) Maybe this is a question for Joe Friend?

    Anyway – Thanks for the blog, the wealth of information you are providing and for sharing your enthusiasm with us.

  6. Craig Ringer says:

    Hazz: With regards to your comments, I really can’t disagree. In particular, it’s very easy to make bad decisions with technology choices (selecting tools that are unreliable, unfinished, poorly maintained, or simply don’t fit your needs) and this usually goes badly. I have plenty of frustrating firsthand experience with poor quality user-hostile productivity-inhibiting software and hard to maintain systems, both open source and very much closed source.

    I also agree that this isn’t an appropriate venue for pushing open source. I’m personally tired of people who feel that they must push it all the time, everywhere, even when it’s obviously not a good fit with someone’s needs or where it’s just not important. Surely using the tools that best fit _all_ your needs is the right way to go? Myopically focusing on "open source" as your only evaluation criterion is as stupid as focusing on absolute minimum price or on how pretty the users think the user interface is.

    That said, I do think you’ve made one significant error that needs clarification. The license you refer to as the "open source" license is presumably the GPL, which is only one of a large number of licenses. It’s one of the most restrictive, and there are a *lot* of open source licenses that do not impose the same requirements. It’s inaccurate to say that open source licenses require you to make "anything you build on it … 100% open source," since in fact that’s only true of the smaller "Free software" (software libré) subset. In fact, Microsoft just adopted two of the less restrictive ones (a slightly altered BSD license and an MPL-derived license) for the shared source licensing programme. Coming from an outspoken opponent of "Free software" licensing, this helps show just how wide the range in open source licensing models really is.

    I must also agree that the new XML formats are great progress. It’s going to open up some very interesting possibilities in the business network I’m responsible for. I’ve noted before that I’m crossing every digit I have in the hopes that Office "13" will extend the same facilities to MS Publisher. There are a lot of interesting possibilities here.



    Craig Ringer

  7. Kaleb says:

    After several attempts to view this presentation, I finally gave up. It always crashes IE after just a few minutes.

  8. William says:

    I’ve been reading your blog for quite a while now, and I’m excited about the new XML format. Can you please comment on the performance of the new Office 12 XML format? With Office 2003, for example, a large workbook can be opened in a few seconds in native XLS format, but it takes 10 to 15 minutes if it is in Office 2K3’s XML format. I hope Microsoft will improve this, since it will be the native format from then on. In addition, can we also expect the *patch* for Office XP and 2K3 to also *patch* the performance issue, instead of just enabling read/write capability?

  9. BrianJones says:

    Thanks for all the comments. I’m sorry if some folks weren’t able to watch it. One thing that might work better is to follow this link: http://microsoft.sitestream.com/PDC05/OFF/OFF304.htm

    (You’ll need to wait for all the material to download before it will play).

    William, it’s really too early to provide actual numbers on what the performance will be, as we haven’t yet started working on the optimizations around file I/O. I can definitely say, though, that the new formats will be much faster than the SpreadsheetML format from Excel 2003.

    Opening these new formats in older versions with the free updates will be slower than opening the same files in Office ’12’, because we actually need to open the file and translate it so that the older version will understand it. (No numbers on that yet either.)

    An area where you’ll see improvements is in opening a file over the network. Since the files are so much smaller, the time to transmit the file will be much less.

    Perf is a really big deal to us, and we’re going to work hard over the upcoming months to make sure the new formats won’t cause pain to end users.

    -Brian
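Brian doesn’t give numbers, but here is some background on why the new files travel faster: the formats store their XML parts inside a ZIP package, and repetitive markup compresses very well. Excel 2003’s SpreadsheetML, by contrast, was saved as a single uncompressed XML file. The rough Python illustration below uses simplified, made-up worksheet markup rather than real SpreadsheetML. It compares the raw size of that markup with the size of a deflate-compressed ZIP that holds it. The helper names are hypothetical, and real ratios depend heavily on the content.

```python
import io
import zipfile


def make_sheet_xml(rows: int) -> bytes:
    """Build simplified, hypothetical worksheet markup with one numeric cell per row."""
    cells = "".join(
        f'<row r="{i}"><c r="A{i}"><v>{i * 3}</v></c></row>'
        for i in range(1, rows + 1)
    )
    return f"<worksheet><sheetData>{cells}</sheetData></worksheet>".encode("utf-8")


def zipped_size(part_name: str, data: bytes) -> int:
    """Size in bytes of a ZIP archive holding `data` as one deflate-compressed entry."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        zf.writestr(part_name, data)
    return len(buf.getvalue())


if __name__ == "__main__":
    xml = make_sheet_xml(10_000)
    packed = zipped_size("xl/worksheets/sheet1.xml", xml)
    print(f"raw: {len(xml)} bytes, zipped: {packed} bytes")
```

This only demonstrates the transfer-size side of the argument. Parsing speed, which William asked about, is a separate question that this sketch doesn’t measure.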

  10. Peter Cox says:

    If the Office 12 XML formats have features that Microsoft believes are technically superior to those in the OASIS standard OpenDocument 1.0, why didn’t it participate in the creation of that standard, as it was invited to on many occasions? I’ve got a bunch of users still on Office 2000, but why should I pay full price to upgrade them to Office 11 or 12 just so they can be locked into a file format controlled solely by Microsoft, when I can now choose software that uses formats that have gone through the standards process and are available under an open license? How could I even begin to justify buying Office under those circumstances? I’m not trying to be anti-MS, as I’m a long-time Office user, but I have yet to hear a reasonable explanation from MS that would answer any of these questions.

  11. BrianJones says:

    Hi Peter, that is a great question. There has been a lot of news recently around that format, so I understand why you’d ask the question. The move to open XML formats is a really great move for any application. The OpenDocument format is a modification of Sun’s legacy StarOffice file format. Sun brought forward the StarOffice file format, which was then discussed and documented at OASIS and became OpenDocument. We already had the WordprocessingML format, which fully supported all legacy Word documents, and that’s what we’ve based our new formats on. Our number one priority was compatibility with our customers’ legacy files. Our formats are 100% compatible with the existing binary files and are available under a royalty free license, which means you have complete control over your files.

    In addition, as I’ve discussed before you don’t need to pay any money to upgrade if you want to use the new formats: http://blogs.msdn.com/brian_jones/archive/2005/10/11/479808.aspx

    There are free updates to the past 3 versions of Office which allow them to read and write the new XML formats.

    -Brian

  12. Yuki says:

    "I’m a little tired of hearing the whole "the entire world should move to open source" spiel, my experience (with only my hobbyist level of expertise) of open source has been – unreliable, buggy, less features for user productivity, and only uber-expert IT programmers can resolve problems, the average user gets left in the dark waiting for the uber-programmers to fix things."

    That may be true for a number of "geek distros" but any idiot can use Ubuntu or SuSE. You also seemed to refer to a corporate/enterprise environment, where you need competent technicians to manage things regardless of the software you use – Windows or Linux or whatever.

    "And BTW there is a skills shortage in IT, we could not actually lay hands on any uber-programmers so we had to settle for some merely "good" programmers on staff, who over a period of years failed to see the bigger picture, failed to mitigate business risk, failed to self-manage and led our open source infrastructure into a pretty bad cul-de-sac."

    Wait, let me get this straight. You employed a couple of incompetent guys and things started to break? Wow that’s unbelievable!

    "Anyway I’m not here to defend Microsoft, or flame the detractors (who have made some excellent points along the way), but this is not the forum for the open-source bandwagon."

    Your ideas are different than mine so I don’t want to hear them! Shoo!

    "The ultra open source license, is effectively enforcing a limited license anyway, cause no-one can use the ‘open-source’ code unless they 100% subscribe to the open source paradigm (like anything you build on it also has to be 100% open source, so it’s soooo open it’s actually closed to all businesses, closed to anyone trying to make a living (earn money) out of programming, and closed to all but uber-programmers)."

    First of all, you don’t have a clue what you’re talking about. Second, if you refer to the GPL, that’s one license designed to guarantee access to the code to everybody forever. Its "viral" nature is a great tool to make sure the software you wrote will always be available to everyone, forever and ever, amen. If you had a clue you would know that 99% of the software that is written by programmers is written for enterprises that are NOT in the software market. I’m talking about enterprises that produce cars, pasta, glass panes, whatever. You know, all those things around you. Unless they need to do only basic accounting tasks, they’re going to have to hire programmers to make special purpose apps that range from stupid 20-line scripts to full-blown complex applications, depending on their needs. The GPL is perfect for them because the programmers they hire can get code from the enormous pool that the Internet is without giving back anything (the software stays inside that company and is not distributed to the public, so the GPL doesn’t force them to give back modifications or to distribute the source code to anyone). They can make whatever modifications they want and still maintain their secrets. That’s the license you say is useless for business.

    OSS also includes several other licenses, like the LGPL, that are perfectly suited to the software market.

    "And BTW I thought that the whole communism thing has shown that monetary self interest (capitalism) gives better economic, productivity and efficiency than ‘the common good’ rhetoric (as good as it sounds in principal) so what is so evil about a company wanting to make money by providing software people can choose to buy or not (by the way I’m not naive about the strong arm tactics companies can and do use – I work for one, but ultimately the consumer has the power of choice and can vote with their wallet). But I’m not here to talk politics or marketing philosophy either."

    Communism is a regime that "encourages" sharing. If you don’t share, you get a bullet in your head. OS doesn’t force anybody to do anything. On the other hand, an obvious mark of communism is a monopoly controlled by a single entity. Does that ring a bell?

    OS encourages competition and the proof is right under your eyes. Tons of apps have been written and there are so many competing products it’s impossible to keep a count. On the other hand in proprietary land any market where MS has a presence has only MS things to offer. E.g., it’s either Office or some Open Source thing. We’re not against proprietary software but why should a single entity be the only choice consumers have? That’s obviously wrong and we intend to put MS back in its place; to do that, we have to write great code and we’re working hard towards that goal.

    However, there’s a big problem: the consumers cannot vote with their money. When they buy a PC it has Windows preinstalled and they can’t refuse it. Actually, because of that most people see Windows as part of the PC, so they won’t even understand what having a choice means, nor will they ever question the validity of the OS they run (i.e., Windows). That’s a situation that must be eliminated, or MS will end up having total control of everything, buying whatever tries to compete. The idea of a single company controlling the OS, the media market, the hosting business etc. frightens me.

    "OK, so, I fail to see how Microsoft could be more open about its new file formats, or how its royalty free license could be any better (at least for businesses using MS products who want to add solutions on top – anyway)."

    The license over those formats is designed to exclude any OSS competitor. OS is the only competitor MS has. Got it? If you want I can make a sketch.

    "The new XML file formats give quantum leap benefits over the previous binary formats, allow for “good” programmers and even hobbyists to easily create some amazing business solutions on top of the XML formats and are huge for reducing business risk around file corruption, data persistence, etc."

    So does ODF. Your point?

    "Anyway – Thanks for the blog, the wealth of information you are providing and for sharing your enthusiasm with us."

    Why, I’ll thank Brian too. Thanks for your FUD and say hello to the MS Information Control guys!

  13. Hazz says:

    Craig – thanks for the clarification about the GPL and the breadth of other open licenses available.

    Yuki – thanks for your ‘comments’; it is an interesting statistic (if verifiable) that 99% of programmers’ code is written for companies other than software vendors.

    Hazz

    Business Analyst

  14. Darryl Hover says:

    Brian,

    One area where the MS formats really seem to shine is the DataStore portion you demo’d at the PDC. According to http://xml.openoffice.org/faq.html (point #5), this is not possible in the OpenDocument formats.

    This one feature finally makes possible the automated document generation system I’ve been scheming up for my organization for the past 4 years.

    I’d like to know more about this feature. In your presentation, you said "I’ve created a schema for this data set that I’ve just dropped into this file and it’s just sitting there inside of this package and travels inside this file…" but you only showed the actual data file…you never mentioned a schema except for that one brief moment.

    Question 1: Is this feature a take-off of the custom schema feature in Word 2003?

    Question 2: Can multiple data files be used? For example, data from two or more disparate database tables could not be used in a mail-merge. Can this now be accomplished using the DataStore part?

    Question 3: Can you please post a copy of the demo’d document file (SaleOfPropertyPITxt.docx)?

    Thanks

    Darryl

  15. sagi says:

    What about support for WordML to XSL-FO?

  16. BrianJones says:

    Hey Darryl, I’m really excited about the custom defined schema support we have in Office 12. The support in 2003 was a great first step, and I think we’ve definitely built on that momentum. I’ll post more about the ways you can integrate the data with the presentation. To answer your questions, though:

    1. There are actually two pieces to this. The first is the support for custom defined XML files to be placed in the ZIP package. I talk about this here: http://blogs.msdn.com/brian_jones/archive/2005/11/04/489223.aspx

    The second piece is the ability to map nodes from the custom XML onto the surface of the document via content controls. The content controls are based on the same technology we used for the custom schema support in 2003, and I’ll talk more about that in future posts.

    2. You can place as many XML files as you want into the ZIP package. As I discuss in the post I mentioned, we allow you to put your own XML files into the ZIP package, and we’ll load those into memory when the document is opened and provide programmatic access. There is no direct support for integrating the items in the datastore with the mail-merge functionality, though. You could definitely do this programmatically if you wanted. Another alternative is to use the content controls and map them to the different nodes in the separate data tables.

    3. I’ll try to get a copy up quickly after Beta 1 is released (I want to use the most up to date build).

    Sagi, we do not support XSL-FO directly, but there are a number of tools out there that have this support. I’ve talked about this in previous blog entries:

    http://blogs.msdn.com/brian_jones/archive/2005/10/20/483161.aspx

    http://blogs.msdn.com/brian_jones/archive/2005/08/11/450539.aspx

    -Brian
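The reply above describes putting your own XML files into the ZIP package. The rough Python sketch below shows what that involves at the file level. It copies a package and adds one custom XML part, plus a relationship to that part from the main document part. The part name customXml/item1.xml and the relationship type URI come from the later ECMA-376 conventions and are assumptions for the builds discussed here. The sketch also assumes that [Content_Types].xml already declares a default content type for the .xml extension. It omits the item-properties part that Office-produced files carry alongside each custom XML item. add_custom_xml is a hypothetical helper, not an Office API, and it doesn’t cover mapping content controls to the data.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

REL_NS = "http://schemas.openxmlformats.org/package/2006/relationships"
# Relationship type for custom XML parts as later standardized in ECMA-376;
# an assumption for the Office '12' builds discussed in this thread.
CUSTOM_XML_REL = ("http://schemas.openxmlformats.org/officeDocument/2006/"
                  "relationships/customXml")

# Serialize relationship elements with a default (unprefixed) namespace.
# Note: this updates ElementTree's module-wide prefix registry.
ET.register_namespace("", REL_NS)


def add_custom_xml(package: bytes, data_xml: str,
                   part: str = "customXml/item1.xml",
                   owner_rels: str = "word/_rels/document.xml.rels") -> bytes:
    """Return a copy of `package` with `data_xml` stored as a new part.

    Hypothetical helper: it adds the part and a relationship from the main
    document part, but not the item-properties part Office files also carry.
    """
    with zipfile.ZipFile(io.BytesIO(package)) as src:
        names = src.namelist()
        if part in names:
            raise ValueError(f"{part} already exists in the package")
        if owner_rels in names:
            rels = ET.fromstring(src.read(owner_rels))
        else:
            rels = ET.Element(f"{{{REL_NS}}}Relationships")

        # Choose the first unused relationship Id: rId1, rId2, ...
        used = {rel.get("Id") for rel in rels}
        n = 1
        while f"rId{n}" in used:
            n += 1
        # The target is relative to the folder of the part that owns the
        # .rels file (word/ for the default owner_rels).
        ET.SubElement(rels, f"{{{REL_NS}}}Relationship",
                      Id=f"rId{n}", Type=CUSTOM_XML_REL, Target="../" + part)

        out = io.BytesIO()
        with zipfile.ZipFile(out, "w", zipfile.ZIP_DEFLATED) as dst:
            # ZIP entries can't be edited in place, so copy everything else.
            for name in names:
                if name != owner_rels:
                    dst.writestr(name, src.read(name))
            dst.writestr(owner_rels, ET.tostring(rels, encoding="utf-8"))
            dst.writestr(part, data_xml)
    return out.getvalue()
```

To add several data files, as in Darryl’s second question, call the helper once per file with a distinct part name such as customXml/item2.xml.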

  17. Rodrigo says:

    I watched some webcasts about VSTO and Word/Excel and it looks great.

    Unfortunately I am not finding much material on PowerPoint support, so I wonder whether it will also be supported. What I need to do is create a ppt file on the server side (aspx) and create some charts in it, pulling the data from a dataset.

    Would that be possible or am I stuck with Office automation for a few years more?

  18. BrianJones says:

    Hey Rodrigo, PowerPoint is also moving to a new XML file format, so you can just use standard ZIP and XML tools to parse and generate PowerPoint presentations.

    You can get a preview of the schemas here: http://www.microsoft.com/downloads/details.aspx?FamilyID=15805380-f2c0-4b80-9ad1-2cb0c300aef9&displaylang=en

    There are two download choices for you. You can get them packaged as an msi, or as a zip file.

    -Brian
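As the reply above says, a presentation in the new format is a ZIP package of XML parts. As a minimal sketch of the parsing side, this Python function pulls the text runs out of each slide part. It assumes the ECMA-376 part naming (ppt/slides/slideN.xml) and the DrawingML namespace that was later standardized; the preview schemas linked above may use different URIs. The results are keyed by the number in each part name. Display order is defined separately (in ppt/presentation.xml under ECMA-376) and is not consulted here. slide_text is a hypothetical helper. Generating a deck with charts, as Rodrigo wants, needs considerably more parts (masters, layouts, chart parts) than this sketch covers.

```python
import io
import re
import zipfile
import xml.etree.ElementTree as ET

# DrawingML namespace as later standardized in ECMA-376; the preview schemas
# linked above may use different namespace URIs.
A_NS = "http://schemas.openxmlformats.org/drawingml/2006/main"
SLIDE_PART = re.compile(r"ppt/slides/slide(\d+)\.xml")


def slide_text(package: bytes) -> dict[int, list[str]]:
    """Collect the text runs (a:t elements) of every slide part, keyed by the
    number in the part name (not necessarily the display order)."""
    found = {}
    with zipfile.ZipFile(io.BytesIO(package)) as zf:
        for name in zf.namelist():
            match = SLIDE_PART.fullmatch(name)
            if match is None:
                continue  # skips slide .rels files, masters, layouts, media, ...
            root = ET.fromstring(zf.read(name))
            found[int(match.group(1))] = [t.text or "" for t in root.iter(f"{{{A_NS}}}t")]
    return dict(sorted(found.items()))
```

On the server you could pass it the bytes of an uploaded file, for example slide_text(open("deck.pptx", "rb").read()), where deck.pptx is just a placeholder name.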

  19. I’ve had a number of folks who attended my session at the XML 2005 conference ask me if the deck was…
