Article on TechWeb discussing the new XML formats


TechWeb: XML From Office? Microsoft’s Open Promise


http://www.intelligententerprise.com/channels/content_management/showArticle.jhtml?articleID=165700225#_


I thought Doug made some good points in his article, but there was still some skepticism at times. I guess I didn’t make it clear, but we do completely conform to the W3C XML 1.0 standard, so any parser that supports that standard should be able to work with our files. If anyone has found that our XML files don’t work with a particular parser, please let me know.


Kurt Cagle’s comments are worth drilling into a bit more. It’s true that we’ve been evolving our XML support in Office for a long time now (since Office 2000, as he mentions). The big shift this time is that we are now saying that the formats will be full fidelity and the default for Word, Excel, and PowerPoint. That’s a huge step forward. The comments on the complexity of the XML output really have to do with the feature set in the applications more than anything else. Word, PPT, and Excel have a ton of functionality, and we have to support all of it in XML. We can’t have the conversion from the old binary formats into the new XML formats result in any kind of feature loss. It needs to be 100% full fidelity. That means the schemas themselves will be pretty large. That doesn’t mean all files have to be extremely complex though. If you don’t care about most of the functionality and just want to create a simple file, you can do that. Last month I posted an example of doing just that: Intro to Word XML Part 1 - Simple Word Document
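As a rough sketch of how small such a file can be, the Python below builds a one-paragraph document and reads it back with the standard library's generic XML parser. It assumes the Word 2003 WordprocessingML namespace and element names (`w:wordDocument`, `w:body`, `w:p`, `w:r`, `w:t`); the function names are made up for illustration, and the Office 12 formats may differ.

```python
import xml.etree.ElementTree as ET

# Assumption: the WordprocessingML namespace used by Word 2003 XML files.
W_NS = "http://schemas.microsoft.com/office/word/2003/wordml"


def build_simple_word_xml(text: str) -> bytes:
    """Return a minimal Word 2003 XML document holding one paragraph."""
    ET.register_namespace("w", W_NS)
    doc = ET.Element(f"{{{W_NS}}}wordDocument")
    body = ET.SubElement(doc, f"{{{W_NS}}}body")
    para = ET.SubElement(body, f"{{{W_NS}}}p")
    run = ET.SubElement(para, f"{{{W_NS}}}r")
    ET.SubElement(run, f"{{{W_NS}}}t").text = text
    # The processing instruction tells Windows to open the file in Word.
    header = (
        '<?xml version="1.0" encoding="UTF-8" standalone="yes"?>\n'
        '<?mso-application progid="Word.Document"?>\n'
    )
    return (header + ET.tostring(doc, encoding="unicode")).encode("utf-8")


def extract_text(data: bytes) -> list:
    """Read the text runs back with an ordinary XML 1.0 parser."""
    root = ET.fromstring(data)
    return [t.text for t in root.iter(f"{{{W_NS}}}t")]


if __name__ == "__main__":
    xml_bytes = build_simple_word_xml("Hello World")
    print(xml_bytes.decode("utf-8"))
    print(extract_text(xml_bytes))
```

Nothing here is Word-specific beyond the namespace and element names, which is the point: any conforming XML 1.0 parser should be able to read the result.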


-Brian

Comments (3)

  1. Slashdot: MS Office XML Format Now in TextEdit

    I saw this the other day on slashdot. I have to admit…

  2. Good Evening Brian,

    I’ll have to concur with what you’ve said about the complexity of the binary formats given the feature set of Microsoft Office (specifically Excel).

    For the better part of a year, I’ve been developing a Spreadsheet API… populate data "virtually" against the API, and choose the "driver" at runtime (whether it be the Excel Binary format, or SpreadsheetML) to create the file contents and write out the file.
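A minimal sketch of the "choose the driver at runtime" idea, assuming nothing about the actual API described above. All names here (`render_spreadsheetml`, `render_csv`, `DRIVERS`, `write_sheet`) are hypothetical, and a CSV writer stands in for the Excel binary driver, which is far too involved to show here. The SpreadsheetML driver assumes the Excel 2003 namespace and element names.

```python
import csv
import io
from xml.sax.saxutils import escape

# Assumption: the namespace used by Excel 2003 SpreadsheetML.
SS_NS = "urn:schemas-microsoft-com:office:spreadsheet"


def _cell(value):
    """Render one value as a SpreadsheetML cell."""
    kind = "Number" if isinstance(value, (int, float)) else "String"
    return f'<Cell><Data ss:Type="{kind}">{escape(str(value))}</Data></Cell>'


def render_spreadsheetml(rows):
    """Hypothetical driver: serialize rows as a one-sheet workbook."""
    body = "".join(
        "<Row>" + "".join(_cell(v) for v in row) + "</Row>" for row in rows
    )
    return (
        '<?xml version="1.0"?>'
        f'<Workbook xmlns="{SS_NS}" xmlns:ss="{SS_NS}">'
        f'<Worksheet ss:Name="Sheet1"><Table>{body}</Table></Worksheet>'
        "</Workbook>"
    )


def render_csv(rows):
    """Stand-in for a second driver (not the Excel binary format)."""
    buf = io.StringIO()
    csv.writer(buf, lineterminator="\n").writerows(rows)
    return buf.getvalue()


# The caller picks a driver by name at runtime.
DRIVERS = {"spreadsheetml": render_spreadsheetml, "csv": render_csv}


def write_sheet(rows, driver):
    """Render the same in-memory rows with whichever driver was chosen."""
    return DRIVERS[driver](rows)


if __name__ == "__main__":
    data = [["Item", "Qty"], ["Widget", 3]]
    print(write_sheet(data, "spreadsheetml"))
    print(write_sheet(data, "csv"))
```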

    Thus far I’ve become very familiar with both formats, and SpreadsheetML (although not perfect) is a huge step forward.

    Currently the SpreadsheetML file sizes tend to be larger due to the verbosity of XML; however, the amount of time and memory required to generate the SpreadsheetML file versus the Excel Binary file has decreased by an order of magnitude in most cases.

    (BTW I "zipped" a few SpreadsheetML files that contained the same content as the binary equivalents, and I am seeing results consistent with the sizes you had discussed in earlier posts/interviews: zipped, the SpreadsheetML file is approximately 1/4th the size of the Binary file.)
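For anyone who wants to repeat that kind of measurement, here is a small sketch using Python's standard `zipfile` module. The helper name and the synthetic payload are made up for illustration; it does not reproduce the 1/4 figure, which depends on the real files being compared.

```python
import io
import zipfile


def zipped_size(name: str, payload: bytes) -> int:
    """Return the size in bytes of a ZIP archive holding one deflated entry."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        zf.writestr(name, payload)
    return len(buf.getvalue())


if __name__ == "__main__":
    # Synthetic, highly repetitive markup standing in for a real workbook.
    row = b'<Row><Cell><Data ss:Type="Number">42</Data></Cell></Row>'
    payload = b"<Table>" + row * 1000 + b"</Table>"
    print("raw bytes:   ", len(payload))
    print("zipped bytes:", zipped_size("sheet.xml", payload))
```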

    I had read somewhere that the new Office 12 Open XML file formats will be "extensions" of the existing XML formats (WordML, SpreadsheetML). If this is true, how much relative "changing" versus "extending" is being done? (if you don’t mind me asking)

    Either way, I’m pleased with what I am able to produce so far. If I did have a "wish list" of "changes" I would like to see, (nothing major) as far as the structure and content of the XML files (given the complexity surrounding developing a programmatic solution for SpreadsheetML) here’s the list:

    1) Consistency – sometimes "ON" or "OFF" is represented in the XML as an attribute of an element with the value "0" or "1"

    ss:AutoFitWidth="0" //OFF

    ss:Italic="1" //ON

    sometimes the "presence" of an element, means an option is turned "ON", the absence of it means it is turned off

    <AllowPNG/> //ON
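To make the two conventions concrete, here is a sketch of the pair of helpers a consumer ends up writing. The helper names and the `Options` wrapper element are hypothetical and the sample is simplified (in a real file these elements live in different places and namespaces); only the `ss:` attributes and `AllowPNG` come from the examples above.

```python
import xml.etree.ElementTree as ET

# Assumption: the ss: prefix maps to the Excel 2003 SpreadsheetML namespace.
SS = "urn:schemas-microsoft-com:office:spreadsheet"


def attr_flag(elem, name, default=False):
    """Convention 1: an ss: attribute whose value is "1" (on) or "0" (off)."""
    raw = elem.get(f"{{{SS}}}{name}")
    return default if raw is None else raw == "1"


def presence_flag(parent, tag):
    """Convention 2: the option is on if the child element exists at all."""
    return parent.find(tag) is not None


# Simplified sample; "Options" is a made-up wrapper for illustration only.
SAMPLE = (
    f'<Options xmlns:ss="{SS}">'
    '<Font ss:Italic="1"/>'
    '<Column ss:AutoFitWidth="0"/>'
    "<AllowPNG/>"
    "</Options>"
)
root = ET.fromstring(SAMPLE)

if __name__ == "__main__":
    print(attr_flag(root.find("Font"), "Italic"))           # attribute style
    print(attr_flag(root.find("Column"), "AutoFitWidth"))   # attribute style
    print(presence_flag(root, "AllowPNG"))                  # presence style
```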

    2) It’d be nice to have a way of either specifying the metrics or keeping them all in consistent units/scales (i.e. everything in pixels), rather than having a different metric for, say, the location of the split when a split pane is defined. (And please, don’t resurrect "twips" 🙂)

    3) Elements like Page Header which are defined as:

    Header x:Data="&LLeft Header&CCenter Header&RRight Header"

    …Would be great if the components of the header were elements or individual attributes as opposed to "amalgamated" attributes
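For illustration, this is roughly the splitting a consumer has to do today. It is a simplified sketch: it only recognizes the `&L`, `&C` and `&R` section markers shown above and passes any other `&` code through as plain text. The function name is hypothetical.

```python
import re

# Each section starts with &L, &C or &R and runs until the next such marker.
_SECTION = re.compile(r"&([LCR])((?:(?!&[LCR]).)*)", re.DOTALL)
_NAMES = {"L": "left", "C": "center", "R": "right"}


def split_header(data: str) -> dict:
    """Split an x:Data header string into its left/center/right parts."""
    parts = {"left": "", "center": "", "right": ""}
    for marker, text in _SECTION.findall(data):
        parts[_NAMES[marker]] = text
    return parts


if __name__ == "__main__":
    print(split_header("&LLeft Header&CCenter Header&RRight Header"))
```

Note that inside the XML file the ampersands are escaped as `&amp;`; the string above is what an XML parser hands back after unescaping.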

    4) I’m not a huge fan of the R[x]C[y] notation of references (but that’s just me), especially when you are referencing a different sheet and you have to do this in a formula:

    ‘Some Sheet Down the Road’!R[-8]C[3]

    It would be awesome (for me of course) if references were represented as a notation like:

    REF(2,5,19) which stood for Sheet 2, Row 5 Column 19 for absolute, and ^REF(2,5,9) for relative position… Ranges could be defined as REF(1,2,3)->REF(1,2,30), and if referring to a cell on the same sheet it’d be nice as REF(1,2) and ^REF(1,2).
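As a point of comparison, here is a sketch of what parsing the existing notation involves. It assumes the usual R1C1 rules (a bracketed number is an offset from the current cell, a bare number is an absolute index, and a missing number means "same row/column") and does not handle sheet names containing escaped quotes. `CellRef`, `parse_r1c1` and `resolve` are hypothetical names.

```python
import re
from typing import NamedTuple, Optional


class CellRef(NamedTuple):
    sheet: Optional[str]
    row: int            # absolute index, or offset when row_relative is True
    col: int            # absolute index, or offset when col_relative is True
    row_relative: bool
    col_relative: bool


_CELL = re.compile(r"^R(?:\[(-?\d+)\]|(\d+))?C(?:\[(-?\d+)\]|(\d+))?$")


def parse_r1c1(ref: str) -> CellRef:
    """Parse a single R1C1 reference, optionally prefixed by a sheet name."""
    sheet, _, cell = ref.rpartition("!")
    match = _CELL.match(cell)
    if match is None:
        raise ValueError(f"not an R1C1 reference: {ref!r}")
    row_rel, row_abs, col_rel, col_abs = match.groups()
    return CellRef(
        sheet=sheet.strip("'‘’") or None,
        row=int(row_abs) if row_abs else int(row_rel or 0),
        col=int(col_abs) if col_abs else int(col_rel or 0),
        row_relative=row_abs is None,
        col_relative=col_abs is None,
    )


def resolve(ref: CellRef, base_row: int, base_col: int) -> tuple:
    """Turn offsets into absolute (row, col) relative to the formula's cell."""
    row = base_row + ref.row if ref.row_relative else ref.row
    col = base_col + ref.col if ref.col_relative else ref.col
    return row, col


if __name__ == "__main__":
    ref = parse_r1c1("'Some Sheet Down the Road'!R[-8]C[3]")
    print(ref)
    print(resolve(ref, base_row=10, base_col=2))
```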

    5) Custom Data Formats are sometimes Conditional Formats (without all the additional formatting options), however these formats have a different notation in the XML file and exist in a different section of the document. It would be nice that the data format specified by the string "_($#,##0.00_);[Red]($#,##0.00)" would be broken out using the conditional format notation. Maybe conditional formats become part and parcel of the Style element, and therefore easier to develop against and manage.
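To show what "breaking out" that string might look like, here is a simplified sketch that splits a custom format into its semicolon-separated sections and pulls off a leading color such as `[Red]`. It assumes Excel's usual section order (positive; negative; zero; text), ignores semicolons inside quoted literals, and leaves bracketed conditions such as `[>100]` untouched. The function name and the returned dictionary shape are made up for illustration.

```python
import re

# The eight named colors Excel accepts in a custom number format.
_COLOR = re.compile(
    r"^\[(Black|Blue|Cyan|Green|Magenta|Red|White|Yellow)\]", re.IGNORECASE
)
_SECTION_NAMES = ("positive", "negative", "zero", "text")


def split_number_format(fmt: str) -> list:
    """Split a custom number format into per-section color and pattern.

    With only two sections, the first one also covers zero values.
    """
    sections = []
    for name, section in zip(_SECTION_NAMES, fmt.split(";")):
        match = _COLOR.match(section)
        sections.append(
            {
                "applies_to": name,
                "color": match.group(1) if match else None,
                "pattern": section[match.end():] if match else section,
            }
        )
    return sections


if __name__ == "__main__":
    for part in split_number_format("_($#,##0.00_);[Red]($#,##0.00)"):
        print(part)
```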

    Ideally it would be wonderful if all Style related features … "GridlineColor", ConditionalFormat "Formatting", "TabColor" would be defined in the Styles section of the XML (which would be like a CSS in an HTML page). Maybe the Styles section could have

    Styles
      CellStyles
        CellStyle
      ConditionalStyles
        ConditionalStyle
          Condition
            CellStyle id=?
          Condition
            CellStyle id=?
        ConditionalStyle
          Condition
            CellStyle id=?
          Condition
            CellStyle id=?
      GridlineStyles
        GridlineStyle
      TabStyles
        TabStyle
      HeaderStyles

    … you get the idea.

    6) And the last thing I can think of is how "AutoFiltering" is handled through "Named Cells", as well as specified in the Names section…

    NamedRange ss:Name="_FilterDatabase" ss:RefersTo="=Sheet1!R6C2:R57C6" ss:Hidden="1"

    It would be nice if one could simply define the autofiltering (as above) without having to label all of the individual cells as Named Cells as well.
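To give a sense of the bookkeeping involved, this sketch expands the `ss:RefersTo` value above into the individual cells that would each need tagging. It assumes an absolute R1C1 range with an unquoted sheet name, as in the example; the function name is hypothetical.

```python
import re

_RANGE = re.compile(r"^=?(?:(?P<sheet>[^!]+)!)?R(\d+)C(\d+):R(\d+)C(\d+)$")


def cells_in_range(refers_to: str) -> list:
    """List every (row, col) covered by an absolute R1C1 range."""
    match = _RANGE.match(refers_to)
    if match is None:
        raise ValueError(f"unsupported range: {refers_to!r}")
    top, left, bottom, right = (int(g) for g in match.groups()[1:])
    return [
        (row, col)
        for row in range(top, bottom + 1)
        for col in range(left, right + 1)
    ]


if __name__ == "__main__":
    cells = cells_in_range("=Sheet1!R6C2:R57C6")
    # Every one of these cells would carry its own Named Cell entry.
    print(len(cells), "cells, from", cells[0], "to", cells[-1])
```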

    Anyways, please don’t take offense at this post. I’ve already coded a solution to account for these things; I just figured I’d let you know what I found to be minor "pains" when developing solutions (in case the new XML file formats are still "in flux"). Again, the SpreadsheetML XML formats are really a huge improvement as they stand. (I’m just being nit-picky.)

    Thanks for your great work. And please feel free to contact me if I can provide any clarity or assistance.

    Best,

    M. Eric DeFazio

    eric@workbeans.com

  3. BrianJones says:

    Eric, thanks a ton for taking the time to provide your comments. This is definitely the type of feedback I’m looking for. While I can’t promise we’ll be able to come through on all your requests, I’ll definitely look into them.

    There will be a lot more information about the new Spreadsheet schema after PDC. I’ll be interested to hear your impressions.

    -Brian