Friday thoughts

I just had a couple of random things to mention this afternoon:

  1. ARCast – Office 2007 Open XML Format (Part 1 of 2) - Doug Mahugh and I did a live webcast last month with Ron Jacobs (https://www.skyscrapr.net/blogs/arcasts/archive/2006/10/18/370.aspx). Ron just recently posted an audio recording of the first half of the discussion. Not sure when he'll post the second half…
  2. Convert SpreadsheetML into generic XML - There's a new article up on openxmldeveloper.org that shows how you can convert SpreadsheetML into generic XML using XSLT, and then bind that XML data source to an ASP.NET data grid (which you can then display in the browser).
  3. Performance of XML file formats - I saw these two posts from IBM's Rob Weir discussing different issues around XML file format performance (Celerity of verbosity and Why is OOXML slow?). I love the Fox News style of that second post's title BTW ("Democrats want to destroy your family?")… just kidding Rob <g/>. I have to admit that the two posts seem a bit contradictory. The first says that the size of the XML file doesn't affect parse times, and the second says that WordprocessingML is slower to parse than ODF because of the larger file size. I admit I haven't had a chance to drill deeper, so I'm sure there is more to it than that (Rob's a performance architect, so I doubt he would miss that).
    I had a post a number of months ago about tag size being an issue in XML parsing times, and I still hold to that. I'll try to pull together some numbers to back that up, as it sounds like Rob disagrees. There are of course a number of other factors besides tag size that play a much more important role in the structure of the file format (I even mentioned in my original post that tag size itself was a small factor, but still significant enough that we decided to use terse tag names on any structure that is likely to repeat often throughout the file). The other issue is that Rob's experiments focus on WordprocessingML rather than SpreadsheetML. Spreadsheets really are the bigger item in this discussion, as they can have hundreds of millions of XML tags in a single file. In a large word processing document, the text content itself makes up a lot of the file, and there aren't nearly as many XML tags.
  4. History of richedit - Murray Sargent has a couple of great posts discussing richedit and his history with the team (he's been working on it since 1994). The first post discusses the different versions of richedit (https://blogs.msdn.com/murrays/archive/2006/10/14/richedit-versions.aspx). The second post goes into more of the history behind the project and talks about how third parties can leverage it (https://blogs.msdn.com/murrays/archive/2006/10/20/some-richedit-history.aspx).
  5. Standards at Microsoft - Jason Matusow had a couple of interesting posts this week. The first post gives more information on the OSP (Open Specification Promise), which is a new approach we recently took towards making various formats freely available to developers (https://blogs.msdn.com/jasonmatusow/archive/2006/10/18/application-of-the-osp.aspx). The second discusses some of his thinking around interoperability based on his latest trip out to Brussels (https://blogs.msdn.com/jasonmatusow/archive/2006/10/18/interoperability-in-europe.aspx). There are some important points here around what interoperability really means, and what the most effective ways of building interoperable applications are. Custom-defined schema support, for example, is extremely important for allowing Office documents to easily interact with backend data.
  6. Calorie-burning drink from Coca-Cola - Not to sound completely random, but I just saw this and it instantly brought back memories of that old Jim Carrey SNL bit about a hardcore diet approach. "Ride the Snake"
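
Going back to item 3 for a second: until I have real numbers, here is a rough sketch of how one might start poking at the tag size question. It is not a benchmark of Office or ODF files. It just builds two synthetic, spreadsheet-like XML strings with identical structure and content, one with terse tag names and one with long ones, and times a parse of each using Python's standard library parser. The element names ("r", "c", "worksheet-row", "worksheet-cell"), the helper functions, and the row and column counts are all made up for illustration, and the timings will vary by machine and parser, so treat any result as a starting point rather than a conclusion.

```python
import time
import xml.etree.ElementTree as ET


def build_sheet(row_tag, cell_tag, rows=2000, cols=20):
    """Build a synthetic spreadsheet-like XML string (not real SpreadsheetML)."""
    parts = ["<sheet>"]
    for r in range(rows):
        parts.append(f"<{row_tag}>")
        for c in range(cols):
            # Each cell holds a small integer, so markup dominates the file.
            parts.append(f"<{cell_tag}>{r * cols + c}</{cell_tag}>")
        parts.append(f"</{row_tag}>")
    parts.append("</sheet>")
    return "".join(parts)


def time_parse(xml_text, repeats=5):
    """Return (best wall-clock parse time in seconds, number of elements)."""
    best = float("inf")
    count = 0
    for _ in range(repeats):
        start = time.perf_counter()
        root = ET.fromstring(xml_text)
        best = min(best, time.perf_counter() - start)
        # Count elements outside the timed region, as a sanity check
        # that both documents have the same structure.
        count = sum(1 for _ in root.iter())
    return best, count


# Hypothetical tag names: same structure, different tag lengths.
terse = build_sheet("r", "c")
verbose = build_sheet("worksheet-row", "worksheet-cell")

for label, doc in (("terse", terse), ("verbose", verbose)):
    seconds, elements = time_parse(doc)
    print(f"{label}: {len(doc)} chars, {elements} elements, {seconds:.4f}s best parse")
```

Taking the best of several runs is just a cheap way to cut down on noise. A serious comparison would need real documents, several parsers, and a look at where the time actually goes.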

Have a great weekend everyone.

-Brian