Word Blog HTML Quality


Every post you make as a blogger can be a huge learning experience. In Friday’s post on the new Word 2007 blog post authoring feature, I made a fairly modest claim that the HTML emitted by the feature would be better than the standard HTML from previous versions of Word. Well, it is, but I should have looked at the source code myself before I told everyone else to do so. There were a few problems:




  • Blog service vagaries


A number of issues were introduced by Community Server (the blog system on which blogs.msdn.com is built). The uppercase tags are the major example of that problem. Several issues were also pointed out with the site's template. It is one of several standard CS templates and, of course, has nothing to do with the HTML emitted by the blog feature (the ID attributes with strange values are an example of this).




  • My stupid HTML mistakes


As I stated, I hand-coded the image tags and I made a couple of stupid HTML coding errors. Luckily I will not be shipped in the box with the feature and our developers will output it correctly.




  • Real problems we need to address


Also, several people made great suggestions for improvements that we want to look into. Examples include using <del> for strikethrough and ensuring proper tag content flow.
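To make the strikethrough suggestion concrete, here is a rough sketch in Python of the kind of mapping involved. It is not Word's export code: the run representation and the names `run_to_html` and `paragraph_to_html` are assumptions made up for illustration. It shows struck-through text coming out as <del>, with the inline tags nested inside a block-level <p>.

```python
from html import escape


def run_to_html(text, bold=False, strike=False):
    """Wrap one run of text in inline tags (hypothetical helper)."""
    html = escape(text)  # escape &, <, > and quotes in the text itself
    if strike:
        html = f"<del>{html}</del>"  # <del> rather than <strike> or <s>
    if bold:
        html = f"<strong>{html}</strong>"
    return html


def paragraph_to_html(runs):
    """Emit one block-level <p> that contains only inline content."""
    return "<p>" + "".join(run_to_html(**run) for run in runs) + "</p>"


runs = [
    {"text": "old price", "strike": True},
    {"text": " & new price"},
]
print(paragraph_to_html(runs))
```

The tags are lowercase and every one is closed, which is what the XHTML goal requires anyway.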



Goals



Most importantly, I’d like to lay out the goals for the HTML output.




  • We will hand off valid XHTML for each post


We can’t be held responsible for what Blogger, Spaces or anyone else does to the XHTML after we give it to them, but we’ll send it to them as valid XHTML.




  • Clean HTML is more important than visual fidelity


This is a huge change for Word. Our focus has always been ensuring (as much as possible) that the HTML we output would result in full round trip of all the content and formatting in your document. The blog feature is all about representing what we can in a clean way without any special action/decision point on the part of the post author.
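For anyone who wants to check the first goal on their own posts, the following Python sketch tests whether a post body is well-formed XML. That is a necessary condition for valid XHTML, not a full validation against the XHTML DTD, and the function name `is_well_formed` and the wrapping <div> are assumptions of this sketch rather than anything Word does.

```python
import xml.etree.ElementTree as ET


def is_well_formed(fragment):
    """Return True if an XHTML fragment parses as well-formed XML.

    This is only a first check: it does not validate against the
    XHTML DTD, and the parser knows only XML's built-in entities.
    """
    # Wrap in a single root element, since a post body may contain
    # several sibling paragraphs.
    wrapped = f"<div>{fragment}</div>"
    try:
        ET.fromstring(wrapped)
    except ET.ParseError:
        return False
    return True


print(is_well_formed("<p>Hello <del>old</del> world</p>"))  # properly nested
print(is_well_formed("<p>Line one<br>Line two</p>"))        # <br> never closed
print(is_well_formed("<P>Mixed case</p>"))                  # XML tags are case-sensitive
```

The last case is the same class of problem as the uppercase tags mentioned earlier: in XHTML, element names are case-sensitive and must be lowercase.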



Suggestions welcome



With these two goals in mind, I would like to announce that we will post the details of our XHTML output for public comment. The manner in which we do this (blog, discussion list, wiki, or something else) will be announced early next week. We can’t promise that we will respond to all suggestions, but we will seriously consider them.

Comments (17)

  1. Sahil Malik says:

    Joe –

    In my opinion you should make this blogging work with Sharepoint first. Sort out all issues there, and then publish a simple API for others to adopt.

    This would be a great opportunity to establish a SOAP friendly Web service API setup, that serves as a standard for blogs.

    Sahil

  2. Step says:

    Gotta love the openness!  

    I don’t know enough to have picked apart your code, as you invited, but I look forward to using the final product once you’ve got all kinds of excellent feedback from the community!

  3. Mario Goebbels says:

    Community Server still messes around with the HTML even if you hand it over using the webservice APIs? I know that FreeTextBox mangles it until you don’t recognize it anymore, but the webservices? That’s weak.

  4. Jfriend says:

    It may well be Community Server’s wysiwyg editor that causes the problem. There is currently a bug in the Word blog editor that forces me to open up the post on the server and get the post time set correctly.

  6. Joe Friend, the guy who started the Blogging from Word 2007 whirlwind, posts a follow-up…

  7. Peter Sefton says:

    In comments on your last post I added a suggestion to consider using styles to drive the HTML export. I take your point about simplicity over formatting fidelity, but with a good, predictable set of styles you can do a lot.

    http://ptsefton.com/blog/2006/05/13/beyond_blogging:_style-driven__html_export_from_2007._please.

    I’d love to know what you think of this idea. Are styles still there/usable in the new Word?

  8. Sam Sethi says:

    This is good news but which version(s) of XHTML will Word render – 1.0 transitional, 1.1 strict or 2.0? Creating Microformats will be a doddle now.  I could write a template that non-techies could populate to generate hcard, hreview etc.  

    I for one think this is the strongest reason to upgrade my version of Outlook and Word, so long as the MetaWeblog API and Atom publication support allow me to blog to a variety of blogging tools and not just Spaces.

    I wonder if Microsoft could go further and fully support CSS2 for template formatting in Word and JavaScript 1.5+ for macros.

  9. Ron says:

    I’d just like to say, I was worried (understatement) that Word would try to output any kind of HTML at all. After reading your blog Joe, I’m feeling much better about MS and the people working there.

    I wish there were more people like Joe working for MS. 😉

    If you can do what you say, then don’t stop saying it.

  10. One of the features in Word 2007 Beta 2 is the ability to author blog posts. Joe Friend announced the…

  11. I was at Microsoft’s offices in Mountain View last Friday, the guest of Don Campbell, Microsoft’s Office 2007 Evangelist. I have to say that I was really impressed with what I saw. People are counting Microsoft out, but that’s crazy….

  12. Joe Friend, the guy who started the Blogging from Word 2007 whirlwind, posts a follow-up on technical