We put an article up back in the winter on transforming from WordprocessingML into XSL-FO. From there, you can go into other formats like PDF. Not sure if you guys have already seen this, but if not you should check it out: http://msdn.microsoft.com/office/understanding/word/codesamples/default.aspx?pull=/library/en-us/odc_wd2003_ta/html/officewordwordmltoxsl-fo.asp
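To give a rough feel for what that kind of transform involves, here is a minimal Python sketch. It is not the stylesheet from the article (that one is XSLT and handles far more). It just maps each `w:p` paragraph to an `fo:block`, drops all formatting, and assumes the Word 2003 WordprocessingML namespace. The function name `wordml_to_fo`, the master name `page`, and the sample document are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Word 2003 WordprocessingML and XSL-FO namespaces.
W_NS = "http://schemas.microsoft.com/office/word/2003/wordml"
FO_NS = "http://www.w3.org/1999/XSL/Format"
ET.register_namespace("fo", FO_NS)


def w(tag):
    """Qualified name for a WordprocessingML element."""
    return f"{{{W_NS}}}{tag}"


def fo(tag):
    """Qualified name for an XSL-FO element."""
    return f"{{{FO_NS}}}{tag}"


def wordml_to_fo(xml_text):
    """Hypothetical helper: turn each w:p into a plain fo:block (text only)."""
    src = ET.fromstring(xml_text)

    root = ET.Element(fo("root"))
    masters = ET.SubElement(root, fo("layout-master-set"))
    page = ET.SubElement(masters, fo("simple-page-master"), {"master-name": "page"})
    ET.SubElement(page, fo("region-body"))

    seq = ET.SubElement(root, fo("page-sequence"), {"master-reference": "page"})
    flow = ET.SubElement(seq, fo("flow"), {"flow-name": "xsl-region-body"})

    for para in src.iter(w("p")):
        block = ET.SubElement(flow, fo("block"))
        # Concatenate the text of every run in the paragraph.
        block.text = "".join(t.text or "" for t in para.iter(w("t")))

    return ET.tostring(root, encoding="unicode")


# Made-up sample document, just for illustration.
WORDML_SAMPLE = f"""<w:wordDocument xmlns:w="{W_NS}">
  <w:body>
    <w:p><w:r><w:t>Hello, </w:t></w:r><w:r><w:t>world.</w:t></w:r></w:p>
    <w:p><w:r><w:t>Second paragraph.</w:t></w:r></w:p>
  </w:body>
</w:wordDocument>"""

fo_xml = wordml_to_fo(WORDML_SAMPLE)
print(fo_xml)
```

The resulting XSL-FO would still need to go through an FO processor to become a PDF. That step is where the actual page layout gets computed.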
Moving into fixed formats is pretty difficult, because if you really want full fidelity you also need to understand Word's layout functionality. Fixed formats describe exactly how the text and other information are laid out on a page; PDF and XPS are both examples. The Word format is a flow-based format. If you add a paragraph somewhere in the WordprocessingML, then when you open the file back up, the page layout will of course be different (everything after that paragraph just got shoved down). This means that we don't store information like where the page breaks fall in the format. If we did, and we enforced it, it would be significantly more difficult to work with the files, as you'd have to recalculate those things any time you modified the text.
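Here is a small Python sketch of what "flow based" means in practice. It assumes the Word 2003 WordprocessingML namespace and a stripped-down, made-up document (a real file carries a lot more markup). The helper names `make_paragraph`, `insert_paragraph`, and `paragraph_texts` are just for illustration. The point is that adding a paragraph is a single insert into the body, with nothing else in the file to update.

```python
import xml.etree.ElementTree as ET

# Word 2003 WordprocessingML namespace.
W_NS = "http://schemas.microsoft.com/office/word/2003/wordml"
ET.register_namespace("w", W_NS)

# Made-up minimal document, just for illustration.
SAMPLE = f"""<w:wordDocument xmlns:w="{W_NS}">
  <w:body>
    <w:p><w:r><w:t>First paragraph.</w:t></w:r></w:p>
    <w:p><w:r><w:t>Second paragraph.</w:t></w:r></w:p>
  </w:body>
</w:wordDocument>"""


def make_paragraph(text):
    """Build a w:p element holding one run with the given text."""
    p = ET.Element(f"{{{W_NS}}}p")
    r = ET.SubElement(p, f"{{{W_NS}}}r")
    t = ET.SubElement(r, f"{{{W_NS}}}t")
    t.text = text
    return p


def insert_paragraph(xml_text, index, text):
    """Insert a new paragraph at the given position among the body's children."""
    root = ET.fromstring(xml_text)
    body = root.find(f"{{{W_NS}}}body")
    body.insert(index, make_paragraph(text))
    return ET.tostring(root, encoding="unicode")


def paragraph_texts(xml_text):
    """Return the plain text of each paragraph, in document order."""
    root = ET.fromstring(xml_text)
    return [
        "".join(t.text or "" for t in p.iter(f"{{{W_NS}}}t"))
        for p in root.iter(f"{{{W_NS}}}p")
    ]


updated = insert_paragraph(SAMPLE, 1, "Inserted paragraph.")
print(paragraph_texts(updated))
# ['First paragraph.', 'Inserted paragraph.', 'Second paragraph.']
```

Nothing about page positions had to be touched. Word works out the new layout when it next opens the file, which is exactly the work a fixed format would push onto whoever edits the text.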
I love examples like this that show some of the stuff you can do once you get Office documents into XML. I'm trying to gather a list of similar articles we should provide as we go through the Betas and people start working with the new formats. Are these kinds of articles useful? Are there other similar articles you'd like to see? At one point we had started to build a similar one showing how to go into DocBook, but I'm not sure what happened with it. I'll see if I can dig it up.