Eric White has a post with some code samples that use XLinq to parse a WordprocessingML document: http://blogs.msdn.com/ericwhite/archive/2006/08/01/685535.aspx
Here is how Eric describes what he was trying to accomplish:
“Recently, I had a problem where there wasn’t a code testing harness that would do exactly what I wanted. I want to grab my code snippet directly from my word document, compile it, run it, and validate the output.
In more technical terms, I want to parse some WordML to grab text formatted with a given style. Further, I want to put a comment on the first line of the formatted text, and be able to grab the comment. The comment will contain the metadata that tells how to compile and run the code.
My word docs are stored in WordML (which is XML). My experiment was to see how easy it would be to pick apart the WordML using XLinq. This is the result.
First, I needed to see what the WordML looked like. If you open a WordML file, it is saved without any indenting, making it difficult to see the element tags, and the structure of the document. So I used the following program to indent the file: …”
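To give a feel for the core task Eric describes (pulling out the text of paragraphs formatted with a given style and treating the first line as a metadata comment), here is a rough sketch. It is not Eric's code and it is not XLinq. It uses Python's standard `xml.etree.ElementTree` purely as an illustration. The namespace URI assumes the Word 2003 WordML format, and the style name `Code`, the sample document, and the helper name `paragraphs_with_style` are all made up for the example.

```python
import xml.etree.ElementTree as ET

# Assumption: the file uses the Word 2003 WordML namespace.
# Change this if the document uses a different WordprocessingML namespace.
W_NS = "http://schemas.microsoft.com/office/word/2003/wordml"
W = "{" + W_NS + "}"


def paragraphs_with_style(xml_text, style_name):
    """Return the text of every w:p whose w:pPr/w:pStyle has w:val == style_name.

    Hypothetical helper for illustration only.
    """
    root = ET.fromstring(xml_text)
    result = []
    for p in root.iter(W + "p"):
        style = p.find(W + "pPr/" + W + "pStyle")
        if style is not None and style.get(W + "val") == style_name:
            # A paragraph's text can be split across several runs (w:r/w:t),
            # so join all w:t pieces in document order.
            result.append("".join(t.text or "" for t in p.iter(W + "t")))
    return result


# Made-up sample: two paragraphs styled "Code" and one unstyled paragraph.
SAMPLE = """<w:wordDocument xmlns:w="http://schemas.microsoft.com/office/word/2003/wordml">
<w:body>
<w:p><w:pPr><w:pStyle w:val="Code"/></w:pPr><w:r><w:t>// compile: hypothetical metadata</w:t></w:r></w:p>
<w:p><w:pPr><w:pStyle w:val="Code"/></w:pPr><w:r><w:t>Console.</w:t></w:r><w:r><w:t>WriteLine(1);</w:t></w:r></w:p>
<w:p><w:r><w:t>Ordinary prose.</w:t></w:r></w:p>
</w:body></w:wordDocument>"""

lines = paragraphs_with_style(SAMPLE, "Code")
# The first styled line is treated as the metadata comment, the rest as the snippet.
metadata_comment, code_lines = lines[0], lines[1:]
```

The same shape should carry over to an XLinq query: select the paragraph elements, filter on the style attribute, and concatenate the text runs. See Eric's post for how he actually does it.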