Intro to Word XML Part 3: Using Your Own Schema


[This post has been removed due to legal concerns]

Comments (16)

  1. Eugen Bacic says:

    Thanks, Brian! Now you have me waiting for how to capture events on those elements.

  2. Dave R says:

    Brian,

    Sorry in advance, probably not the right place for this comment…

    We’ve been using some other formal XML Editors for content creation. I’m dying to know if Office 12 will build on top of Avalon and compete with the high-end XML editors on the market.

  3. Ignace says:

    I think it’s ALT+F11 for VBE instead of CTRL+F11

  4. BrianJones says:

    Ignace, you’re right. Thanks for the correction. I just updated it to say "ALT + F11"

    Dave, what are the scenarios you are interested in? Office 11 already has a good amount of XML support (between Word, InfoPath, FrontPage, and Excel). In Office 12 we continue to make improvements, which we’ll talk about more at PDC. It’s hard to answer your question about competition, though, since editing XML is a fairly basic, generic thing. It really comes down to what type of XML you are editing and what the scenario is.

    -Brian

  5. Dave R says:

    Brian, ah, you’re not a mind-reader {:>}. Sorry to be so vague; I’m working in the Publishing sector – books, manuals, newsletters, etc. I’d like my authors to be able to create structured content (XML) in Word and also be able to create "a" style and layout on top of that or apply a predefined style (probably can do most of this in WordML). I’d also like them to be able to work on say a chapter, a primarily "virtual" xml doc that would allow them to link content from other xml docs e.g. letters, graphics, articles, etc. stored in a common repository. Then of course they might open up a book and add chapter links etc. What do you think?

  6. David Giusto says:

    Dave – This will address your question

    Brian – This will address PtSetton’s comments to your post on 7/8, Word XSLT: Data Only Transform

    We at Document Management Solutions Inc. (DMSI) have integrated Word with an XML content management system. A CMS for document publishing – not a web CMS. The methodology that we used is just what Brian describes in his reply to Bryan White on 7/11 here:

    http://blogs.msdn.com/brian_jones/archive/2005/07/08/436973.aspx

    Brian’s Quote:

    "We designed the XML support so that you could leverage both WordML and your XML together. If there are features such as formatting, lists, and tables that Word already supports, then you don’t need to mark that up. Instead you can just take the subset of your schema that isn’t already represented by Word functionality, and only mark up with that."

    We only use the user defined schema for high level structure and for application specific data such as anchors and targets. We can then manage the WordML chunks at a higher level of abstraction. This allows for all the CMS functions that you are familiar with to be applied to a Word editorial environment. These functions include:

    Check-Out & Check-In from Microsoft® Word

    Version Control & Change Tracking

    Document Component Sharing & Reuse

    Fragment editing & Concurrent Authoring

    Since we are using Word we also get WYSIWYG XML Editing & Page Composition. Two things you cannot get with a traditional XML authoring tool.

    To Brian’s point, we got it a long time ago, and once you start thinking about Word and XML properly for the context it is quite powerful. Yes, there are a lot of warts in Word XML, particularly around how lists are handled, but you MUST understand that Word XML is a relational database, not a traditional document XML hierarchy. Let me say that again and then you should think about it – Word XML is a relational database, not a traditional document XML hierarchy. If you don’t understand this point you will not succeed in employing Word XML in any reasonably complex solution. If you find yourself puzzled over this point, just look at w:listPr, w:list, and w:ilfo and say primary key. The other thing to look at is the w:p. A Word document is a series of non-nested paragraphs. It is as if you ran a SQL query and got back a list of paragraph rows where the columns include style name and content. If you still don’t get it you are probably in over your head here.
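    To make the "primary key" reading concrete, here is a minimal Python sketch. It assumes the WordML 2003 namespace URI shown in the code and a simplified, hand-written fragment in which each paragraph's w:listPr/w:ilfo value points at a w:list entry, which in turn points at a list definition through w:ilst. The element layout and the helper names (q, paragraph_rows) are illustrative assumptions, not taken from a real Word file.

```python
import xml.etree.ElementTree as ET

# Assumed WordML 2003 namespace; the fragment below is hand-written and simplified.
W = "http://schemas.microsoft.com/office/word/2003/wordml"

SAMPLE = f"""
<w:wordDocument xmlns:w="{W}">
  <w:lists>
    <w:listDef w:listDefId="0"/>
    <w:list w:ilfo="1"><w:ilst w:val="0"/></w:list>
  </w:lists>
  <w:body>
    <w:p>
      <w:pPr><w:listPr><w:ilvl w:val="0"/><w:ilfo w:val="1"/></w:listPr></w:pPr>
      <w:r><w:t>First item</w:t></w:r>
    </w:p>
    <w:p><w:r><w:t>Plain paragraph</w:t></w:r></w:p>
  </w:body>
</w:wordDocument>
"""

def q(name):
    """Return the namespace-qualified name ElementTree uses for w:<name>."""
    return f"{{{W}}}{name}"

def paragraph_rows(root):
    """Flatten the document into one row per paragraph, joined to its list."""
    # Lookup "table": w:ilfo key of each list instance -> list definition id.
    list_to_def = {
        lst.get(q("ilfo")): lst.find(q("ilst")).get(q("val"))
        for lst in root.iter(q("list"))
    }
    rows = []
    for p in root.iter(q("p")):
        ilfo = p.find(f"{q('pPr')}/{q('listPr')}/{q('ilfo')}")
        key = ilfo.get(q("val")) if ilfo is not None else None
        text = "".join(t.text or "" for t in p.iter(q("t")))
        rows.append({"text": text, "ilfo": key, "listDefId": list_to_def.get(key)})
    return rows

rows = paragraph_rows(ET.fromstring(SAMPLE))
```

    Each entry in rows behaves like a record in a query result: the paragraph carries only a key, and the list details live elsewhere in the file.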

    While full featured round-trip conversion between the two XML formats (i.e. database and hierarchy) is technically possible it is by no means practical for a Word implementation.

    For the skeptics – If you want a demo contact us at http://www.dmsi-world.com

  7. David Giusto says:

    I guess I need new glasses – In my previous post I got both reference names spelled wrong.

    My apologies to Peter Sefton and Bryan Wilhite.

  8. Alexander Ryan says:

    Great example.

    However, whenever I re-save my file in Word, it chooses to rename my namespace prefixes. This causes the program that processes my file to break as the XPath has now become invalid!

    Is there any way to prevent Word from doing this?

  9. BrianJones says:

    Hey Alexander, the quick answer is no, you cannot control the prefix we use to write out the files. If it’s really important to you though, you could always save through an XSLT that takes everything in a particular namespace and forces it to use the prefix you want.
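    As a rough illustration of forcing a prefix on the way out, here is the same idea sketched in Python instead of XSLT. The namespace URI, the sample element, and the "Pump" value are hypothetical; the only library feature relied on is ElementTree's register_namespace, which sets the prefix used when a tree is serialized.

```python
import xml.etree.ElementTree as ET

NS = "urn:example:myNamespace"  # hypothetical namespace URI

# A fragment as it might look after a re-save, with a generated prefix (ns1).
resaved = (
    f'<ns1:designComponent xmlns:ns1="{NS}" number="1">'
    f"<ns1:name>Pump</ns1:name>"
    f"</ns1:designComponent>"
)

# Ask the serializer to write this namespace with the prefix "u".
ET.register_namespace("u", NS)

root = ET.fromstring(resaved)
normalized = ET.tostring(root, encoding="unicode")
```

    The element names and namespace are unchanged; only the shorthand written to the file differs.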

    More importantly though, you should never rely on a prefix. When you’re programming against the files, you should use the namespace to build up your XPaths, not the prefix. Prefixes are able to change without affecting the actual meaning of the file at all. A prefix is just shorthand for the actual namespace.

    Let me know if that helps and you’re able to get it working ok?

    -Brian
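    A small Python sketch of why the prefix carries no meaning: the two hand-written documents below differ only in prefix, and a lookup keyed on the namespace URI returns the same result for both. The namespace URI, element names, and read_name helper are hypothetical.

```python
import xml.etree.ElementTree as ET

NS = "urn:example:design"  # hypothetical namespace URI

# Same content, different prefixes bound to the same namespace.
doc_a = f'<u:root xmlns:u="{NS}"><u:name>Widget</u:name></u:root>'
doc_b = f'<ns1:root xmlns:ns1="{NS}"><ns1:name>Widget</ns1:name></ns1:root>'

def read_name(xml_text):
    """Find the <name> child by namespace URI, ignoring whatever prefix was used."""
    root = ET.fromstring(xml_text)
    # "{namespace}localname" matches on the URI, never on the prefix.
    return root.find(f"{{{NS}}}name").text

name_a = read_name(doc_a)
name_b = read_name(doc_b)
```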

  10. Alexander Ryan says:

    Brian,

    Thanks for the quick response.

    I’m a bit of a newbie to this world and I’m afraid that I’ve never seen an XPath expression that used namespaces instead of prefixes.

    One of my expressions looks like this …

    /w:wordDocument/w:body/wx:sect/u:designOverviewSection/u:designComponent[@number='1']/u:name

    and the namespace looks like this …

    xmlns:u="http://www.unisys.com/schemas/3dve/designView"

    which Word might change to something like this …

    xmlns:ns1="http://www.unisys.com/schemas/3dve/designView"

    By chance are you saying that I have to write my program to first find out what Word changed my prefix to and then dynamically revise all of my XPath expressions?

    Or is there some way to use a namespace instead of a prefix in an XPath expression itself?

    -Brian

  11. BrianJones says:

    How are you using this XPath? Is it in an XSLT, or through the XML DOM, or some other way?

  12. Alexander Ryan says:

    XPath is being used in an external XML processing application which uses DOM (dom4j). The XPath expressions are hard-coded into this program.

    I believe that I will have to modify this program to dynamically determine what Word has changed the namespace prefix to & then re-generate the XPath expressions accordingly.

    –Alex

  13. BrianJones says:

    Have you tried using the SetProperty method to set your SelectionNamespaces? You should be able to use that for the DOM to specify what you want your prefix to map to. Then it won’t matter what prefix is used in the actual XML file, it will only matter what namespace that prefix is mapped to. For example:

    oXML.setProperty("SelectionNamespaces", "xmlns:my='myNamespace'")

    or something like that…

    This is really the right way to deal with XML. You should never view the prefix as having any meaning on its own. It’s always the namespace you should be programming against.

    -Brian
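    The same prefix-to-namespace mapping can be sketched outside MSXML. The Python example below is not the MSXML or dom4j API; it uses ElementTree's namespaces argument to play the role of SelectionNamespaces. The "u" URI is the one quoted earlier in the thread, the "w" and "wx" URIs are assumed WordML 2003 values, and the sample document and its "Example component" text are hand-written.

```python
import xml.etree.ElementTree as ET

# Prefix -> namespace map owned by the calling program, not by the file.
NSMAP = {
    "w": "http://schemas.microsoft.com/office/word/2003/wordml",   # assumed
    "wx": "http://schemas.microsoft.com/office/word/2003/auxHint",  # assumed
    "u": "http://www.unisys.com/schemas/3dve/designView",
}

# Hand-written sample in which the custom namespace is bound to "ns1", not "u".
SAMPLE = f"""
<w:wordDocument xmlns:w="{NSMAP['w']}" xmlns:wx="{NSMAP['wx']}" xmlns:ns1="{NSMAP['u']}">
  <w:body><wx:sect>
    <ns1:designOverviewSection>
      <ns1:designComponent number="1"><ns1:name>Example component</ns1:name></ns1:designComponent>
    </ns1:designOverviewSection>
  </wx:sect></w:body>
</w:wordDocument>
"""

# ElementTree paths are relative to the element being searched, so the leading
# /w:wordDocument step of the original expression is the root element itself.
PATH = "w:body/wx:sect/u:designOverviewSection/u:designComponent[@number='1']/u:name"

root = ET.fromstring(SAMPLE)
found = root.find(PATH, NSMAP)
```

    The hard-coded expression keeps its "u" prefix and still matches, because the prefix is resolved through NSMAP rather than through the declarations in the file.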

  14. Alexander Ryan says:

    Brian,

    I think that I did not communicate the nature of the problem correctly.

    I am using an external program that moves content from one Word document into another Word document. It uses an input file that contains an XPath expression to locate the content in the source document and another XPath expression to pinpoint the location to which the content is to be pasted in the target document. These XPath expressions are actually written into an external XML file and used as input to the process and they "must" use the namespace prefix.

    When Word chooses to rename the namespace prefixes that I have chosen to use it breaks my program. There is no way for me to dynamically update the XPath expressions in my non-WordML input file.

    I’d like to suggest that you at least consider reserving some namespace prefixes for use by programmers and not dynamically change these whenever Word documents are saved.

  15. Developers Interest says:

    Hi Brian,

    Thanks for the valuable information.

    I have a few queries here.

    Suppose I need to keep track of the changes made to a Word document by a user. To do this, say I add tags like </s:employee> to all the paragraphs in the document so that the paragraphs can be identified by the tags. But these tags can be removed by the user at any point in time. Is there some other concrete way to do the same?

    Also, I came across ids like wsp:rsidRDefault, wsp:rsidR and wsp:rsidP associated with paragraphs which seem to change with every change made to the corresponding paragraph. On what basis do these Ids change?

  16. Denise says:

    When I save as data only, my child elements (of the root) all have blank/null namespaces xmlns=""…what is the reasoning with this and is there a way to have my namespace used in all elements?

    <Incident_Report xmlns="http://www.disa.mil/DISA-PAC-PNC-IR">

     <Report_Classification xmlns="" />

    Thanks,

    Denise
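    For readers wondering what xmlns="" means in output like this: it cancels the default namespace for that element, so the child belongs to no namespace at all. A minimal Python sketch, using a hand-written fragment modelled on the snippet above (closed here so that it parses), shows how a parser sees the two elements. Why Word writes the child that way is not confirmed anywhere in the thread; a schema that declares its local elements as unqualified would be one possibility.

```python
import xml.etree.ElementTree as ET

# Hand-written fragment modelled on the output quoted in the comment.
SAMPLE = (
    '<Incident_Report xmlns="http://www.disa.mil/DISA-PAC-PNC-IR">'
    '<Report_Classification xmlns="" />'
    "</Incident_Report>"
)

root = ET.fromstring(SAMPLE)
child = root[0]

# ElementTree writes namespaced names as "{uri}local"; a bare name means no namespace.
root_tag = root.tag
child_tag = child.tag
```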