Non-merging text nodes in XLinq: They’re Baacckk!!

When I described the changes to XLinq in the May CTP, I said:

Note that whereas DOM explicitly allows adjacent text nodes, the XLinq implementation will always merge XText nodes to correspond with the structure of XML text. This has the benefit that developers never need to check for multiple text nodes that contain a single element’s content. However, it does mean that you cannot rely on the identity of text nodes remaining stable because they may be merged into adjacent text nodes as edits are applied to the XLinq tree. … If you must work with text nodes in this CTP version of XLinq, do not re-use them or assume that a reference to a text node will contain the correct data after changes are made to the tree.  Note:  Yes, we know this is inelegant, and this may change in the next preview of XLinq. 

Having taken some friendly abuse in the past from those suffering from W3C DOM’s non-merging text nodes, I had been very happy with the original XLinq design decision to hide the XText class as an implementation detail. I resigned myself to partially exposed text nodes in the recent CTP, since they really are a useful way to model mixed content. After all, the implementation would still merge adjacent text nodes whenever a change to the tree created them. But the complaints kept rolling in. I wasn’t particularly happy when a couple of “Einstein” designers kept reminding the rest of us what a Bad Thing it is not to provide a clear contract with the user about the behavior of an instance of a class, and what a Worse Thing it is to violate the Prime Directive of OOP by not giving objects a stable identity. But since DOM’s non-merging text nodes generate a lot of complaints, have we done the Wrong Thing for the typical user by doing the Right Thing for the purists?

I don’t think so, for several reasons:

  • The parsers do not produce whitespace-only text nodes by default, although this option can be enabled for conformance reasons.

  • Also, the InnerText property allows developers to easily get the text value of an element without navigating down to its text node children and iterating over them.

  • The design of the XLinq eventing model (still a work in progress) stumbled over the complexity of XLinq’s contract with the user about the behavior of text nodes. It’s possible to “magically” merge any adjacent text nodes that pop into existence, but an event model rich enough to let the application implement undo/redo would expose that magic.

  • After several hours of discussion, we concluded that the target audience for XLinq will not really stumble over the fact that text nodes need to be merged in some corner cases. We haven’t come up with a realistic scenario where the Value property (whose implementation still handles the text node merging) can’t handle the simple cases, and there are much bigger challenges facing those who need to get and compare the values of complex elements in a sophisticated way. For example, someone needing to search or compare subtrees of raw text will need logic for dealing with CDATA sections, mixed content markup, etc., whether or not we automagically merge adjacent text nodes.
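XLinq is a .NET API and none of its classes appear here. As a rough illustration of the two behaviors being weighed, the sketch below uses Python's standard xml.dom.minidom, which follows W3C DOM semantics: adjacent text nodes stay separate until you explicitly call normalize(). The element_value helper is hypothetical, standing in loosely for a Value-style property that hides the split; it is not XLinq's implementation.

```python
from xml.dom import minidom


def element_value(element):
    """Hypothetical helper, loosely analogous to a Value property:
    concatenates the data of all text/CDATA descendants in document order."""
    parts = []
    for child in element.childNodes:
        if child.nodeType in (child.TEXT_NODE, child.CDATA_SECTION_NODE):
            parts.append(child.data)
        elif child.nodeType == child.ELEMENT_NODE:
            parts.append(element_value(child))
    return "".join(parts)


doc = minidom.parseString("<p>Hello</p>")
p = doc.documentElement
first = p.firstChild  # the single text node "Hello"

# W3C DOM semantics: appending a text node next to another one does NOT merge them.
second = doc.createTextNode(", world")
p.appendChild(second)
children_before = len(p.childNodes)  # two adjacent text nodes
value_before = element_value(p)      # the helper hides the split

# Explicit merging (DOM's normalize) folds the text into `first`;
# `second` is unlinked, so a reference to it no longer points into the tree.
p.normalize()
children_after = len(p.childNodes)
second_detached = second.parentNode is None

print(children_before, repr(value_before))               # prints: 2 'Hello, world'
print(children_after, repr(first.data), second_detached)  # prints: 1 'Hello, world' True
```

The first line shows the non-merging model, with a Value-style accessor papering over the split. The second shows the cost of merging that the May CTP post warned about: the merge rewrites `first` in place and detaches `second`, so code still holding `second` is looking at an orphaned node.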

So what do you think? Is this a step down the road to the DOMness, or an acceptance of the reality that automagic text node merging has more cost than benefit?

Comments (1)

  1. XmlTeam says:

    Just to be clear:  This change so that text nodes no longer merge was made after the CTP release.  The CTP behavior is that text nodes are exposed (unlike in the original PDC release), but are automagically merged.  The next version of XLinq you will see in a few months will have the behavior described in this post.

    Mike Champion