Kirk Evans Blog

.NET From a Markup Perspective

Deep XML Geekery: XPath and not()

Teemu asked a very cool question on the ASP.NET message board today. Given the input XML document:

<book id =”1 >
 <name>XML And ASP.NET</name>
<book id =”2 >
 <name>XML For ASP.NET Developers</name>
<book id =”15 >
 <name>ADO.NET And XML: ASP.NET On The Edge</name>
<book id =”4 >
 <name>C# Developers Guide to ASP.NET, XML, And ADO.NET</name>

The following XPath returns “15”.  Why?

/books/book/@id[not(. < /books/book/@id)]

To explain, it might be easier to start with a simple node test and then work forwards. 


This asks the processor to select the “book” elements with an attribute “id” having a value of “2”.  This is obvious to most folks who have worked with XPath even a little bit.  This is akin to the SQL “IN” predicate for subqueries.


This asks the processor to look for the book element with an attribute “id” having a value of 2, then select the nodeset that *does not* contiain it.  This grabs all of the book elements in the example except the one with id=2.  The not() grabs the nodeset that does not contain nodes meeting this condition.  This is akin to the SQL “NOT IN” predicate for subqueries.

/books/book[@id &lt; /books/book/@id]

This asks the processor to look for books that have an attribute value that is less than *some other* node in the condition.  The id’s selected in this one are 1,2, and 4, because they are all less than 15. 

/books/book[not(@id &lt; /books/book/@id)]

This asks for the attribute that is less than some other attribute in the node set, and the not() says to only select those nodes not meeting that condition.  It effectively asks the processor not to select the books with an id of 1,2, or 4, meaning it only selects number 15.

If there were 2 book nodes with an id value of 15, both would be selected.  You could grab just the first one.

/books/book[not(@id &lt; /books/book/@id)][1]

The problem with using this type of approach is that a large number of nodes must first be selected to find the test condition, then those nodes selected again to find the nodes that are not within that condition.  This breaks down for large documents because of the number of nodes that have to be processed, especially if this type of condition is used in a loop or condition to another node test.

The recursive max() approach only fares a little better for large documents in that the working set is smaller per stack frame, but the overall stack size is much larger:  significantly large documents can cause stack overflow. 

Dimitre’s FXSL solves this problem through much more effective node set processing, which is also much more complicated.  That is the beauty of using his approach:  he did the hard work for you, you just need to call into this template rules.

Side Note

The XPath that we built has other applications as well.  For instance, finding distinct nodes can be processed using this preceding-sibling axis, as Teemu showed in his article

/books/book[not(@id = preceding-sibling::book/@id)]

Recognition of how expensive this processing pattern can be is what led Steve Muench to develop the Muenchian grouping pattern, using keys for distinct and grouping rather than the preceding axis.