Generating an XPath expression to find a LINQ to XML Node

In a number of places in the docs, I present code that finds nodes in the XML tree. Sometimes the results of a query are easy to describe, but sometimes I wanted to identify exactly which nodes a query selects. A string that uniquely identifies a node makes it easy to write sample code that selects specific nodes and then shows the exact results.

Well, we already have a syntax that allows us to identify a specific node in an XML tree: XPath.

Further, there are extension methods in System.Xml.XPath that allow us to evaluate an XPath expression, returning the node(s) that the expression selects.
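As a minimal sketch of those extension methods (the sample XML, the XPathSelectSketch class, and the CountMatches helper are made up for illustration), XPathSelectElement returns the first element an expression selects, and XPathSelectElements returns all of them:

```csharp
using System;
using System.Linq;
using System.Xml.Linq;
using System.Xml.XPath;

public static class XPathSelectSketch
{
    // Hypothetical helper: counts the elements an XPath expression selects.
    public static int CountMatches(XDocument doc, string xpath)
    {
        return doc.XPathSelectElements(xpath).Count();
    }

    public static void Main()
    {
        XDocument doc = XDocument.Parse(
            "<Root><Child1>text</Child1><Child2/></Root>");

        // First element selected by the expression, or null if there is none.
        XElement child = doc.XPathSelectElement("/Root/Child1");
        Console.WriteLine(child.Value);

        // Number of child elements of Root.
        Console.WriteLine(CountMatches(doc, "/Root/*"));
    }
}
```

These two methods only return elements. To select attributes, text, or comments, XPathEvaluate is the method to reach for.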

So, I wrote a method, GetXPath, implemented as an extension method on System.Xml.Linq.XObject, that returns an XPath expression that identifies the node in the XML tree. The implementation is fairly complete - for instance, it generates an XPath expression that contains namespace prefixes when the nodes are in a namespace.

The extension methods in System.Xml.XPath allow us to validate that the XPath expressions that we generate select the exact same node as was used to generate the XPath expression.

I also wrote another useful axis method, DescendantXObjects, which returns an IEnumerable<XObject> that contains all descendant nodes, along with the attributes of any elements among them.
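One way such an axis method could be written is sketched below. This is an assumption about its shape, not necessarily the implementation used in the full program: it yields the source object itself first (the sample output further down starts with the document), and it skips namespace declarations, since XPath does not treat them as attributes. The AxisSketch class name is made up for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class AxisSketch
{
    // Assumed shape: yields the source, then attributes and child nodes
    // recursively, in document order.
    public static IEnumerable<XObject> DescendantXObjects(this XObject source)
    {
        yield return source;

        XElement el = source as XElement;
        if (el != null)
            foreach (XAttribute att in el.Attributes())
                if (!att.IsNamespaceDeclaration)
                    yield return att;

        // XDocument and XElement are both containers of child nodes.
        XContainer con = source as XContainer;
        if (con != null)
            foreach (XNode child in con.Nodes())
                foreach (XObject x in child.DescendantXObjects())
                    yield return x;
    }

    public static void Main()
    {
        XDocument doc = XDocument.Parse(
            "<Root a='1'><Child b='2'>t</Child><!--c--></Root>");
        foreach (XObject x in doc.DescendantXObjects())
            Console.WriteLine(x.NodeType);
    }
}
```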

Then, I wrote a method, DumpXPaths, that iterates through the descendant XObjects and prints the XPath for every node to the console. This method also validates that the node returned by evaluating the XPath expression is the same node as was used to generate the XPath expression. It also validates that one and only one node is returned when evaluating the XPath expression. For example, the following code creates a simple XML tree, and calls DumpXPaths:

XDocument root = XDocument.Parse(@"<Root AnAttribute='att-value'>
<?xml-stylesheet type='text/xsl' href='hello.xsl'?>
<Child1 AnotherAttribute='abc'>text</Child1>
<!--This is a comment.-->
</Root>");
DumpXPaths(root);

This code produces the following output:

.
/Root
/Root/@AnAttribute
/Root/processing-instruction()
/Root/Child1
/Root/Child1/@AnotherAttribute
/Root/Child1/text()
/Root/comment()
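The validation step can be sketched roughly as follows. This is a simplified illustration, not the DumpXPaths code itself: the RoundTripSketch class and the SelectsOnly helper are hypothetical names, and the XPath strings are written by hand rather than produced by GetXPath.

```csharp
using System;
using System.Collections;
using System.Linq;
using System.Xml.Linq;
using System.Xml.XPath;

public static class RoundTripSketch
{
    // Hypothetical helper: true when the expression, evaluated against the
    // document, selects exactly one object and it is the expected instance.
    public static bool SelectsOnly(XDocument doc, string xpath, XObject expected)
    {
        object raw = doc.XPathEvaluate(xpath);

        // Strings, numbers, and booleans are not node sets.
        if (raw is string || !(raw is IEnumerable))
            return false;

        object[] selected = ((IEnumerable)raw).Cast<object>().ToArray();
        return selected.Length == 1
            && object.ReferenceEquals(selected[0], expected);
    }

    public static void Main()
    {
        XDocument doc = XDocument.Parse(
            "<Root AnAttribute='att-value'><Child1>text</Child1><Child1/></Root>");
        XElement root = doc.Root;
        XElement first = root.Elements("Child1").First();

        Console.WriteLine(SelectsOnly(doc, "/Root/@AnAttribute",
            root.Attribute("AnAttribute")));
        Console.WriteLine(SelectsOnly(doc, "/Root/Child1[1]", first));

        // Without a positional predicate the expression matches two elements.
        Console.WriteLine(SelectsOnly(doc, "/Root/Child1", first));
    }
}
```

Comparing by reference rather than by value matters here: two distinct nodes can have identical content, and the point is to confirm that the expression leads back to the very same object.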

The DumpXPaths method can also take an XmlNamespaceManager, which allows the code that validates the XPath expression to validate expressions that contain namespace prefixes. 
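For reference, here is a small sketch of evaluating a prefixed expression with an XmlNamespaceManager. The namespace URI, the aw prefix, and the NamespaceSketch class are placeholders chosen for illustration; the prefix registered with the manager has to match the one used in the expression.

```csharp
using System;
using System.Xml;
using System.Xml.Linq;
using System.Xml.XPath;

public static class NamespaceSketch
{
    // Hypothetical helper: selects one element, resolving prefixes in the
    // expression through the supplied namespace manager.
    public static XElement Select(XDocument doc, string xpath,
        XmlNamespaceManager nsm)
    {
        return doc.XPathSelectElement(xpath, nsm);
    }

    public static void Main()
    {
        XDocument doc = XDocument.Parse(
            "<aw:Root xmlns:aw='http://www.example.com/ns'>" +
            "<aw:Child>text</aw:Child></aw:Root>");

        // Map the prefix used in the expression to the namespace URI.
        XmlNamespaceManager nsm = new XmlNamespaceManager(new NameTable());
        nsm.AddNamespace("aw", "http://www.example.com/ns");

        XElement child = Select(doc, "/aw:Root/aw:Child", nsm);
        Console.WriteLine(child.Value);
    }
}
```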

The XPath expressions generated by this method work when evaluating in the context of an XDocument, not an XElement. If you parse into an XElement, the root node is the XElement, but if you parse into an XDocument, the root XElement node is a child of the XDocument. The generated XPath expressions reflect this.
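The difference can be tried out with a sketch like the one below (the ContextSketch class and the Finds helper are made up for illustration). Going by the description above, the absolute path is expected to match only when evaluated against the XDocument. With a stand-alone XElement as the context, a path relative to that element is the one that should work.

```csharp
using System;
using System.Xml.Linq;
using System.Xml.XPath;

public static class ContextSketch
{
    // Hypothetical helper: true when the expression selects an element
    // relative to the given context node.
    public static bool Finds(XNode context, string xpath)
    {
        return context.XPathSelectElement(xpath) != null;
    }

    public static void Main()
    {
        string xml = "<Root><Child1>text</Child1></Root>";
        XDocument doc = XDocument.Parse(xml);
        XElement el = XElement.Parse(xml);

        // Absolute path against the document: Root is a child of the document.
        Console.WriteLine(Finds(doc, "/Root/Child1"));

        // Same expression against an element that has no document.
        Console.WriteLine(Finds(el, "/Root/Child1"));

        // Relative path from the element itself.
        Console.WriteLine(Finds(el, "Child1"));
    }
}
```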

If the GetXPath method returns null, then the method did not generate an XPath expression to select the node. This is, AFAIK, only true for white space text nodes that are children of a document; such nodes are not part of the XPath object model, so it's not possible to generate an XPath expression to select them.

Here is the entire working program to show the XPath expressions for every node in an XML tree. You can get the PurchaseOrders.xml document from the documentation, or you can change the code to dump the nodes for your own XML tree:

using System;
using System.Diagnostics;
using System.Collections;
using System.Collections.Generic;
using System.Text;
using System.Linq;
using System.Xml;
using System.Xml.Linq;
using System.Xml.XPath;

namespace LinqToXmlExample
{
    public static class MyExtensions
    {
        // Returns the element name as it appears in an XPath step:
        // "prefix:local" when a prefix is in scope for its namespace,
        // otherwise just the local name.
        private static string GetQName(XElement xe)
        {
            string prefix = xe.GetPrefixOfNamespace(xe.Name.Namespace);
            if (xe.Name.Namespace == XNamespace.None || prefix == null)
                return xe.Name.LocalName;
            else
                return prefix + ":" + xe.Name.LocalName;
        }

        // Same as above for attributes; the prefix is looked up on the
        // element that owns the attribute.
        private static string GetQName(XAttribute xa)
        {
            string prefix =
                xa.Parent.GetPrefixOfNamespace(xa.Name.Namespace);
            if (xa.Name.Namespace == XNamespace.None || prefix == null)
                return xa.Name.ToString();
            else
                return prefix + ":" + xa.Name.LocalName;
        }

        // Appends a 1-based positional predicate, such as Child[2], when
        // the parent has more than one child element with the same name.
        private static string NameWithPredicate(XElement el)
        {
            if (el.Parent != null && el.Parent.Elements(el.Name).Count() != 1)
                return GetQName(el) + "[" +
                    (el.ElementsBeforeSelf(el.Name).Count() + 1) + "]";
            else
                return GetQName(el);
        }

        // Concatenates the items, appending the separator after every
        // item, including the last one.
        public static string StrCat<T>(this IEnumerable<T> source,
            string separator)
        {
            return source.Aggregate(new StringBuilder(),
                       (sb, i) => sb
                           .Append(i.ToString())
                           .Append(separator),
                       s => s.ToString());
        }

        public static string GetXPath(this XObject xobj)
        {
            // Objects without a parent element: the document itself, the
            // root element, and nodes that are children of the document.
            if (xobj.Parent == null)
            {
                XDocument doc = xobj as XDocument;
                if (doc != null)
                    return ".";
                XElement el = xobj as XElement;
                if (el != null)
                    return "/" + NameWithPredicate(el);
                XText xt = xobj as XText;
                if (xt != null)
                    return null;
                //
                // The following doesn't work because the XPath data
                // model doesn't include white space text nodes that
                // are children of the document.
                //
                // return
                //     "/" +
                //     (
                //         xt
                //         .Document
                //         .Nodes()
                //         .OfType<XText>()
                //         .Count() != 1 ?
                //         "text()[" +
                //         (xt
                //         .NodesBeforeSelf()
                //         .OfType<XText>()
                //         .Count() + 1) + "]" :
                //         "text()"
                //     );
                //
                XComment com = xobj as XComment;
                if (com != null)
                    return
                        "/" +
                        (
                            com
                            .Document
                            .Nodes()
                            .OfType<XComment>()
                            .Count() != 1 ?
                            "comment()[" +
                            (com
                            .NodesBeforeSelf()
                            .OfType<XComment>()
                            .Count() + 1) +
                            "]" :

                          &nbsp