Recursive Approach to Pure Functional Transformations of XML


Writing pure functional transformations in a recursive style lets us put together interesting transformations in a very small amount of code.  The approach relies on a few specific techniques that keep the code concise, and it takes advantage of some perhaps obscure semantics of LINQ to XML.  I’ve used this approach to write some interesting transformations – it has become my favorite way to write transformations of a certain variety.  This post presents a short tutorial on writing these types of transformations.

Identity Transform

First, let’s take a look at the identity transform:

using System;
using System.Linq;
using System.Xml.Linq;
 
class Program
{
    static object Transform(XNode node)
    {
        XElement element = node as XElement;
        if (element != null)
        {
            return new XElement(element.Name,
                element.Attributes(),
                element.Nodes().Select(n => Transform(n)));
        }
        return node;
    }
 
    static void Main(string[] args)
    {
        XElement root = XElement.Parse(@"<Root Att1='1'><Child/></Root>");
        XElement newRoot = (XElement)Transform(root);
        Console.WriteLine(newRoot);
    }
}
 

In this code, we are taking advantage of the semantics of the XElement constructor that allow us to pass collections of objects to the constructor.  The following code creates a new XElement from an existing XElement:

new XElement(element.Name,
    element.Attributes(),
    element.Nodes().Select(n => Transform(n)));
 

This code passes two objects to the constructor (in addition to the name) – a collection of attributes from the source element, and a collection of transformed child nodes from the source element.  Note that even though we’re passing collections to the constructor, we’re only passing two arguments.  Each argument is a query object (an iterator) that implements IEnumerable<T>.  There are a number of LINQ to XML methods and constructors that can take content, and if we pass an object that implements IEnumerable of some T, LINQ to XML iterates the enumerable and adds each object in the collection to the newly constructed element.  This is performed recursively.  For detailed semantics of the constructors and methods, see this page in the LINQ to XML documentation.

This is one of the reasons that the signatures of LINQ to XML constructors and methods that take content are defined using object instead of XObject:

public XElement(XName name, params Object[] content);
 

Because the signature is defined using object, we can pass individual XElement, XAttribute and XNode objects to the constructor, and we can pass collections of them.
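
As a small illustration of the point above, here is a sketch of one constructor call that mixes a single attribute, a single element, and a collection of elements.  The class name MixedContentDemo and the element names are only for illustration and are not from the original examples.

```csharp
using System.Collections.Generic;
using System.Xml.Linq;

static class MixedContentDemo
{
    public static XElement Build()
    {
        // A collection typed as IEnumerable<XElement>; it is passed to the
        // constructor as one argument and iterated by LINQ to XML.
        IEnumerable<XElement> children = new[]
        {
            new XElement("Child2"),
            new XElement("Child3")
        };

        return new XElement("Root",
            new XAttribute("Att1", 1),   // a single attribute
            new XElement("Child1"),      // a single element
            children);                   // a collection of elements
    }
}
```

The resulting Root element should carry the Att1 attribute and three child elements, in the order the arguments were given.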

Cloning vs. Attaching

The identity transform also takes advantage of the cloning vs. attaching semantics of LINQ to XML.  When we pass a newly constructed XElement to the XElement constructor, the new object does not have a parent so it is simply attached to the XElement that is being constructed.  But the final line below (return node;) causes a node from an existing tree to be passed to the XElement constructor.  In this case, the XElement constructor notices that the node is already part of an existing tree, clones it, and adds the cloned node to the XElement being constructed.

XElement element = node as XElement;
if (element != null)
{
    return new XElement(element.Name,
        element.Attributes(),
        element.Nodes().Select(n => Transform(n)));
}
return node;
 

To make this very clear, take a look at the following example:

XElement tree1 = XElement.Parse("<Root><Child1/></Root>");
XElement child1 = tree1.Element("Child1");
XElement child2 = new XElement("Child2");
XElement tree2 = new XElement("Root", child1, child2);
Console.WriteLine(tree2);
 

In this example, child1 already is part of tree1, so it is cloned and the newly cloned element is added to tree2.   The child2 XElement object has no parent so it is simply attached to tree2.

For more information on cloning vs. attaching, see this page in the LINQ to XML documentation.

Note that when an XElement object is cloned by a LINQ to XML constructor or method, of course all attributes and descendant nodes are also cloned.
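
One way to observe the difference is to compare object references after construction.  The following is a minimal sketch, assuming the same tree1/tree2 setup as the example above; the class name CloneVsAttachDemo is hypothetical.

```csharp
using System;
using System.Xml.Linq;

static class CloneVsAttachDemo
{
    public static void Run()
    {
        XElement tree1 = XElement.Parse("<Root><Child1/></Root>");
        XElement child1 = tree1.Element("Child1");   // already has a parent
        XElement child2 = new XElement("Child2");    // no parent yet
        XElement tree2 = new XElement("Root", child1, child2);

        // child1 was cloned: tree2 holds a different object, and the
        // original is still parented by tree1.
        Console.WriteLine(ReferenceEquals(tree2.Element("Child1"), child1)); // False
        Console.WriteLine(ReferenceEquals(child1.Parent, tree1));            // True

        // child2 was attached: tree2 holds the very same object.
        Console.WriteLine(ReferenceEquals(tree2.Element("Child2"), child2)); // True
    }
}
```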

Removing Nodes from the Newly Cloned Tree

Sometimes when we clone a tree, we want to trim certain elements and attributes and nodes from the newly cloned tree.  We can take advantage of the fact that it’s valid to pass null as one of the arguments to the XElement constructor.  If one of the arguments to the XElement constructor is null, the XElement constructor simply ignores that argument:

XElement xml = new XElement("Root",
    new XElement("Child1"),
    null,
    new XElement("Child3"));
Console.WriteLine(xml);
 

This produces the following output:

<Root>
  <Child1 />
  <Child3 />
</Root>
 

Let’s say that we want to trim all Child2 elements from the following tree:

<Root>
  <Child1 />
  <Child2 />
  <Child3 />
</Root>
 

We can code the transform like this:

using System;
using System.Linq;
using System.Xml.Linq;
 
class Program
{
    static object Transform(XNode node)
    {
        XElement element = node as XElement;
        if (element != null)
        {
            if (element.Name == "Child2")
                return null;
 
            return new XElement(element.Name,
                element.Attributes(),
                element.Nodes().Select(n => Transform(n)));
        }
        return node;
    }
 
    static void Main(string[] args)
    {
        XElement root = XElement.Parse(@"<Root><Child1/><Child2/><Child3/></Root>");
        XElement newRoot = (XElement)Transform(root);
        Console.WriteLine(newRoot);
    }
}
 

These semantics apply regardless of whether null is passed directly to the XElement constructor, or whether null is passed as part of a collection to the XElement constructor.
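
To illustrate the collection case, here is a short sketch in which the null sits inside a List<XElement> rather than being passed directly as an argument.  The class name NullInCollectionDemo is hypothetical.

```csharp
using System.Collections.Generic;
using System.Xml.Linq;

static class NullInCollectionDemo
{
    public static XElement Build()
    {
        // The list is passed as a single content argument; LINQ to XML
        // iterates it and skips the null entry.
        List<XElement> content = new List<XElement>
        {
            new XElement("Child1"),
            null,
            new XElement("Child3")
        };
        return new XElement("Root", content);
    }
}
```

The resulting Root element should contain only Child1 and Child3, just as when null is passed directly.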

Such element name comparisons (element.Name == “Child2”) are very efficient if we pre-atomize element and attribute names.  See this post for more information about atomization of element and attribute names.

Note: for simplicity of demonstration, in these examples, I didn’t bother to pre-atomize XNames, but pre-atomization is important if performance is important.
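
For reference, here is a minimal sketch of what pre-atomization might look like for the Child2 trimming transform, assuming the names are kept in static fields.  The class names Names and TrimTransform are hypothetical and not from the original examples.  The string-to-XName conversion runs once, when the static field is initialized, so each comparison in the transform is between two existing XName objects.

```csharp
using System.Linq;
using System.Xml.Linq;

// Hypothetical holder for pre-atomized names.
static class Names
{
    // The implicit string-to-XName conversion happens once, here.
    public static readonly XName Child2 = "Child2";
}

static class TrimTransform
{
    public static object Transform(XNode node)
    {
        XElement element = node as XElement;
        if (element != null)
        {
            // Compares two XName objects; no string is converted here.
            if (element.Name == Names.Child2)
                return null;

            return new XElement(element.Name,
                element.Attributes(),
                element.Nodes().Select(n => Transform(n)));
        }
        return node;
    }
}
```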

Replace an Element with another Element

A common operation in transforms is to replace some particular element with an entirely new element.  Let’s say that we want to replace the Child2 element with a Child5 element that contains a child element named Text.  We want to transform this:

<Root>
  <Child1 />
  <Child2 />
  <Child3 />
</Root>
 

To this:

<Root>
  <Child1 />
  <Child5>
    <Text />
  </Child5>
  <Child3 />
</Root>
 

We can code it like this:

static object Transform(XNode node)
{
    XElement element = node as XElement;
    if (element != null)
    {
        if (element.Name == "Child2")
            return new XElement("Child5",
                new XElement("Text"));
 
        return new XElement(element.Name,
            element.Attributes(),
            element.Nodes().Select(n => Transform(n)));
    }
    return node;
}
 

Replace an Element with Multiple Elements

Sometimes we may want to replace a single element with multiple elements.  We can take advantage of the fact that the behavior of the XElement constructor for collections of objects is, as mentioned, recursive.  If one of the objects in the collection is itself a collection of objects then that child collection is itself iterated and all objects are added to the element being constructed.

The following example shows these semantics of the XElement constructor.  The example first creates two lists of XElement objects.  It then creates a list that contains the two lists.  It passes the list of lists to the XElement constructor, and the resulting XElement contains all of the XElement objects in the two lists:

List<XElement> list1 = new List<XElement> {
    new XElement("Child1"),
    new XElement("Child2")
};
List<XElement> list2 = new List<XElement> {
    new XElement("Child3"),
    new XElement("Child4")
};
List<List<XElement>> listOfLists = new List<List<XElement>> { list1, list2 };
XElement newElement = new XElement("Root", listOfLists);
Console.WriteLine(newElement);
 

This produces the following output:

<Root>
  <Child1 />
  <Child2 />
  <Child3 />
  <Child4 />
</Root>
 

Let’s say that we want to replace the Child2 element with three Child5 elements.  We want to transform this:

<Root>
  <Child1 />
  <Child2 />
  <Child3 />
</Root>
 

To this:

<Root>
  <Child1 />
  <Child5 />
  <Child5 />
  <Child5 />
  <Child3 />
</Root>
 

We can code it like this:

static object Transform(XNode node)
{
    XElement element = node as XElement;
    if (element != null)
    {
        if (element.Name == "Child2")
            return new List<XElement> {
                new XElement("Child5"),
                new XElement("Child5"),
                new XElement("Child5")
            };
 
        return new XElement(element.Name,
            element.Attributes(),
            element.Nodes().Select(n => Transform(n)));
    }
    return node;
}
 

This example is a little artificial – more typically we would want to replace a particular element with a number of elements contained in a collection that is returned by some query.  Let’s say that we want to remove a certain element from the tree and put its child elements in its place.  We want to transform this:

<Root>
  <Child>
    <GrandChild />
    <GrandChild />
    <GrandChild />
  </Child>
</Root>
 

To this:

<Root>
  <GrandChild />
  <GrandChild />
  <GrandChild />
</Root>
 

We can code it like this:

static object Transform(XNode node)
{
    XElement element = node as XElement;
    if (element != null)
    {
        if (element.Name == "Child")
            return element.Elements();
 
        return new XElement(element.Name,
            element.Attributes(),
            element.Nodes().Select(n => Transform(n)));
    }
    return node;
}
 
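static void Main(string[] args)
{
    // Input is the tree shown above; insignificant whitespace is
    // dropped by XElement.Parse.
    XElement root = XElement.Parse(
        @"<Root>
          <Child>
            <GrandChild />
            <GrandChild />
            <GrandChild />
          </Child>
        </Root>");
    XElement newRoot = (XElement)Transform(root);
    Console.WriteLine(newRoot);
}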

Comments (3)

  1. Phillip says:

    And thanks for a most informative piece.

    Not sure whether this is the right place to ask a question, but will try anyway.

    If I need to "establish" the nodes under an element, how would I accomplish that? I will plagiarise some of your data to explain.

    <Root>
     <Child>
       <Apple Id="1" />
       <Carot Id="1" />
       <Banana Id="1" />
     </Child>
    </Root>

    How would I find out the elements under <Child> ?

  2. Hi Phillip,

    If the <Root> element is in an XElement named root, you could query for the children of Child like this:

    root.Element("Child").Elements()

    This would return a collection of XElement objects containing Apple, Carot, and Banana.

    -Eric

  3. I so wish C# supported tailcall!

    It seems they keep adding more functional support to the language, so I’m assuming they’ll add tailcall support at some point…