LINQ Farm: LINQ to XML and Line Numbers


There are times when it is useful to know the line number of a node in an XML file. This information can be helpful to users, particularly if you want to report an error. It can also be convenient to search for a node by line number, but that is a risky endeavor: documents can be modified accidentally, and their line numbers changed without notice.

This post shows a few fundamentals about working with line numbers in a LINQ to XML program. The code shown in this post is taken from a project called XmlLineNumber. You can download this program from the LINQ Farm on Code Gallery.

Reporting a Line Number

Let’s begin our exploration by detailing a technique for reporting the line number of a node that you have found in an XML file. To get started we need to use code from a class called XObject. As shown in Figure 1, XObject sits at the top of the LINQ to XML class hierarchy.

Figure 1: The core objects in the LINQ to XML class hierarchy

XObject implements an interface called IXmlLineInfo:

public interface IXmlLineInfo
{
    int LineNumber { get; }
    int LinePosition { get; }
    bool HasLineInfo();
}

The LineNumber property of this interface supplies the information we want. To have it populated, we need only call XDocument.Load with LoadOptions.SetLineInfo:

XDocument xml = XDocument.Load(fileName, LoadOptions.SetLineInfo);
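
To see what the SetLineInfo flag changes, here is a minimal sketch that parses the same text with and without it and asks the root element whether it carries line information. The class and method names (LineInfoDemo, RootHasLineInfo) are hypothetical and are not part of the XmlLineNumber sample. It uses XDocument.Parse rather than XDocument.Load only so that the example does not depend on a file on disk.

```csharp
using System;
using System.Xml;
using System.Xml.Linq;

public static class LineInfoDemo
{
    // Hypothetical helper: returns true when the root element of the
    // parsed text carries line information.
    public static bool RootHasLineInfo(string xmlText, LoadOptions options)
    {
        XDocument doc = XDocument.Parse(xmlText, options);

        // XElement derives from XObject, which implements IXmlLineInfo,
        // so the assignment below is an ordinary reference conversion.
        IXmlLineInfo info = doc.Root;
        return info.HasLineInfo();
    }
}
```

Calling RootHasLineInfo("<a/>", LoadOptions.SetLineInfo) should return true, while the same call with LoadOptions.None should return false. Checking HasLineInfo before trusting LineNumber is a cheap way to notice that a document was loaded without the flag.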

If you load an XML file into memory using SetLineInfo from the LoadOptions enumeration, then line numbers will be associated with the nodes in your document. The file we are loading is called FirstFourPlanets.xml. It’s a sweet little file that looks like this:

<?xml version="1.0" encoding="utf-8"?>
<Planets>
  <Planet>
    <Name>Mercury</Name>
    <Moons/>
  </Planet>
  <Planet>
    <Name>Venus</Name>
    <Moons/>
  </Planet>
  <Planet>
    <Name>Earth</Name>
    <Moons>
      <Moon>
        <Name>Moon</Name>
        <OrbitalPeriod UnitsOfMeasure="days">27.321582</OrbitalPeriod>
      </Moon>
    </Moons>
  </Planet>
  <Planet>
    <Name>Mars</Name>
    <Moons>
      <Moon>
        <Name>Phobos</Name>
        <OrbitalPeriod UnitsOfMeasure="days">0.318</OrbitalPeriod>
      </Moon>
      <Moon>
        <Name>Deimos</Name>
        <OrbitalPeriod UnitsOfMeasure="days">1.26244</OrbitalPeriod>
      </Moon>
    </Moons>
  </Planet>
</Planets>

Here is code that uses the IXmlLineInfo interface to report the line number of a node discovered through a standard LINQ to XML search:

XText phobos = (from x in xml.DescendantNodes().OfType<XText>()
                where x.Value == "Phobos"
                select x).Single();

var lineInfo = (IXmlLineInfo)phobos;
Console.WriteLine("{0} appears on line {1}", phobos, lineInfo.LineNumber);

This code looks through all the descendants of the root node for nodes of type XText whose value is the word Phobos. It uses the LINQ query operator Single to ensure that the query returns exactly one node. If the query returned no results, or more than one, the call to Single would raise an exception, which in this case is the behavior we want. The program then casts the result to IXmlLineInfo and reports the line number to the user:

Phobos appears on line 24
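
The same technique is handy for the error reporting mentioned at the start of this post. The sketch below is not taken from the XmlLineNumber sample. It assumes a made-up rule, that every Moon element must contain an OrbitalPeriod, and the names MoonChecks and FindMoonsWithoutPeriod are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Xml;
using System.Xml.Linq;

public static class MoonChecks
{
    // Hypothetical rule for illustration: every <Moon> must contain
    // an <OrbitalPeriod>. Returns one message per offending element.
    public static List<string> FindMoonsWithoutPeriod(XDocument doc)
    {
        var problems = new List<string>();
        foreach (XElement moon in doc.Descendants("Moon"))
        {
            if (moon.Element("OrbitalPeriod") != null)
            {
                continue; // this moon satisfies the rule
            }

            // Fall back to a generic message when the document was
            // loaded without LoadOptions.SetLineInfo.
            IXmlLineInfo info = moon;
            string where = info.HasLineInfo()
                ? string.Format("line {0}, position {1}", info.LineNumber, info.LinePosition)
                : "unknown location";

            problems.Add(string.Format("<Moon> at {0} has no <OrbitalPeriod>", where));
        }
        return problems;
    }
}
```

Run against FirstFourPlanets.xml, this should return an empty list, since every moon in that file has an orbital period. Remove one of the OrbitalPeriod elements and the message should point at the line of the enclosing Moon element.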

Searching by Line Number

Let’s now turn things around and show how to search through an XML file and look for a node by line number. If you glance at the FirstFourPlanets.xml file, you will see that line 21 looks like this:

<Name>Mars</Name>

Here is code from the XmlLineNumbers sample showing how to search for that node by line number:

XDocument xml = XDocument.Load(fileName, LoadOptions.SetLineInfo);

var line = from x in xml.Descendants()
           let lineInfo = (IXmlLineInfo)x
           where lineInfo.LineNumber == 21
           select x;

foreach (var item in line)
{
    Console.WriteLine(item);
}

Note that the first line uses LoadOptions.SetLineInfo to ensure that line information is recorded when the document is loaded into memory.

The LINQ query shown here uses Descendants to iterate over the elements in the FirstFourPlanets.xml file. The where filter in the query checks to see if any of those elements has its line number set to 21. It happens that the 15th element returned by the call to Descendants fits that search criterion, and so that node, and that node alone, is found when we foreach over the results.

Notice the cast to convert the XElement nodes returned by the call to Descendants:

let lineInfo = (IXmlLineInfo)x

This cast is necessary because XObject implements IXmlLineInfo explicitly, so the members of the interface are not exposed directly by XElement.
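
If you need this kind of search in more than one place, the query can be wrapped in a small helper. The following is only a sketch, and the names LineSearch and ElementsOnLine are hypothetical. It adds a HasLineInfo check, because elements loaded without SetLineInfo report a line number of 0 and would otherwise all match a search for line 0.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml;
using System.Xml.Linq;

public static class LineSearch
{
    // Hypothetical helper: returns the elements whose start tag begins
    // on the given line. Elements without line information never match.
    public static IEnumerable<XElement> ElementsOnLine(XDocument doc, int lineNumber)
    {
        return doc.Descendants().Where(e =>
        {
            IXmlLineInfo info = e; // reach the explicitly implemented members
            return info.HasLineInfo() && info.LineNumber == lineNumber;
        });
    }
}
```

With the document loaded as shown above, LineSearch.ElementsOnLine(xml, 21) should yield the same single Name element as the query.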

Once again, I want to stress that reporting the line number of a node seems like a reasonable thing to do, but searching for an element by line number is usually not a good idea in production code. For unexplained reasons, code that was on line 532 has a way of migrating to line 533 when you least expect it. In any case, you now know enough to begin working with line numbers in a LINQ to XML program.

Download the source.

Comments (8)

  1. You’ve been kicked (a good thing) – Trackback from DotNetKicks.com

  2. ravenex says:

    Hi,

    Thanks for the article. I’d like to ask though, can an XDocument be saved back to a file with the exact line numbers it had when it was loaded? There doesn’t seem to be such an option in SaveOptions.

  3. In a previous post , you saw how to work with line numbers when using LINQ to XML to read a file. This

  4. ccalvert says:

    Ravenex,

    Thanks for your question. It is interesting. Does the Preserving Formats post linked to above help, or were you after something else?

    – Charlie

  5. ravenex says:

    Charlie,

    The new post does give some insight into how it might be done. My goal was to be able to make the fewest possible modifications to an XML source file when editing with an editor I wrote. The editor shows the tree hierarchy of the XML document, and users can’t see the actual text in the source file, so line numbers didn’t really matter. It would have just worked, but when the files are checked into a repository, reformatting the source files will cause version control to report more modifications than there need to be. That’s why I wanted to make the fewest possible modifications when editing, and preserving line numbers is crucial.

    Thanks,

    – Ravenex

  6. SkipSailors says:

    I have:

    document.Validate(schemas, (o, e) =>
    {
        Console.WriteLine("{0} at {1}",
            e.Message, e.Exception.LineNumber);
        error = true;
    },
    true);

    reporting errors always at line 0.  I use the LoadOptions.SetLineInfo. Does this have anything to do with what you are talking about here?

  7. OpticTygre says:

    Hi Charlie,

    I’m actually working with line numbering in VB, attempting to perform more advanced things.

    The IXmlLineInfo interface provides a good option for discovering the line number and line position for the start of an element, node, or text.  Is there a way, however, to get the ending line number of an element and text?

    Through my endeavors of answering that question, I have discovered some interesting things in Linq to XML.  If you’re interested, please read my forum comments at http://social.msdn.microsoft.com/Forums/en-US/vbgeneral/thread/a485f9b7-58a3-4da3-8654-6c1b53e7f5ba.

    Thanks for any advice you can provide!

  8. OpticTygre says:

    To further explain my comment above, I believe a brief example should be provided.  Take the following XML:

    <A>
      <B>
        <C>Some Text</C>
        <D>
          <![CDATA[
            This is some other data
            that may take several lines
          ]]>
        </D>
      </B>
    </A>

    In this XML, all we know from IXmlLineInfo is the starting line number of each element, and the starting position of each element.  We don’t know:

    1) The ending line number of each element.

    2) The ending line number of text or CData

    3) The ending line position of each element (though this should be the same as the start position)

    4) The ending line position of text or CData.

    I believe the heart of the problem lies in the fact that even though the NodeType property of an XNode is of type XmlNodeType, the XNode object does not use all of the node types available.  Looping through the XNodes of this XML and printing out the NodeTypes, you will notice that an XNode treats <A> and </A> both as Elements, and not as an Element and an EndElement.

    -Jason