Using LINQ to XML Annotations – tracking line numbers


[updated to escape the code so that it displays properly in HTML, and so that it gracefully handles input with an XML declaration]


Several people have asked for a feature in LINQ to XML that would keep track of the line number in an XML data source from which each node was parsed. We have resisted, partly because there doesn’t seem to be a mainstream use case for this feature, and partly because of the minimalist design philosophy behind LINQ: simple, mainstream scenarios should be supported out of the box, whereas more sophisticated use cases can be supported via the various extension mechanisms. The code example below shows how to use C# 3.0 extension methods and LINQ to XML annotations to do this job.


The tricky parts of this code are the LoadWithLineInfo method, which sets up the XmlReader, and the LoadNode method, which figures out what the reader returned, constructs the appropriate type of XLinq object to hold that result, attaches the line number annotation, updates the XLinq tree, and calls the reader again. The good news is that once you understand the logic here, you should be able to write other extension methods that preserve information in a source file that does not fit neatly into an XLinq tree. For example, we believe this basic pattern can be used to make LINQ to XML more DTD-aware, e.g. by preserving entity references or noting attribute values that were set from the DTD default rather than explicitly in the XML source. This type of information could be stored as annotations by a similar customized load method and read back by an analogous save method.
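Stripped of the tree building, the line tracking itself comes down to asking the reader for its IXmlLineInfo interface at each node. Here is a minimal sketch of just that step. It uses only System.Xml, so it does not depend on the CTP; the names LineInfoSketch and ElementLines are made up for illustration and are not part of the sample.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;

public static class LineInfoSketch
{
    // Hypothetical helper: returns "name@line" for every element
    // start tag that the reader reports.
    public static List<string> ElementLines(string markup)
    {
        var result = new List<string>();
        using (XmlReader reader = XmlReader.Create(new StringReader(markup)))
        {
            // Not every XmlReader exposes position information, so check.
            IXmlLineInfo lineInfo = reader as IXmlLineInfo;
            while (reader.Read())
            {
                if (reader.NodeType != XmlNodeType.Element) continue;
                if (lineInfo != null && lineInfo.HasLineInfo())
                {
                    result.Add(reader.LocalName + "@" + lineInfo.LineNumber);
                }
            }
        }
        return result;
    }

    public static void Main()
    {
        string markup = "<root>\n<e a='value1'/>\n<f b='value2'/>\n</root>";
        foreach (string entry in ElementLines(markup))
        {
            Console.WriteLine(entry);
        }
    }
}
```

For the input in Main this should report root on line 1, e on line 2 and f on line 3. LoadNode does the same check, but stores the result as an annotation on the element it has just built.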


Note: This sample requires the May 2006 LINQ CTP to work. In the May CTP, annotations are only supported on XDocument and XElement objects, but in the next public release of LINQ to XML, it will be possible to attach annotations to almost every type of XLinq object, including attributes and text nodes.
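For readers who are not on the May 2006 CTP, here is a rough sketch of the same annotation idea against the released System.Xml.Linq namespace (.NET 3.5 and later). It assumes that API, in which AddAnnotation and Annotation<T>() are defined on XObject and therefore work on attributes as well as elements. The Origin class and the AnnotationSketch wrapper are hypothetical, introduced only for this illustration.

```csharp
using System;
using System.Xml.Linq;

public static class AnnotationSketch
{
    // Hypothetical annotation type; any reference type can be attached.
    public sealed class Origin
    {
        public Origin(string note) { Note = note; }
        public string Note { get; private set; }
    }

    // Attaches one annotation to an element and one to its attribute,
    // then reads both back.
    public static string Demo()
    {
        XElement element = new XElement("e", new XAttribute("a", "value1"));
        XAttribute attribute = element.Attribute("a");

        element.AddAnnotation(new Origin("element"));
        attribute.AddAnnotation(new Origin("attribute"));

        return element.Annotation<Origin>().Note + "," +
               attribute.Annotation<Origin>().Note;
    }

    public static void Main()
    {
        Console.WriteLine(Demo());
    }
}
```

Annotations are looked up by type, which is why the sample below defines a dedicated LineInfo class rather than attaching a bare integer.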


Try this out; the sample program itself just loads some XML from a string and prints out the line number information where the elements were found.  You might play around with the XML source and the elements whose line number information can be displayed, or you could tweak the program to read from a file specified on the command line.   Let me know what is confusing and I’ll try to clarify.


 

using System;
using System.IO;
using System.Xml;
using System.Xml.XLinq;

namespace System.Xml.XLinq.Extension
{

    /// <summary>
    /// Sample program to illustrate use of the line number extensions. It reads an
    /// XML document from a string, but could easily be modified to open a reader on a file.
    /// </summary>
    public class Program
    {
        static void Main(string[] args) {
            string markup = @"
<root>
<e a='value1'/>
<f b='value2'/>
</root>
";
            XDocument document = new XDocument();
            document.LoadWithLineInfo(XmlReader.Create(new StringReader(markup)));
            Console.WriteLine(document.Element("root").Element("e").GetLineInfo());
            Console.WriteLine(document.Element("root").Element("f").GetLineInfo());
        }
    }
    /// <summary>
    /// The application-defined class to be attached as an annotation. This particular class
    /// keeps track of the line number and character position at which an element was found
    /// in the XML source.
    /// </summary>
    public class LineInfo
    {
        int number;
        int position;

        public LineInfo(int number, int position) {
            this.number = number;
            this.position = position;
        }

        public int Number {
            get { return number; }
        }

        public int Position {
            get { return position; }
        }

        public override string ToString() {
            return "Line #" + number + ", Char #" + position;
        }
    }
    /// <summary>
    /// Some extension methods for the System.Xml.XLinq types to support
    /// line number annotations.
    /// </summary>
    public static class Extension
    {
        public static LineInfo GetLineInfo(this XElement element) {
            return element.GetAnnotation<LineInfo>();
        }

        public static void SetLineInfo(this XElement element, LineInfo lineInfo) {
            element.AddAnnotation(lineInfo);
        }

        /// <summary>
        /// A version of the XLinq Load() method that annotates the tree it loads with
        /// information on where in the XML file an element was found.
        /// </summary>
        /// <param name="document">An XDocument to populate</param>
        /// <param name="reader">An XmlReader set up to read from a data source</param>
        public static void LoadWithLineInfo(this XDocument document, XmlReader reader) {
            if (reader == null) throw new ArgumentNullException("reader");
            // Line numbers are only available if the reader exposes IXmlLineInfo.
            IXmlLineInfo lineInfo = reader as IXmlLineInfo;
            if (lineInfo == null) throw new ArgumentException("The reader must implement IXmlLineInfo.", "reader");
            // Advance to the first node unless the caller has already started reading.
            if (reader.ReadState != ReadState.Interactive) {
                if (!reader.Read()) return;
            }
            XNode node = null;
            while ((node = LoadNode(reader, lineInfo)) != null) {
                document.Add(node);
                if (!reader.Read()) return;
            }
        }
        /// <summary>
        /// Reads an XLinq node from an XmlReader, annotating it with line number information
        /// </summary>
        static XNode LoadNode(XmlReader reader, IXmlLineInfo lineInfo) {
            XNode node = null;
            XElement parent = null;
            do {
                switch (reader.NodeType) {
                    case XmlNodeType.Element:
                        XElement element = new XElement(XName.Get(reader.LocalName, reader.NamespaceURI));
                        // Copy the attributes, then move the reader back to the element itself.
                        if (reader.MoveToFirstAttribute()) {
                            do {
                                element.Add(new XAttribute(XName.Get(reader.LocalName, reader.NamespaceURI), reader.Value));
                            } while (reader.MoveToNextAttribute());
                            reader.MoveToElement();
                        }
                        // Record where the reader found this element.
                        if (lineInfo.HasLineInfo()) {
                            element.SetLineInfo(new LineInfo(lineInfo.LineNumber, lineInfo.LinePosition));
                        }
                        // A non-empty element becomes the current parent; keep reading its content.
                        if (!reader.IsEmptyElement) {
                            if (parent != null) {
                                parent.Add(element);
                            }
                            parent = element;
                            continue;
                        }
                        else {
                            node = element;
                        }
                        break;
                    case XmlNodeType.EndElement:
                        if (parent == null) return null;
                        // Keep <e></e> distinct from <e/> by giving it an empty text child.
                        if (parent.IsEmpty) {
                            parent.Add(string.Empty);
                        }
                        // The end tag of the outermost element completes the subtree.
                        if (parent.Parent == null) return parent;
                        parent = parent.Parent;
                        continue;
                    case XmlNodeType.Text:
                    case XmlNodeType.SignificantWhitespace:
                    case XmlNodeType.Whitespace:
                        node = new XText(reader.Value);
                        break;
                    case XmlNodeType.CDATA:
                        node = new XText(reader.Value, TextType.CData);
                        break;
                    case XmlNodeType.Comment:
                        node = new XComment(reader.Value);
                        break;
                    case XmlNodeType.ProcessingInstruction:
                        node = new XProcessingInstruction(reader.Name, reader.Value);
                        break;
                    case XmlNodeType.DocumentType:
                        node = new XDocumentType(reader.LocalName, reader.GetAttribute("PUBLIC"), reader.GetAttribute("SYSTEM"), reader.Value);
                        break;
                    case XmlNodeType.EntityReference:
                        reader.ResolveEntity();
                        continue;

                    // Skip the XML declaration and end-of-entity markers.
                    case XmlNodeType.XmlDeclaration:
                    case XmlNodeType.EndEntity:
                        continue;
                    default:
                        throw new InvalidOperationException();
                }
                // A node outside any element is returned directly; otherwise it joins its parent.
                if (parent == null) return node;
                parent.Add(node);
            } while (reader.Read());
            return null;
        }
    }

}
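
For comparison, the released System.Xml.Linq API (.NET 3.5 and later) can record line information itself when loading is asked to do so with LoadOptions.SetLineInfo. The sketch below assumes that released API rather than the May 2006 CTP that the sample above targets; the BuiltInLineInfoSketch class and its LineOf helper are hypothetical names used only here.

```csharp
using System;
using System.Xml;
using System.Xml.Linq;

public static class BuiltInLineInfoSketch
{
    // Hypothetical helper: returns the line number recorded for a node or
    // attribute, or -1 if the tree was loaded without line information.
    public static int LineOf(XObject node)
    {
        // XObject implements IXmlLineInfo explicitly, so go through the interface.
        IXmlLineInfo info = node;
        return info.HasLineInfo() ? info.LineNumber : -1;
    }

    public static void Main()
    {
        string markup = "<root>\n<e a='value1'/>\n<f b='value2'/>\n</root>";
        XDocument document = XDocument.Parse(markup, LoadOptions.SetLineInfo);
        Console.WriteLine(LineOf(document.Root.Element("e")));
        Console.WriteLine(LineOf(document.Root.Element("f")));
    }
}
```

The annotation pattern shown in the sample remains the way to carry other application-specific data that a loader does not track.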


Comments (7)

  1. chionhhm says:

    P.S. Please ignore this if you can view my previous comments. Apologies for the repetition; I am not sure the comment was submitted successfully, since I do not see it on the web page.

    Dear Mike,

    I have tried your solution and it works. However, I loaded the XML string via the XDocument.Xml call, which returns the XML on a single line, so the line number is always 1. Is there a solution to this?

    Regards

    chionhhm

  2. xmlhacker says:

    @chionhhm,

    The line,

    document.LoadWithLineInfo(XmlReader.Create(new StringReader(markup)));

    … is kind of essential to this whole exercise. Using XDocument.Xml doesn’t provide any of the annotations that Mike’s code adds via the LineInfo class, which the LoadWithLineInfo method of the Extension class attaches.

  3. chionhhm says:

    Dear xmlhacker,

    You are right.  Maybe I should copy my code here to make it clearer what I am trying to say.

               XDocument fileDoc = XDocument.Load(@"test.exe.config");

               string xml = fileDoc.Xml;

               XDocument doc = new XDocument();

               Extension.LoadWithLineInfo(doc, XmlReader.Create(new StringReader(xml)));

               XElement element = doc.Element("configuration").Element("test");

               LineInfo li = Extension.GetLineInfo(element);

               string message = "Line Number: " + li.Number + " Line Position: " + li.Position;

    The code above does not explicitly create the XML and store it in the "markup" string as Mike’s example does. Instead, I get the XML string from the XDocument.Xml call, where the XML was loaded from a file.

    However, the XML string returned by XDocument.Xml does not seem to preserve the line feed characters. Therefore, in my code above the line number is always 1, which is not what I want. I would prefer the exact line, as if I had read it from the file directly.

    Regards

    chionhhm

  4. MCChampion says:

    You can put the sample data in a file and load directly from the file with the reader.  For example, if the data is in a file "sample.xml" in the project directory:

    document.LoadWithLineInfo(XmlReader.Create("../../sample.xml"));

    The point of using StringReader in the example was simply to make it self-contained in a single file.

  5. chionhhm says:

    Dear Mike,

    Thanks, it works! I have also added a case for XmlDeclaration, placed before CDATA in the LoadNode function of the Extension class, so that XML data with an XML declaration can go through.

    Regards

    Hui Ming

  6. xmlhacker says:

    Thanks for the clarification, chionhhm!  In re-reading your post I can now see what you meant.

  7. As S. Somasegar announced, Orcas Beta 1 is ready to ship and will be generally available for download