Streamed XPath Extraction using hidden BizTalk class XPathReader

When writing custom BizTalk pipeline components, you often find yourself wanting to extract specific values from the incoming message using XPath expressions.

You can do this with either XPathDocument or XDocument, but both load the entire XML document into memory. If the message is huge, that may not be possible, and it also slows the pipeline component down. The alternative is a streaming class such as XmlReader. But that would be too much work, right?
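For comparison, here is roughly what the hand-written XmlReader route involves. You have to track where you are in the document yourself. The helper below is a hypothetical sketch: ManualStreamingExtractor.ExtractFirst is not a BizTalk or .NET API. It handles only simple absolute paths of element local names such as /Order/Header/Id. It ignores namespaces, attributes, predicates and every other XPath feature, and it assumes the target is a leaf element with text content.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Xml;

// Hypothetical helper: returns the text of the first element whose
// local-name path matches, reading the stream forward-only.
// Namespaces and all other XPath features are deliberately ignored.
public static class ManualStreamingExtractor
{
    public static string ExtractFirst(Stream xml, string path)
    {
        string[] target = path.Trim('/').Split('/');
        var stack = new List<string>(); // local names of the currently open elements

        var settings = new XmlReaderSettings { IgnoreWhitespace = true };
        // CloseInput defaults to false, so disposing the reader leaves the stream open
        using (XmlReader reader = XmlReader.Create(xml, settings))
        {
            while (reader.Read())
            {
                if (reader.NodeType != XmlNodeType.Element)
                {
                    continue;
                }

                // Depth is 0 for the root: keep only the ancestors, then push this element
                stack.RemoveRange(reader.Depth, stack.Count - reader.Depth);
                stack.Add(reader.LocalName);

                if (IsMatch(stack, target))
                {
                    // Works for leaf elements; throws if the element has child elements
                    return reader.ReadElementContentAsString();
                }
            }
        }
        return null; // no element matched the path
    }

    private static bool IsMatch(List<string> stack, string[] target)
    {
        if (stack.Count != target.Length)
        {
            return false;
        }
        for (int i = 0; i < target.Length; i++)
        {
            if (stack[i] != target[i])
            {
                return false;
            }
        }
        return true;
    }
}
```

For example, a stream over `<Order><Header><Id>42</Id></Header></Order>` with the path `/Order/Header/Id` returns "42". Even this stripped-down version needs a fair amount of bookkeeping, which is the point of the question above.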

The solution comes in the form of a hidden gem among the installed BizTalk components: the XPathReader class. It is a stream-based, forward-only reader that searches for nodes or elements matching a given set of XPath expressions.

This class is defined in the assembly Microsoft.BizTalk.XPathReader.dll, which is deployed to the GAC. Add a reference to this assembly first, then use the class as shown below.

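    // Requires a reference to Microsoft.BizTalk.XPathReader.dll (installed in the GAC)
    using System.IO;
    using System.Xml;
    using Microsoft.BizTalk.XPath;

    // Rewind the message stream before reading it
    MsgStream.Seek(0, SeekOrigin.Begin);

    XmlReaderSettings settings = new XmlReaderSettings();
    string strValue = null;

    if (!string.IsNullOrEmpty(MsgXPath))
    {
        XmlReader reader = XmlReader.Create(MsgStream, settings);

        // Register the XPath to search for; it is the first entry, so its index is 0
        XPathCollection xPathCollection = new XPathCollection();
        xPathCollection.Add(MsgXPath);

        XPathReader xPathReader = new XPathReader(reader, xPathCollection);

        // Read forward through the stream until a node matches one of the XPaths
        if (xPathReader.ReadUntilMatch())
        {
            if (xPathReader.Match(0))
            {
                strValue = xPathReader.ReadString();
            }
        }
    }

    // Rewind again so the next pipeline stage sees the whole message
    MsgStream.Seek(0, SeekOrigin.Begin);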


Here, MsgStream is a seekable stream obtained from the message body, and MsgXPath is the XPath expression whose value you want to extract.
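If the stream you get from the message cannot seek, it has to be made seekable before the code above can rewind it. Inside BizTalk, the usual tools for this are the classes in Microsoft.BizTalk.Streaming, such as VirtualStream and ReadOnlySeekableStream. The plain .NET sketch below swaps in a simpler technique to show the idea. It returns the stream unchanged if it can already seek. Otherwise it buffers the stream to a temporary file rather than a MemoryStream, so a large message is not pulled into memory. SeekableStreamHelper.EnsureSeekable is a hypothetical name, not a BizTalk API.

```csharp
using System.IO;

// Hypothetical helper (not a BizTalk API): guarantees a seekable stream
// without buffering the whole message in memory.
public static class SeekableStreamHelper
{
    public static Stream EnsureSeekable(Stream source)
    {
        if (source.CanSeek)
        {
            return source; // already seekable, nothing to do
        }

        // Temporary file buffer; DeleteOnClose removes the file when the stream is disposed
        var buffer = new FileStream(
            Path.GetTempFileName(),
            FileMode.Create,
            FileAccess.ReadWrite,
            FileShare.None,
            81920,
            FileOptions.DeleteOnClose);

        source.CopyTo(buffer);
        buffer.Seek(0, SeekOrigin.Begin); // position at the start for the next reader
        return buffer;
    }
}
```

The caller owns the returned stream. When a copy was made, disposing it also deletes the temporary file.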

Comments (3)

  1. The Dag says:


    To my mind, either you're doing it wrong, or else the BizTalk pipeline is just badly designed. This approach, although better than loading an XDocument or XmlDocument, is still NOT a streaming approach from the point of view of the entire pipeline.

    If you had 7 components in your pipeline and they all operated according to this principle, you'd stream through all the data 7 times. It should be obvious that this isn't great.

    But if you implemented an actual *stream* class that did the processing, your pipeline component would just wrap the stream (from the message body) in your custom stream class, then return immediately. Whatever reads from the *end* of the pipeline would thus cause all of the chained streams' Read methods to be invoked, and you'd only need to actually read the data ONCE from the original source.

    It may well be that it's I who don't get this, because I have NEVER seen anyone build a pipeline in the way I am suggesting here. I frankly don't understand why – but I must say the pipeline model in BizTalk does invite you to do exactly the sort of thing you are doing here, since it is NOT based on streaming but rather on each component "processing the message, then passing it to the next stage in the pipeline".

    It also may well be that it's difficult to implement XML processing as a stream, but I imagine that should be manageable, although probably a bit error-prone and something that would need good testing.

  2. momalek says:

    Yes, you are right. In this post I was not trying to build a fully streamed pipeline; I was only trying to avoid loading the entire XML document into memory, as this consumes resources and time.

  3. Anas Hammo says:

    It is not a streaming approach. You could improve it by first wrapping the original stream in a VirtualStream. Still, it is good practice to read the message without consuming a lot of memory. For full streaming you need to write a custom stream that implements the Read method, or, for XML messages, simply use XPathMutatorStream or XmlTranslatorStream.
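The pass-through idea raised in the first and third comments can be sketched in plain .NET. ObservingStream and its onData callback are hypothetical names used only for illustration. They are not BizTalk types, and the BizTalk classes the comments mention (VirtualStream, XPathMutatorStream, XmlTranslatorStream) are not used here. Wrapping the source does no work by itself. Each chunk is observed only when a downstream consumer calls Read, so the data is read once.

```csharp
using System;
using System.IO;

// Hypothetical pass-through stream: inspects data as a downstream reader pulls it,
// instead of reading the whole message up front.
public class ObservingStream : Stream
{
    private readonly Stream _inner;
    private readonly Action<byte[], int, int> _onData; // called with (buffer, offset, bytesRead)

    public ObservingStream(Stream inner, Action<byte[], int, int> onData)
    {
        _inner = inner ?? throw new ArgumentNullException(nameof(inner));
        _onData = onData ?? throw new ArgumentNullException(nameof(onData));
    }

    public override int Read(byte[] buffer, int offset, int count)
    {
        int read = _inner.Read(buffer, offset, count);
        if (read > 0)
        {
            _onData(buffer, offset, read); // observe the chunk as it flows past
        }
        return read;
    }

    public override bool CanRead => _inner.CanRead;
    public override bool CanSeek => false;  // forward-only by design
    public override bool CanWrite => false;
    public override long Length => throw new NotSupportedException();
    public override long Position
    {
        get => throw new NotSupportedException();
        set => throw new NotSupportedException();
    }

    public override void Flush() { }
    public override long Seek(long offset, SeekOrigin origin) => throw new NotSupportedException();
    public override void SetLength(long value) => throw new NotSupportedException();
    public override void Write(byte[] buffer, int offset, int count) => throw new NotSupportedException();

    protected override void Dispose(bool disposing)
    {
        if (disposing)
        {
            _inner.Dispose();
        }
        base.Dispose(disposing);
    }
}
```

A pipeline component following this design would replace the body part's data stream with such a wrapper and return. Whatever reads the message later drives the processing.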