BizTalk Aggregation Pattern for Large Batches

There are a few existing examples of the Aggregator pattern for BizTalk Server, but I recently had a customer dealing with
very large batches, and the existing patterns were insufficient. So, I threw together a sample that shows how to handle
very large batches in a fairly speedy manner.

In this use case, an XML batch message comes into BizTalk; we want to split out the batch, process each message individually, then
aggregate the whole batch back together again and send a single message out. The existing
aggregation patterns didn't work here, since they mostly involve building up an XML document in memory. When a batch of 1000+ messages
got submitted, it was taking over 10 minutes to process, which was way too slow.

So, my solution uses a combination of envelope de-batching, the sequential convoy pattern, and file streaming. The first part of
the solution was to build the necessary XML schemas. The first schema, the envelope schema, looked like this ...

Three things to note here. First, I set the <Schema> node's Envelope property to Yes. This way I won't
have to set anything in my pipeline for it to process the batch. Second, I promoted both the BatchID and Count fields.
We'll see the significance of that in a moment. Third, I set the Body XPath property of the root node to the Record
node of the schema. This way, the pipeline will see that it's an envelope and know where to split the batch.
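
To make this concrete, a hypothetical instance of the envelope might look something like the following (the actual schemas
aren't shown here, so all element and namespace names are invented for illustration):

<!-- hypothetical batch instance; names are invented -->
<Batch xmlns="http://AggregationSample.Envelope">
  <BatchID>BATCH-001</BatchID>
  <Count>3</Count>
  <Record>
    <!-- Body XPath points at Record, so each child element below is debatched into its own message -->
    <Item>...</Item>
    <Item>...</Item>
    <Item>...</Item>
  </Record>
</Batch>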

The second schema is the body schema. It looks like this ...

Notice that it too has promoted properties. By doing this, the values from the master envelope schema get copied down
to EACH debatched message without me doing anything special. So, setting up my schemas this way, in combination with using
the standard XML receive pipeline, causes my message to be debatched and pushes the envelope values down to each
message. Cool!
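
To illustrate, once the XML receive pipeline has done its work, each debatched message carries BatchID and Count in its
message context, so the orchestration can read them without touching the message body. In an Expression shape that might
look like this (the property schema, message, and variable names here are assumed, not taken from the actual sample):

// XLANG expression sketch: msgRecord is one debatched message;
// AggSample.PropertySchema is an assumed property schema containing BatchID and Count
batchId = msgRecord(AggSample.PropertySchema.BatchID);
recordCount = msgRecord(AggSample.PropertySchema.Count);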

Remember that I have to rebuild the message on the way out. So, I use the file system to build this message up, instead of
doing it in memory and increasing my footprint. Therefore, when the first message arrives in my orchestration I need
to create this temporary file on disk (and re-create the envelope header). Likewise, when all the messages have been received and processed, I also need
to write the closing envelope text. This way I have a temporary file on disk that contains the envelope and all the processed
message bodies.

So, I wrote a quick component to write the envelope header and footer to disk. It will also load the completed file and return
an XmlDocument. It looks like this ...

using System.IO;
using System.Xml;

// Helper component used by the orchestration to manage the temporary batch file on disk.
// (The path is hardcoded for this sample.)

public void WriteBatchHeader()
{
    string docpath = @"C:\Data\AggTemp\Dropoff\output.xml";

    // envelope header text
    string header = "";

    // append mode also creates the file if it doesn't exist yet
    using (StreamWriter sw = new StreamWriter(docpath, true))
    {
        sw.Write(header);
        sw.Flush();
    }
}

public void WriteBatchFooter()
{
    string docpath = @"C:\Data\AggTemp\Dropoff\output.xml";

    // closing envelope text
    string footer = "";

    using (StreamWriter sw = new StreamWriter(docpath, true))
    {
        sw.Write(footer);
        sw.Flush();
    }
}

public XmlDocument LoadBatchFile()
{
    string docpath = @"C:\Data\AggTemp\Dropoff\output.xml";

    XmlDocument xmlDoc = new XmlDocument();

    using (StreamReader sr = new StreamReader(docpath))
    {
        xmlDoc.Load(sr);
    }

    return xmlDoc;
}

Alright, now for my orchestration. As you can see below, the first half of the process does the following:

  • Receive the first debatched message.
  • Call my helper component (code above) to create the temporary file and add the top XML structure.
  • Then I send the message out (initializing the correlation set) via the FILE adapter (see below for its configuration).
  • Then, in an Expression shape, I set the loop counter off of the promoted/distinguished Count field (which holds the number of
    records in the entire batch); see the sketch after this list.
  • I then process each subsequent debatched message via a sequential convoy, following the previously initialized
    correlation set. In this example, I do no further message processing and simply send the message back out via
    the FILE adapter.
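
Here's a rough sketch of how those shapes hang together, in XLANG expression terms (the message, variable, and correlation
set names are mine, not necessarily what the sample uses):

// Expression shape after the first receive: seed the loop counter from the Count field
// (accessed here as a distinguished field on the first debatched message)
expectedCount = msgFirstRecord.Count;
processedCount = 1;   // the first message has already been sent to the FILE port

// Loop shape condition
processedCount < expectedCount

// Inside the loop: a Receive shape that follows the BatchID correlation set (the sequential
// convoy), a Send shape to the same FILE send port, and then an Expression shape with:
processedCount = processedCount + 1;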

Once the loop is done, that means I've processed all the debatched records. So, the last half of the orchestration
deals with the final processing. Here I do the following:

  • Add a 10 second delay. This was necessary because the messaging engine was still writing files to disk
    as the next component was trying to add the closing XML tags.
  • Call the helper component (code above) to append the closing XML footer to the temporary file.
  • Next I construct the outbound message by setting the message variable equal to the return value of the
    LoadBatchFile() method defined above (see the sketch after this list). I added trace points to see how long this
    took, and it was subsecond. That was to be expected, even on the large file, since I'm just streaming it all in.
  • Finally, send the rebuilt message back out via any adapter.
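
A similar sketch for this back half (helperComp is an orchestration variable of the helper component type shown earlier;
the message and variable names are mine):

// Expression shape after the Delay: close out the temp file with the envelope footer
helperComp.WriteBatchFooter();

// Message Assignment shape: build the outbound message from the completed file
// (msgBatchOut is declared as a System.Xml.XmlDocument message, xmlDoc as an XmlDocument variable)
xmlDoc = helperComp.LoadBatchFile();
msgBatchOut = xmlDoc;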

The last step was to configure the FILE send adapter used to write this file to disk. As you can see in the picture,
I set the send pipeline to PassThruTransmit and configured the FILE adapter's Copy mode to Append. This causes each message
sent to this port to be appended to the same outbound file.

So, when all was said and done, I was able to process a batch of 1000 messages in around 2 minutes on a laptop. That's
significantly better than the 10+ minutes the customer was experiencing. I'm still using a sequential convoy, so I can't process
messages as quickly as if I were spawning lots of unique orchestrations, but the speed improvement came from streaming files
around instead of continually accessing the XML DOM. Neat stuff.

Technorati Tags: BizTalk