BizTalk Aggregation Pattern for Large Batches


There are a few existing Aggregator patterns for BizTalk Server, but I recently had a customer who was dealing with
a very large batch, and the existing patterns were insufficient for it. So, I threw together a sample that showed
how to handle very large batches in a fairly speedy manner.

In this use case, we have an XML message come into BizTalk, want to split out the batch, process each message individually, then
aggregate the whole batch back together again and send a single message out. The existing
aggregation patterns
didn’t work here since they mostly involve building up an XML file in memory. When a batch of 1000+ messages
got submitted, it was taking over 10 minutes to process the batch, which was way too slow.

So, my solution uses a combination of envelope de-batching, sequential convoy pattern, and file streaming. The first part of
the solution was to build the XML schemas necessary. The first schema, the envelope schema, looked like this …



Three things to note here. First, I set the <Schema> node’s Envelope property to Yes. This way I won’t
have to set anything in my pipeline for it to process the batch. Second, I promoted both the BatchID and Count fields.
We’ll see the significance in a moment. Third, I set the Body XPath property of the root node to the Record
node of the schema. This way, the pipeline sees that it’s an envelope and knows where to split the batch.
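For reference, these three settings all end up as annotations in the generated XSD. Below is a rough sketch of what that looks like; the root/record element names and the target namespace are my own placeholders, and the property xpaths are illustrative, not copied from the customer solution:

```xml
<?xml version="1.0" encoding="utf-16"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns:b="http://schemas.microsoft.com/BizTalk/2003"
           targetNamespace="http://Sample.BatchEnvelope"
           xmlns="http://Sample.BatchEnvelope"
           elementFormDefault="qualified">
  <xs:annotation>
    <xs:appinfo>
      <!-- Envelope = Yes on the <Schema> node, plus the promoted fields -->
      <b:schemaInfo is_envelope="yes">
        <b:properties>
          <b:property name="BatchID" xpath="/*[local-name()='Batch']/*[local-name()='BatchID']" />
          <b:property name="Count"   xpath="/*[local-name()='Batch']/*[local-name()='Count']" />
        </b:properties>
      </b:schemaInfo>
    </xs:appinfo>
  </xs:annotation>
  <xs:element name="Batch">
    <xs:annotation>
      <xs:appinfo>
        <!-- Body XPath pointing at the repeating Record node -->
        <b:recordInfo body_xpath="/*[local-name()='Batch']/*[local-name()='Record']" />
      </xs:appinfo>
    </xs:annotation>
    <xs:complexType>
      <xs:sequence>
        <xs:element name="BatchID" type="xs:string" />
        <xs:element name="Count" type="xs:int" />
        <xs:element name="Record" maxOccurs="unbounded" type="xs:anyType" />
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
```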

The second schema is the body schema. It looks like this …



Notice that it too has promoted properties. By doing this, the values from the master envelope schema get copied down
to EACH debatched message without me doing anything special. So, setting up my schemas this way, in combination with
the standard XML receive pipeline, causes my message to be debatched and pushes values down from the envelope to each
message. Cool!

Remember that I have to rebuild the message on the way out. So, I use the file system to build this message up, instead of
doing it in memory and increasing my footprint. Therefore, when I first receive the message into my orchestration I need
to create this temporary file on disk and write the envelope header. Likewise, when all the messages have been received and processed, I also need
to write the closing envelope text. This way I have a temporary file on disk that contains the envelope and all the processed
message bodies.

So, I wrote a quick component to write the envelope header and footer to disk. It will also load the completed file and return
an XML document. It looks like this …

public void WriteBatchHeader()
{
    string docpath = @"C:\Data\AggTemp\Dropoff\output.xml";

    // envelope header text (actual markup omitted here)
    string header = "";

    StreamWriter sw = new StreamWriter(docpath, true);
    sw.Write(header);
    sw.Flush();
    sw.Close();
}

public void WriteBatchFooter()
{
    string docpath = @"C:\Data\AggTemp\Dropoff\output.xml";

    // closing envelope text (actual markup omitted here)
    string footer = "";

    StreamWriter sw = new StreamWriter(docpath, true);
    sw.Write(footer);
    sw.Flush();
    sw.Close();
}

public XmlDocument LoadBatchFile()
{
    string docpath = @"C:\Data\AggTemp\Dropoff\output.xml";

    XmlDocument xmlDoc = new XmlDocument();

    StreamReader sr = new StreamReader(docpath);
    xmlDoc.Load(sr);
    sr.Close();

    return xmlDoc;
}

Alright, now I have my orchestration. As you can see below, the process works as follows:

  • Receive the first debatched message.
  • Call my helper component (code above) to create the temporary file and add the opening XML structure.
  • Send the message out (initializing the correlation set) via the FILE adapter (see below for configuration).
  • In an Expression shape, set the loop counter from the promoted/distinguished Count field (which holds the number of
    records in the entire batch).
  • Process each subsequent debatched message via a sequential convoy, following the previously initialized
    correlation set. In this example, I do no further message processing and simply send each message back out via
    the FILE adapter.

Once the loop is done, I’ve processed all the debatched records. So, the last half of the orchestration
deals with the final processing. Here I do the following:

  • Add a 10-second delay. This was necessary because the messaging engine was still writing files to disk
    as the next component was trying to add the closing XML tags.
  • Call the helper component (code above) to write the XML footer to the file.
  • Construct the outbound message by setting the message variable equal to the result of the
    LoadBatchFile() method defined above. I added trace points to see how long this took, and it was subsecond. This
    was to be expected, even on the large file, since I’m just streaming it all in.
  • Finally, send the rebuilt message back out via any adapter.
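The end-to-end flow above can be sketched outside BizTalk as a small console program. Everything here is my own placeholder (the temp-file path, the Batch/Record element names, and the Run helper); in the real solution the per-message appends are performed by the FILE adapter in Append copy mode, not by an in-process loop:

```csharp
using System;
using System.IO;
using System.Xml;

class BatchAggregationSketch
{
    // Placeholder path; the article uses C:\Data\AggTemp\Dropoff\output.xml
    static readonly string DocPath =
        Path.Combine(Path.GetTempPath(), "agg-sketch-output.xml");

    // Simulates: write header, append each debatched record, write footer,
    // then stream the completed file back in. Returns the record count found.
    public static int Run(int count)
    {
        if (File.Exists(DocPath)) File.Delete(DocPath);

        // First debatched message arrives: create the file and envelope header
        File.AppendAllText(DocPath, "<Batch>");

        // Sequential convoy loop, bounded by the promoted Count field;
        // in BizTalk each append is done by the FILE adapter, not this loop
        for (int i = 0; i < count; i++)
        {
            File.AppendAllText(DocPath, $"<Record>{i}</Record>");
        }

        // Loop done: close the envelope
        File.AppendAllText(DocPath, "</Batch>");

        // Stream the completed batch back in as a single XML document
        var xmlDoc = new XmlDocument();
        xmlDoc.Load(DocPath);
        return xmlDoc.DocumentElement.ChildNodes.Count;
    }

    static void Main()
    {
        Console.WriteLine(BatchAggregationSketch.Run(3));
    }
}
```

Running this with a count of 3 rebuilds a three-record batch file and reports three child Record nodes, mirroring what the orchestration produces on disk.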



The last step was to configure the FILE send adapter used to write this file to disk. As you can see in the picture,
I set the PassThruTransmit pipeline and configured the Copy mode to Append. This causes each message
sent to this port to be appended to the same outbound file.

So, when all was said and done, I was able to process a batch of 1000 messages in around 2 minutes on a laptop. This was
significantly better than the 10+ minutes the customer was experiencing. I’m still using a sequential convoy, so I can’t process
messages as quickly as if I were spawning lots of unique orchestrations, but our speed improvement came from streaming files
around instead of continually accessing the XML DOM. Neat stuff.

Comments (12)

  1. Jeff Lynch says:

    Richard,

Excellent, excellent, EXCELLENT post. Best down-to-earth, large message sample I’ve seen. Damn, I wished I’d seen this two months ago. Time for a little refactoring….

    Jeff

  2. Winson says:

    Fantastic post. Passing this little jewel along…

  3. Thanks fellas.  I always liked the original patterns architecturally, but it seemed to break down after so many iterations.  And Jeff, I’m just trying to keep up with your comically prolific Commerce Server posting with a few BizTalk ones of my own!

  4. Jeff Lynch says:

    Richard,

    I didn’t know my posts were comically prolific but I do have four daughters so I guess my life is.

    PS: I can send you the XLS File Pipeline Component beta if you’ll ping me.

    Jeff

  5. Mohamed Zahra says:

    i wonder why all your blog doesn’t contain any zipped file for the article you are describing , belive me this really helps

    hope that i am clear enough

  6. Point taken.  However, most of my posts result from work I do with customers, so making it "blog ready" involves an effort.  And given the choice between throwing cleaned up code out there, or writing detailed descriptions of a solution, I’m always going to choose the latter.

  7. Just a bit over a year since I started BizTalk blogging, so I thought I’d take 5 minutes and review…

  8. Thanks says:

    Richard, Thanks for the solution, It’s not possible to have a .zip file with a sample?

  9. Hey there, I haven’t "cleaned" this up from my customer work, but I included all the key steps here.  If you have any trouble putting it together, feel free to post!

  10. Dhanush says:

    Richard,

    I tried this solution, worked for small size files, I could see Header, individual xml files and Footer.

    But it is creating problems for large size files(>2MB each file), orchestration sent out all xml messages to port, delay occured and tried to write footer, at this point got err saying that file is being used by other prcess. And in HAT, I could see all send ports waiting to be written to disk.

    So, I’m assuming since we really doesn’t know how long it takes to flush all messages to disk, we can’t assume any amount of time in delay.

    Let me know your thoughts on this.

    Apart from this problem, your article is great.

  11. Glad you tried it out!  Sounds like the delay isn’t large enough for the 2MB file .. the component can’t write the footer since BizTalk is still adding messages to the batch.  You could set this to be an artificially high number (10 minutes) to be more confident that the engine had finished processing.  

  12. There are many ways to do debatching in BizTalk 2006. Some conventional ones include: Envelope Message
