BizTalk Server will split up your documents for you.


Lots of people have asked me how to split up a document. You don’t have to write code for this; it is a built-in feature of the parsers in BizTalk Server. For example, check out this sample .txt document. It has a header (Northwind Shipping), multiple purchase orders (PO1999-10-20 and PO1999-10-21), and a trailer (1234567).


“Northwind Shipping
PO1999-10-20
US        Alice Smith         123 Maple Street    Mill Valley    CA 90952
US        Robert Smith        8 Oak Avenue        Old Town       PA 95819
Hurry, my lawn is going wild!
ITEMS,ITEM872-AA|Lawnmower|1|148.95|Confirm this is electric,ITEM926-AA|Baby Monitor|1|39.98|Confirm this is electric|1999-10-21
PO1999-10-21
US        John Dow            123 Elm Street      Mill Valley    CA 90952
US        July Dow            8 Pine Avenue       Old Town       PA 95819
Please ship it urgent!
ITEMS,ITEM398-BB|Tire|4|324.99|Wrap them up nicely,ITEM201-BB|Engine Oil|1|12.99|SAE10W30|1999-05-22


1234567”


In the XML disassembler you can specify a schema for the header (Northwind Shipping), the envelope (the repeating unit), the document (specification), and the trailer (1234567).


For more details check out the SDK sample in C:\Program Files\Microsoft BizTalk Server 2004\SDK\Samples\Pipelines\AssemblerDisassembler\EnvelopeProcessing.
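
To make the idea concrete, here is a minimal Python sketch of the split that the parser performs on the sample above. It is only an illustration of the concept, not BizTalk code or a BizTalk API: the function name `split_interchange` and the rule "a purchase order starts at a line that looks like PO plus a date" are assumptions made for this example, and in BizTalk Server the same boundaries come from the schemas rather than from code.

```python
import re

# The sample interchange from the post: one header line, two purchase
# orders, and one trailer line.
SAMPLE = """Northwind Shipping
PO1999-10-20
US        Alice Smith         123 Maple Street    Mill Valley    CA 90952
US        Robert Smith        8 Oak Avenue        Old Town       PA 95819
Hurry, my lawn is going wild!
ITEMS,ITEM872-AA|Lawnmower|1|148.95|Confirm this is electric,ITEM926-AA|Baby Monitor|1|39.98|Confirm this is electric|1999-10-21
PO1999-10-21
US        John Dow            123 Elm Street      Mill Valley    CA 90952
US        July Dow            8 Pine Avenue       Old Town       PA 95819
Please ship it urgent!
ITEMS,ITEM398-BB|Tire|4|324.99|Wrap them up nicely,ITEM201-BB|Engine Oil|1|12.99|SAE10W30|1999-05-22


1234567
"""

# Assumption for this sketch: a document begins at a line such as PO1999-10-20.
PO_START = re.compile(r"PO\d{4}-\d{2}-\d{2}")


def split_interchange(text):
    """Return (header, list of purchase-order documents, trailer)."""
    lines = [line for line in text.splitlines() if line.strip()]
    if len(lines) < 2:
        raise ValueError("expected at least a header and a trailer line")
    header, trailer = lines[0], lines[-1]
    documents = []
    for line in lines[1:-1]:
        if PO_START.fullmatch(line.strip()):
            documents.append([line])      # start a new purchase order
        elif documents:
            documents[-1].append(line)    # continue the current purchase order
        else:
            raise ValueError("data found before the first purchase order")
    return header, ["\n".join(doc) for doc in documents], trailer


header, purchase_orders, trailer = split_interchange(SAMPLE)
```

Each entry in `purchase_orders` corresponds to one message that would be published separately, while the header and trailer are stripped from the body.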


Comments (10)

  1. Let’s see:

    Write my own code, pay several thousand dollars for biztalk…

    hmmm…

    That’s a tough call, especially for something as difficult as splitting up documents.

    Well, I never was that good at sarcasm in writing, but you get my point =)

  2. Jody says:

    Scott, technically your example is using the Flat File disassembler, not the XML disassembler.

    I still have not found a way to split XML records out of a larger XML file without a custom disassembler. Any examples?

  3. Unfortunately sometimes you need the details in the header in each split message. This is a particular challenge when splitting documents. Say for example:

    <envelope><receiveddate>1/1/2004</receiveddate><message><….></….></message><message><….></….></message></envelope>

    So, with the built-in splitter I believe you can automatically have each message wind up in BizTalk without a bunch of work, but if you need the received date within each message, I have yet to figure that one out without mapping first to a new structure and then splitting. This is a current challenge we face.

    Scott, is there a pattern for this?

    Shawn

  4. Andrei Maksimenka says:

    Sorry Scott, let me answer how to split XML interchanges ;-)

    Yes, the XML disassembler is able to disassemble interchanges of XML messages and unwrap one or more envelopes. For example, if you have an XML interchange:

    <ns0:envelope xmlns:ns0="ns0">

    <header>

    <firstName>Andrei</firstName>

    <lastName>Maksimenka</lastName>

    </header>

    <documents>

    <ns1:document xmlns:ns1="ns1">

    <!-- some stuff -->

    </ns1:document>

    <ns1:document xmlns:ns1="ns1">

    <!-- some stuff -->

    </ns1:document>

    <ns1:document xmlns:ns1="ns1">

    <!-- some stuff -->

    </ns1:document>

    </documents>

    </ns0:envelope>

    You need to define two schemas, one for the envelope and one for the document. In the envelope schema, which schematically looks like:

    <schema>

    envelope

    header

    firstName

    lastName

    documents

    <any>

    you need to set the Envelope property to ‘yes’ and specify a Body XPath pointing to the XML element "documents".

    The document schema should schematically look like:

    <schema>

    document

    You can use the default XML receive pipeline to disassemble that interchange, or a custom receive pipeline in which Envelope Schema Names and Document Schema Names specify the envelope and document schemas. The latter can be helpful if you want to work around schema ambiguity problems (when more than one schema with the same message type, or targetNamespace#rootRecordName, is deployed), to enforce document structure (all incoming documents must contain the envelope and document as specified), or to use XML validation during disassembly (set Validate Document Structure to ‘yes’).
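
As a rough illustration of what the Body XPath selects, the following Python sketch locates the "documents" element of the interchange above and emits each of its children as a separate message. It only mimics the concept with the standard library and is not how the XML disassembler is implemented; the function name `debatch`, its `body_path` parameter, and the `<id>` elements standing in for "some stuff" are hypothetical.

```python
import xml.etree.ElementTree as ET

# The interchange from the comment above. The <id> elements are
# hypothetical placeholders for the elided document content.
INTERCHANGE = """<ns0:envelope xmlns:ns0="ns0">
  <header>
    <firstName>Andrei</firstName>
    <lastName>Maksimenka</lastName>
  </header>
  <documents>
    <ns1:document xmlns:ns1="ns1"><id>1</id></ns1:document>
    <ns1:document xmlns:ns1="ns1"><id>2</id></ns1:document>
    <ns1:document xmlns:ns1="ns1"><id>3</id></ns1:document>
  </documents>
</ns0:envelope>"""


def debatch(xml_text, body_path="documents"):
    """Return every child of the body element as its own XML string."""
    envelope = ET.fromstring(xml_text)
    body = envelope.find(body_path)   # plays the role of the Body XPath
    if body is None:
        raise ValueError("no element matches the body path: " + body_path)
    # tostring() also emits trailing whitespace stored on the element, so strip it.
    return [ET.tostring(child, encoding="unicode").strip() for child in body]


messages = debatch(INTERCHANGE)
```

The header is simply left behind here. Note that ElementTree may rename namespace prefixes when it serializes each child, although the namespace URI of every document stays the same.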

  5. Shawn,

    if you define a Body XPath in the envelope schema that points to the root record itself, the output messages will be one receiveddate document (if you have a schema for that message type available) and several message documents. It looks like that’s not what you expect.

    The recommended solution here is a custom disassembler pipeline component that removes receiveddate, perhaps promotes it to the message context, and produces one or more message documents with receiveddate as a context property value. You can use mapping, but you’ll need to send the mapped message through a receive pipeline again to split it into documents, unless you have a simple pipeline component in the pre-disassembling stage that does the XSLT transformation itself. Custom pipeline components are a perfect fit for the extensible BizTalk Server pipeline architecture, and in most cases they are relatively simple to implement, unless you need to handle streaming data processing, as the standard pipeline components do to support large messages.

    BTW, the SDK for the web release timeframe will include several pipeline components as well-commented source code, along with sample projects.
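
The idea of carrying receiveddate along as a context property can be sketched in a few lines of Python. This is a conceptual model only and does not use the BizTalk pipeline component interfaces: the function name `split_with_context`, the plain dictionary used as a "context", and the `<id>` elements standing in for the elided message content are all assumptions made for the example.

```python
import xml.etree.ElementTree as ET

# Shawn's envelope, with hypothetical <id> elements in place of the
# elided message content.
ENVELOPE = """<envelope>
  <receiveddate>1/1/2004</receiveddate>
  <message><id>1</id></message>
  <message><id>2</id></message>
</envelope>"""


def split_with_context(xml_text):
    """Return one (context, body) pair per <message> element.

    The header value is dropped from the body and attached to each
    message as a context property instead.
    """
    envelope = ET.fromstring(xml_text)
    received = envelope.findtext("receiveddate")
    results = []
    for message in envelope.findall("message"):
        context = {"receiveddate": received}
        body = ET.tostring(message, encoding="unicode").strip()
        results.append((context, body))
    return results


split_messages = split_with_context(ENVELOPE)
```

Every resulting message carries the received date without the date being part of its body, which is the shape the custom disassembler approach aims for.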

  6. Scott Woodgate (MSFT) says:

    Nice. In case anyone wondered, Andrei Maksimenka is one of the key developers on BizTalk Server 2004 and owns this functionality. Thanks for jumping into the discussion, Andrei 🙂

  7. Udi, in response to your comment, see the comment thread at http://weblogs.asp.net/cameronreilly/archive/2004/02/13/72410.aspx

    It’s quite similar.

  8. Dave Cintron says:

    Hi Scott,

    I also need to do this but under BTS 2002 and I can’t install SDK 2004 unless I have BTS 2004. Is there another place I can find the source for this?

    Thanks, Dave

  9. Brian Mangene says:

    I have a similar but I think slightly more complex problem and am hoping for a recommended approach. I need to take an incoming flat file, say:

    SomeHeaderStuff RECORDCOUNT

    KeyValue SOMESTUFF

    KeyValue SOMESTUFF

    KeyValue SOMESTUFF

    SomeTrailerStuff RECORDCOUNT

    For each record in the incoming file I need to read a database table using the value of the KeyValue field. Each record of the incoming file needs to end up in one of two messages, say A and B.

    Message A will be used to update the database. Message B needs to be reassembled into a new flat file with recalculated record counts. This file will be passed along to another system. The record layout of the new flat file will be identical to that of the received file. Of course, it will contain fewer records than the original file and will also have the recalculated record counts, as I mentioned above. Any suggestions on where to start with this would be appreciated.