Selecting the flat file disassembling schema dynamically

Sometimes, business applications encode their data by specifying a message type in the file's header, and the layout of the body changes depending on that message type. When you receive such messages in BizTalk, you face the problem of having to disassemble a flat file according to two or more different schemas within a single pipeline.

There are multiple ways to achieve this. One of them is to build a schema that encompasses all possible message types. This approach is not always possible. When it is possible, it has the following disadvantages:

  1. Loss of strong typing: orchestrations have to be set up to accept a message that can represent several different business documents: purchase order, purchase order confirmation... This means that you cannot be 100% sure which document an orchestration deals with just by looking at the message type that activates it. An orchestration dealing with purchase orders can still potentially receive other business documents. You could set up a filter on the receive shape to further constrain which documents your orchestration processes, but the orchestration would still have access to fields that are relevant only to other message types, and your code might attempt to access a field that is not populated. Such an error would only be caught at runtime. This really lowers the maintainability of the solution and your ability to troubleshoot it quickly.

  2. Tight coupling: if you would like to add another message type or alter the schema of a specific message type, you need to rebuild and redeploy all orchestrations, because they all depend on the schema you are changing.

  3. Deployment/versioning issues: you will most likely have instances already running in your system, and you might not have the liberty to terminate them all when you deploy your changes. Deploying a solution where everything depends on one common artifact (a schema, for instance) is more complex than deploying a loosely coupled solution.

A better solution is to have one orchestration per business document. Every orchestration receives a message of a specific schema: purchase order, request for quote... To implement this, you need to sniff the flat file data and disassemble it according to its message type. In other words, depending on the data you receive, you need to figure out which schema the disassembler should produce. There are many ways to do this. You could use the BizTalk probing capabilities and place more than one flat file disassembler in the same pipeline: only the instance that recognizes the data would run. Today, I would like to demonstrate how to reuse the out-of-the-box flat file disassembler inside a custom disassembler component.
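Here is a minimal, self-contained sketch of the first-match probing alternative. It does not use BizTalk types. `IHeaderProbe`, `TypeProbe` and `FirstMatch` are hypothetical stand-ins for probing-capable disassemblers (BizTalk's real interface is `IProbeMessage`), and the schema names are placeholders. The sketch only models the selection rule: each candidate is asked in order, and the first one that recognizes the header wins.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for a probing-capable disassembler. BizTalk's real
// interface is IProbeMessage; this sketch only models the selection logic.
public interface IHeaderProbe
{
    string SchemaName { get; }
    bool Probe(string firstLine);
}

// Recognizes documents whose first line is exactly "Type:<expectedType>".
public sealed class TypeProbe : IHeaderProbe
{
    private readonly string expectedHeader;

    public TypeProbe(int expectedType, string schemaName)
    {
        expectedHeader = "Type:" + expectedType;
        SchemaName = schemaName;
    }

    public string SchemaName { get; }

    public bool Probe(string firstLine) =>
        firstLine != null && firstLine.Trim() == expectedHeader;
}

public static class FirstMatch
{
    // Asks each probe in order. The first one that answers "yes" handles the
    // document. Returns null when no probe recognizes it.
    public static IHeaderProbe Select(IEnumerable<IHeaderProbe> probes, string firstLine)
    {
        foreach (IHeaderProbe probe in probes)
        {
            if (probe.Probe(firstLine))
            {
                return probe;
            }
        }
        return null;
    }
}
```

In a real pipeline, each probing component reads the stream on its own, so the same bytes may be read several times. The custom disassembler described below reads the header only once.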

This page explains how to extend the flat file disassembler, so here I'll explain how to contain it and use it as a utility class to perform the disassembling. The overall strategy is to initialize an instance of the flat file disassembler (FFDasmComp), sniff the data to figure out which schema should be used, and feed this information to the flat file disassembler so it can do the hard work for you.

Two sample messages

To select the disassembling schema dynamically, we need at least two flat file formats and two disassembling schemas. We will use the following sample messages. The first one is a sketch of a purchase order confirmation, while the second one is a simplified set of line items for a purchase order (of course, all data are just placeholders).


Microsoft, Corp.|1234579




Windows XP SP2 Home Edition|1|$99

Visual Studio.NET 2003 Learning Edition|1|$200

The sample Visual Studio.NET 2003 solution available here contains the two previous messages as well as two schemas used to disassemble these two files with the flat file disassembler. We will call these two schemas POConfirmation.xsd and POLineItems.xsd. These schemas are easy to build so I won't describe them here.
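The flat file disassembler does the actual parsing. Still, it may help to see what the line item layout boils down to. The sketch below splits one record of the second sample on `|`. The `LineItem` type and its field names (`Product`, `Quantity`, `Price`) are hypothetical and are not taken from POLineItems.xsd.

```csharp
using System;
using System.Globalization;

// Hypothetical record type mirroring one line of the line item sample.
public sealed class LineItem
{
    public string Product { get; }
    public int Quantity { get; }
    public string Price { get; }   // kept as text, e.g. "$99"

    public LineItem(string product, int quantity, string price)
    {
        Product = product;
        Quantity = quantity;
        Price = price;
    }

    // Splits "Product|Quantity|Price" into its three fields.
    public static LineItem Parse(string record)
    {
        string[] fields = record.Split('|');
        if (fields.Length != 3)
        {
            throw new FormatException("Expected 3 pipe-delimited fields: " + record);
        }
        return new LineItem(fields[0], int.Parse(fields[1], CultureInfo.InvariantCulture), fields[2]);
    }
}
```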

Containing the flat file disassembler

You are probably familiar with the concept of containing one object within another; if not, this page (while a little COM-centric) should bring you up to speed. Containing the flat file disassembler is not hard at all: just create an instance of it, as if you had written it yourself, and call it! You just have to remember to forward property-related calls (the IPersistPropertyBag implementation) to the contained instance to ensure it is properly initialized.

/// <summary>
/// Instance of the Flat File Disassembler we contain.
/// </summary>
private FFDasmComp containedFFDasm = new FFDasmComp();

/// <summary>
/// Set by Disassemble(); tells GetNext() whether the contained disassembler may have messages to produce.
/// </summary>
private bool shouldAttemptToProduceMessages = false;

/// <summary>
/// Initializes a new Property Bag.
/// </summary>
public void InitNew()
{
    // Make sure the contained disassembler gets a chance to initialize itself
    containedFFDasm.InitNew();
}

/// <summary>
/// Load data from a persisted object into a new object.
/// </summary>
/// <param name="propertyBag">Property bag to load.</param>
/// <param name="errorLog">Error log.</param>
public void Load(IPropertyBag propertyBag, int errorLog)
{
    // Make sure the contained disassembler gets a chance to load properties from the bag
    containedFFDasm.Load(propertyBag, errorLog);
}

/// <summary>
/// Saves properties to the property bag.
/// </summary>
/// <param name="propertyBag">The property bag to manipulate.</param>
/// <param name="clearDirty">Indicates if we should reset our "isDirty" flag.</param>
/// <param name="saveAllProperties">Should all properties be saved?</param>
public void Save(IPropertyBag propertyBag, bool clearDirty, bool saveAllProperties)
{
    // Make sure the contained disassembler gets a chance to save its properties to the bag
    containedFFDasm.Save(propertyBag, clearDirty, saveAllProperties);
}
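The containment pattern itself does not depend on BizTalk. The following sketch models it with hypothetical types (`ISettingsPersistence`, `InnerDisassembler`, `OuterDisassembler`), using a dictionary in place of `IPropertyBag`. It only shows that the outer component forwards every persistence call, so the contained instance ends up configured exactly as if it had been used directly.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for IPersistPropertyBag; a dictionary plays the
// role of the property bag.
public interface ISettingsPersistence
{
    void InitNew();
    void Load(IDictionary<string, object> bag);
    void Save(IDictionary<string, object> bag);
}

// Stand-in for the contained flat file disassembler and one of its properties.
public sealed class InnerDisassembler : ISettingsPersistence
{
    public string DocumentSpecName { get; set; }

    public void InitNew() => DocumentSpecName = null;

    public void Load(IDictionary<string, object> bag)
    {
        if (bag.TryGetValue("DocumentSpecName", out object value))
        {
            DocumentSpecName = (string)value;
        }
    }

    public void Save(IDictionary<string, object> bag) => bag["DocumentSpecName"] = DocumentSpecName;
}

// The outer component owns an inner instance and forwards every call to it.
public sealed class OuterDisassembler : ISettingsPersistence
{
    public InnerDisassembler Contained { get; } = new InnerDisassembler();

    public void InitNew() => Contained.InitNew();
    public void Load(IDictionary<string, object> bag) => Contained.Load(bag);
    public void Save(IDictionary<string, object> bag) => Contained.Save(bag);
}
```

The outer component holds no settings of its own here. If it had some (a type-to-schema map, for instance), it would load and save them alongside the forwarded calls.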

The real work happens in the Disassemble method. Here, we sniff the input and, if we can find an adequate schema, we simply tell the flat file disassembler to use it. The snippet below makes use of the little-known class SchemaWithNone. It is so obscure that the assembly containing its implementation is only installed in the GAC. As a result, you must edit the Visual Studio project file manually to add a reference to that assembly. Anyway, this class lets you manipulate BizTalk schemas the same way the flat file disassembler does internally.

/// <summary>
/// Called once per incoming document by the Messaging engine.
/// </summary>
/// <param name="pContext">IPipelineContext: context for this pipeline.</param>
/// <param name="pInMsg">IBaseMessage: Base message received.</param>
public void Disassemble(IPipelineContext pContext, IBaseMessage pInMsg)
{
    // Is there something to disassemble?
    if ((pInMsg.PartCount >= 1) && (pInMsg.BodyPart != null))
    {
        // Sniff the input stream to figure out which schema we should be using
        SchemaWithNone schema = SniffInput(pInMsg);

        // If we found the schema, parse the file
        if (schema != null)
        {
            // Configure the flat file disassembler with the adequate schema
            containedFFDasm.DocumentSpecName = schema;

            // Disassemble the document
            containedFFDasm.Disassemble(pContext, pInMsg);

            // There may be one or more messages to produce
            shouldAttemptToProduceMessages = true;
        }
        else
        {
            // Could not find a schema - report the error
            Exception errInfo = new Exception("Wrong message type");

            // Do not attempt to produce messages
            shouldAttemptToProduceMessages = false;
            throw errInfo;
        }
    }
}

The heart of all this is, of course, the SniffInput() function. It is pretty straightforward: we read part of the input, figure out the message type and return the matching schema. The SchemaFromType(int typeID) function simply creates a new instance of SchemaWithNone, passing it the type name of the schema as it will be once deployed in BizTalk (TestProject.POLineItems, for instance):

/// <summary>
/// Sniffs the flat file data and determines which schema to use.
/// </summary>
/// <param name="pInMsg">IBaseMessage instance to sniff</param>
/// <returns>SchemaWithNone: schema to use or null if no match was found.</returns>
private SchemaWithNone SniffInput(IBaseMessage pInMsg)
{
    SchemaWithNone adequateSchema = null;

    // Get the original stream and make sure we are starting from the beginning
    Stream docStream = pInMsg.BodyPart.GetOriginalDataStream();
    docStream.Seek(0, SeekOrigin.Begin);

    // ------------------------------------------------------------------------
    // We will read the first line of the document. If it matches:
    // Type:<number>
    // we will decide that we recognized the data and <number> is the type
    // of the message.
    // ------------------------------------------------------------------------

    // Read only the first line (the header). ReadLine must not consume any
    // data past the end of that line.
    string lineRead   = ReadLine(docStream, pInMsg.BodyPart.Charset);
    Regex  typeRegexp = new Regex("Type:[ \\t]*(?<Type>[0-9]+)", RegexOptions.Compiled | RegexOptions.Singleline);
    Match  match      = typeRegexp.Match(lineRead);

    if (match.Success)
    {
        int typeID = Int32.Parse(match.Groups["Type"].Value);
        adequateSchema = SchemaFromType(typeID);   // i.e. new SchemaWithNone(<schema name (string) associated with typeID>)
    }

    // ------------------------------------------------------------------------
    // We have read from the input stream exactly the amount of data needed to
    // skip the header. We do not need to "rewind" the stream here: we can pass
    // it as-is to the flat file disassembler. As a result, we do not need to
    // define a schema to parse the header, since the disassembler never sees it.
    // ------------------------------------------------------------------------

    return adequateSchema;
}
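The ReadLine and SchemaFromType helpers used above are not shown. Below is a self-contained sketch of what they might look like, under several assumptions:

- The names `HeaderSniffer`, `ReadHeaderLine`, `TryGetTypeId` and `SchemaNameFromType` are hypothetical.
- So is the mapping of type 1 to TestProject.POConfirmation and type 2 to TestProject.POLineItems.
- The sketch returns schema names as strings rather than SchemaWithNone instances, so that it runs without BizTalk.
- It assumes a charset in which '\n' is the single byte 0x0A, and falls back to UTF-8 when no charset is given.

The key point is that the header is read one byte at a time. A StreamReader would buffer ahead and swallow part of the body. This reader leaves the stream positioned exactly at the first body byte.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;
using System.Text.RegularExpressions;

public static class HeaderSniffer
{
    // Same pattern as in SniffInput above.
    private static readonly Regex TypeRegex =
        new Regex(@"Type:[ \t]*(?<Type>[0-9]+)", RegexOptions.Compiled);

    // Hypothetical mapping from header type IDs to deployed schema type names.
    private static readonly Dictionary<int, string> SchemaNames = new Dictionary<int, string>
    {
        { 1, "TestProject.POConfirmation" },
        { 2, "TestProject.POLineItems" },
    };

    // Reads bytes up to and including the first '\n' and decodes them without
    // the line terminator, leaving the stream at the first byte of the body.
    // Assumption: '\n' is the single byte 0x0A (ASCII, UTF-8, Latin-1...);
    // a missing charset falls back to UTF-8.
    public static string ReadHeaderLine(Stream stream, string charset)
    {
        Encoding encoding = string.IsNullOrEmpty(charset) ? Encoding.UTF8 : Encoding.GetEncoding(charset);
        using (var buffer = new MemoryStream())
        {
            int b;
            while ((b = stream.ReadByte()) != -1 && b != '\n')
            {
                buffer.WriteByte((byte)b);
            }
            return encoding.GetString(buffer.ToArray()).TrimEnd('\r');
        }
    }

    // Extracts <number> from a "Type:<number>" header line.
    public static bool TryGetTypeId(string line, out int typeId)
    {
        typeId = 0;
        if (line == null)
        {
            return false;
        }
        Match match = TypeRegex.Match(line);
        return match.Success && int.TryParse(match.Groups["Type"].Value, out typeId);
    }

    // Returns the schema type name for a type ID, or null when the type is unknown.
    public static string SchemaNameFromType(int typeId) =>
        SchemaNames.TryGetValue(typeId, out string name) ? name : null;
}
```

Inside SniffInput, SchemaFromType(typeID) would then wrap the returned name in new SchemaWithNone(name), as the inline comment in the listing suggests.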

The second important part of a disassembler is the GetNext method. There is really nothing to explain here: most of the work is done by the flat file disassembler.

/// <summary>
/// Get the result of the disassembling process.
/// </summary>
/// <param name="pContext">IPipelineContext: context information.</param>
/// <returns>IBaseMessage: a message or null if no more messages to be produced.</returns>
public IBaseMessage GetNext(IPipelineContext pContext)
{
    IBaseMessage newMsg = null;

    // Delegate the work to the Flat File Disassembler
    if (shouldAttemptToProduceMessages)
    {
        newMsg = containedFFDasm.GetNext(pContext);
    }

    return newMsg;
}
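Disassemble and GetNext work together as follows. The messaging engine calls Disassemble once per incoming document, then keeps calling GetNext until it returns null. The self-contained sketch below models that contract. `IDisassembler`, `LineDisassembler`, `SniffingDisassembler` and `Engine.Drain` are hypothetical, and strings stand in for `IBaseMessage`. The inner component stands in for the flat file disassembler and simply emits one message per body line.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical stand-in for IDisassemblerComponent, using strings as messages.
public interface IDisassembler
{
    void Disassemble(string document);
    string GetNext(); // null when there are no more messages
}

// Stands in for the flat file disassembler: one message per non-empty line.
public sealed class LineDisassembler : IDisassembler
{
    private readonly Queue<string> pending = new Queue<string>();

    public void Disassemble(string document)
    {
        using (var reader = new StringReader(document))
        {
            string line;
            while ((line = reader.ReadLine()) != null)
            {
                if (line.Length > 0) pending.Enqueue(line);
            }
        }
    }

    public string GetNext() => pending.Count > 0 ? pending.Dequeue() : null;
}

// Outer component: checks the header (simplified to a "Type:" prefix test),
// hands only the body to the contained disassembler, and delegates GetNext
// only when Disassemble succeeded.
public sealed class SniffingDisassembler : IDisassembler
{
    private readonly LineDisassembler contained = new LineDisassembler();
    private bool shouldAttemptToProduceMessages;

    public void Disassemble(string document)
    {
        int newline = document.IndexOf('\n');
        string header = (newline < 0 ? document : document.Substring(0, newline)).TrimEnd('\r');
        if (!header.StartsWith("Type:", StringComparison.Ordinal))
        {
            shouldAttemptToProduceMessages = false;
            throw new InvalidDataException("Wrong message type");
        }
        contained.Disassemble(newline < 0 ? string.Empty : document.Substring(newline + 1));
        shouldAttemptToProduceMessages = true;
    }

    public string GetNext() => shouldAttemptToProduceMessages ? contained.GetNext() : null;
}

public static class Engine
{
    // Mimics the engine: one Disassemble call, then GetNext until it returns null.
    public static List<string> Drain(IDisassembler component, string document)
    {
        component.Disassemble(document);
        var messages = new List<string>();
        string message;
        while ((message = component.GetNext()) != null)
        {
            messages.Add(message);
        }
        return messages;
    }
}
```

The outer component keeps the flag. As a result, GetNext never queries the contained instance for a document it did not configure.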

In the future, I will explain how to write a custom property editor so users can easily map message types to particular schemas. This will involve making the component a true RAD component, as explained here.

Comments (3)

  1. Martijn says:

    Hi Gilles,

    Excellent code sample, just one question though. Isn’t this what IProbeMessage is meant for?

  2. Gilles says:

    Martijn: Yes, absolutely. IProbeMessage is another way to achieve this.

    The probing capabilities can also be used to select the schema at runtime, in a slightly different way. Probing requires the "FirstMatch" execution mode, which is used when several parser components are placed in the same pipeline.

    Essentially, you put more than one probing-capable parser in the disassemble stage, and the engine asks each of them in turn to probe the message until one says "yes, I can handle this".

    Each parser might need to read a significant part of the message to figure out whether it can handle it. Repeatedly reading large parts of a message might be a performance problem. The example above reads the data once and can tell which schema to use, so it is better suited to these scenarios.

    I have not personally encountered many situations where reading a large part of the message was required to figure out its type. Most of the time, it was possible (via some tricks) to read only small parts of the message. However, one should always keep performance in mind when designing pipelines.

  3. Here we can see the schema of an incoming flat file selected dynamically, solved with a single pipeline, under BizTalk of course.
