Going deeper with Flat File schema authoring for BizTalk and Azure Logic Apps Flat File encode-decode action

So you've gone through the basics at the MSDN Flat File Wizard How-to , Walkthrough and Logic Apps Flat File Encode-Decode action, now what?

Imagine you have a flat file format with a header record followed with repeating line and sub-line. The field delimiter is the “|” and the record terminator is  CR/LF.

 HEADER|H1|H2|H3|
 LINE|L1F1|L1F2|L1F3|
 SUBLINE|SL1F1|SL1F2|
 LINE|L2F1|L2F2|L2F3|
 SUBLINE|SL2F1|SL2F2|

You've generated a schema using the wizard on the header field and the first occurrence of line and sub-line (marking these as 'repeat' records). Your schema looks like this in Visual Studio:

Initial schema

For troubleshooting your new schema, get used to the FFAsm and FFDasm command line utilities from the BizTalk SDK pipeline tools. Don't worry about the specific 2004 version mentioned in the MSDN article: the tools are present in all versions of BizTalk SDK, including the latest 2016 version. The first time you run the tools you may run into obscure type loading failure exceptions for classes in Microsoft.BizTalk.Pipeline and/or Microsoft.BizTalk.Streaming such as:

 Error
 Source:         Flat file disassembler
 Message:        Field not found: 'Microsoft.BizTalk.Streaming.XmlBufferedReaderStream.m_outputStream'.
 HRESULT:        80131511

This is because the pipeline tools assume that both assemblies have been registered in the GAC (Global Assembly Cache). When deploying a BizTalk Server project these are automatically registered; however when you are developing for Azure Logic Apps you are unlikely to build and deploy such project, or you may be keeping your development machine separated from the BizTalk Server instance where you run the integration. You will need to use GacUtil.exe to register the assemblies. Start a command prompt with Administrator privileges then

 C:\WINDOWS\system32>cd C:\Program Files (x86)\Microsoft BizTalk Server 2013 R2

C:\Program Files (x86)\Microsoft BizTalk Server 2013 R2>"C:\Program Files (x86)\Microsoft SDKs\Windows\v10.0A\bin\NETFX 4.6.1 Tools\x64\gacutil.exe" -i Microsoft.BizTalk.Pipeline.dll
 Microsoft (R) .NET Global Assembly Cache Utility.  Version 4.0.30319.0
 Copyright (c) Microsoft Corporation.  All rights reserved.

Assembly successfully added to the cache

And

 "C:\Program Files (x86)\Microsoft SDKs\Windows\v10.0A\bin\NETFX 4.6.1 Tools\x64\gacutil.exe" -i "C:\Program Files (x86)\Microsoft BizTalk Server 2013 R2\Microsoft.BizTalk.Streaming.dll"

(Note that I am picking the GAC utility from the Microsoft SDK. Depending on your machine you might find it instead under an older Visual Studio program files folder, or .NET 1 folder in Windows' system files.)

A normal run of the pipeline tool will look like this:

 "C:\Program Files (x86)\Microsoft BizTalk Server 2013 R2\SDK\Utilities\PipelineTools\FFDasm.exe" "C:\[..your path..]\FlatFileTest.txt" -bs "C:\[..your path..]\Schema.xsd" -v
 Creating objects.
 Creating message.
 Adding message to a pipeline.
 Executing pipeline.
 Getting processed message(s).
 Doing output for a message 1.

The tool will by default generate a GUID and write a file under that name in the current directory, e.g. {5c27b4f5-e0ed-447f-a73f-f92a789217bb}.xml

You can view either flat file or xml document file with notepad or any text editor.

However if you use the schema generated earlier and the input file with repeating line and sub-line you will get as output:

 Creating objects.
Creating message.
Adding message to a pipeline.
Executing pipeline.
Getting processed message(s).
Doing output for a message 1.


Error
        Source:         Flat file disassembler
        Message:        The remaining stream has unrecognizable data.

        HRESULT:        c0c01464

If you inspect the generated xml document, only the first line and sub-line have been decoded. The second line is triggering the unrecognizable data error.

This is a mismatch between the schema and the record. The schema says to expect at the root a complex type sequence with one instance of ‘header’ then repeating ‘line’, then repeating ‘subline’. The parser finds for the sequence the header, a single line, a single sub-line and at this point has completed the sequence. Because the line and sub-line together are not expected to repeat the parsing stops there.

(In a Flat File Decode run in Logic App you will obtain the same XML output of:

 <?xml version="1.0" encoding="utf-8"?>
<Root xmlns="https://FlatFile.Schema">
  <HEADER xmlns="">
    <HEADER_Child1>H1</HEADER_Child1>
    <HEADER_Child2>H2</HEADER_Child2>
    <HEADER_Child3>H3</HEADER_Child3>
    <HEADER_Child4></HEADER_Child4>
  </HEADER>
  <LINE xmlns="">
    <LINE_Child1>L1F1</LINE_Child1>
    <LINE_Child2>L1F2</LINE_Child2>
    <LINE_Child3>L1F3</LINE_Child3>
    <LINE_Child4></LINE_Child4>
  </LINE>
  <SUBLINE xmlns="">
    <SUBLINE_Child1>SL1F1</SUBLINE_Child1>
    <SUBLINE_Child2>SL1F2</SUBLINE_Child2>
    <SUBLINE_Child3></SUBLINE_Child3>
  </SUBLINE>
</Root>

)

To declare the root as a sequence we need to declare line and sub-line as a group with order indicator sequence which repeats after the header type.

Updated Schema

Open the schema file in your favorite editor (I used Visual Studio's XML editor), and after the xs:annotation for the whole schema, insert an element

   <xs:group name="LineAndSubline">
    <xs:sequence>

Then cut-and-paste the definitions for xs:element "LINE" and "SUBLINE", removing the maxOccurs="unbounded" attribute on each. Close the xs:sequence and xs:group elements.

Then got after the <xs:element name="HEADER"> ... </xs:element> and insert a repeating reference to the newly declared group as:

 <xs:group maxOccurs="unbounded" ref="LineAndSubline"/>

This should be contained in the root element </xs:sequence>.

With this updated schema you should get the updated XML document output with repeating record group (you will find the example below in this post).

Group definition is a generic XML schema capability which can be authored from the initial output of the Flat File Wizard either by hand or any XML schema editor that you like best.

You may find reference XML schema documentation both on MSDN, e.g.

https://msdn.microsoft.com/en-us/library/ms256093(v=vs.110).aspx

and w3schools, e.g.

https://www.w3schools.com/xml/el_group.asp

The selection of the group ‘type’ (XSD indicator) between sequence, all or choice will dependent on your intent. For the example flat file above and expected XML document output we have a repeating sequence of line and sub-line in that specific order – hence the group with order indicator sequence. See more at:

https://www.w3schools.com/XML/schema_complex_indicators.asp

-*-

When you look at the Logic App run history, the text content will look like this (the following is not a screenshot so it can be copy-pasted for the next step):

 

 When a file is created 1s

 Flat File Decoding 0s
Inputs See raw inputs
Inputs
{
  "content": "HEADER|H1|H2|H3|\r\nLINE|L1F1|L1F2|L1F3|\r\nSUBLINE|SL1F1|SL1F2|\r\nLINE|L2F1|L2F2|L2F3|\r\nSUBLINE|SL2F1|SL2F2|\r\n",
  "integrationAccount": {
    "schema": {
      "name": "Schema2"
    }
  }
}

Outputs See raw outputs
Outputs
{
  "body": {
    "$content-type": "application/xml;charset=utf-8",
    "$content": "PD94bWwgdmVyc2lvbj0iMS4wIiBlbmNvZGluZz0idXRmLTgiPz4KPFJvb3QgeG1sbnM9Imh0dHA6Ly9GbGF0RmlsZS5TY2hlbWEiPgogIDxIRUFERVIgeG1sbnM9IiI+
CiAgICA8SEVBREVSX0NoaWxkMT5IMTwvSEVBREVSX0NoaWxkMT4KICAgIDxIRUFERVJfQ2hpbGQyPkgyPC9IRUFERVJfQ2hpbGQyPgogICAgPEhFQURFUl9DaGlsZDM+
SDM8L0hFQURFUl9DaGlsZDM+CiAgICA8SEVBREVSX0NoaWxkND48L0hFQURFUl9DaGlsZDQ+CiAgPC9IRUFERVI+CiAgPExJTkUgeG1sbnM9IiI+CiAgICA8TElORV9D
aGlsZDE+TDFGMTwvTElORV9DaGlsZDE+CiAgICA8TElORV9DaGlsZDI+TDFGMjwvTElORV9DaGlsZDI+CiAgICA8TElORV9DaGlsZDM+TDFGMzwvTElORV9DaGlsZDM+
CiAgICA8TElORV9DaGlsZDQ+PC9MSU5FX0NoaWxkND4KICA8L0xJTkU+CiAgPFNVQkxJTkUgeG1sbnM9IiI+CiAgICA8U1VCTElORV9DaGlsZDE+U0wxRjE8L1NVQkxJ
TkVfQ2hpbGQxPgogICAgPFNVQkxJTkVfQ2hpbGQyPlNMMUYyPC9TVUJMSU5FX0NoaWxkMj4KICAgIDxTVUJMSU5FX0NoaWxkMz48L1NVQkxJTkVfQ2hpbGQzPgogIDwv
U1VCTElORT4KICA8TElORSB4bWxucz0iIj4KICAgIDxMSU5FX0NoaWxkMT5MMkYxPC9MSU5FX0NoaWxkMT4KICAgIDxMSU5FX0NoaWxkMj5MMkYyPC9MSU5FX0NoaWxk
Mj4KICAgIDxMSU5FX0NoaWxkMz5MMkYzPC9MSU5FX0NoaWxkMz4KICAgIDxMSU5FX0NoaWxkND48L0xJTkVfQ2hpbGQ0PgogIDwvTElORT4KICA8U1VCTElORSB4bWxu
cz0iIj4KICAgIDxTVUJMSU5FX0NoaWxkMT5TTDJGMTwvU1VCTElORV9DaGlsZDE+CiAgICA8U1VCTElORV9DaGlsZDI+U0wyRjI8L1NVQkxJTkVfQ2hpbGQyPgogICAg
PFNVQkxJTkVfQ2hpbGQzPjwvU1VCTElORV9DaGlsZDM+CiAgPC9TVUJMSU5FPgo8L1Jvb3Q+"
  }
}

 Create file 0s

Flat File decode run history

Notice the unreadable encoded text $content in the output? That's Base64 encoded so no character interferes with the json syntax. To decode it, go to https://www.bing.com and search for base 64 decode. It will return a built-in Base64 encoder-decoder.

Bing Base 64 Decode utility

Paste the encoded string without the wrapping quotes ("") and hit decode, you get:

 <?xml version="1.0" encoding="utf-8"?>
<Root xmlns="https://FlatFile.Schema">
  <HEADER xmlns="">
    <HEADER_Child1>H1</HEADER_Child1>
    <HEADER_Child2>H2</HEADER_Child2>
    <HEADER_Child3>H3</HEADER_Child3>
    <HEADER_Child4></HEADER_Child4>
  </HEADER>
  <LINE xmlns="">
    <LINE_Child1>L1F1</LINE_Child1>
    <LINE_Child2>L1F2</LINE_Child2>
    <LINE_Child3>L1F3</LINE_Child3>
    <LINE_Child4></LINE_Child4>
  </LINE>
  <SUBLINE xmlns="">
    <SUBLINE_Child1>SL1F1</SUBLINE_Child1>
    <SUBLINE_Child2>SL1F2</SUBLINE_Child2>
    <SUBLINE_Child3></SUBLINE_Child3>
  </SUBLINE>
  <LINE xmlns="">
    <LINE_Child1>L2F1</LINE_Child1>
    <LINE_Child2>L2F2</LINE_Child2>
    <LINE_Child3>L2F3</LINE_Child3>
    <LINE_Child4></LINE_Child4>
  </LINE>
  <SUBLINE xmlns="">
    <SUBLINE_Child1>SL2F1</SUBLINE_Child1>
    <SUBLINE_Child2>SL2F2</SUBLINE_Child2>
    <SUBLINE_Child3></SUBLINE_Child3>
  </SUBLINE>
</Root>

-*-
Beware copy-pasting flat files' content to Logic App editor. You may loose some characters like the CR/LF and run into errors such as:

Decoding failure on copy-paste

For testing in a Logic App I recommend a simple setup with OneDrive file input-output. This avoid the complications some engineers have with http post tool, getting the right content encoding, etc.
Simple Logic App with Flat File decode

The trigger will be poll-based with a 3 minutes default but you can trigger it from the Azure portal on-demand each time you want to test a file or schema.