BAM tracking data gone missing

Recently, I worked on a BAM issue and I wanted to share with you all some interesting facts which I found while researching on this.

Customer implemented BAM Activities with Event Streams. They were using OrchestrationEventStreams within Orchestrations.

This is how their flow looks like in short:

Orch1:  Start Orch1 –> BeginActivity –> UpdateActivity –> Enable Continuation –> Send message to Orch2 so that it gets instantiated and passing Continuation Token to it --> End Activity
Orch2:  Start Orch2 --> Update Activity with Continuation Token  --> End Activity with Continuation Token

Customer observed that sometimes there is a row in the BAM Active table with ActivityId set to ContinuationToken and IsVisible set to NULL. This tracked BAM data is sitting idle in the active table and would not go to Completed table. Also, since the IsVisible is set to NULL, it would mean that it is not going to be shown through BAM view or BAM Portal, unless you write a custom query.

Researching on this, I found that if IsVisible is NULL, this would mean that BeginActivity was never called. So, what was happening in the customer case that that BAM events from Orch2 were getting processed and reaching BAMPrimaryImportDB while the BAM events from Orch1 never make up or they got stuck somewhere in between. Now, the question was how come the events from Orch1 were not getting processed though the events from Orch2, which got instantiated after Orch1, were getting processed. Also, keeping in mind, that the issue is intermittent.

Lets now understand how do the OrchesetrationEventStream(OES) API works. OES API’s are asynchronous. This means that API stores tracking data first in the BizTalk MessageBox database. Periodically the data is processed and persisted to the BAMPrimaryImport database by the Tracking Data Decode Service (TDDS).  There are four tables inside the MessageBox database which stores BAM Tracking data before it gets moved to BAMPrimaryImportDB i.e trackingdata_0_0, trackingdata_0_1, trackingdata_0_2, trackingdata_0_3 (note: the other four tables trackingdata_1_x store the tracking data for BiztalkDTADB). For a particular Orchestration instance, OES uses the Orchestration ID as the StreamID and all the events are written to the same table by the OES and TDDS can guarantee that the events are processed in the same order. https://msdn.microsoft.com/en-us/library/microsoft.biztalk.bam.eventobservation.bufferedeventstream.streamid.aspx

But since we have two different Orchestrations here, it would mean that all the BAM tracking data could go to different Tracking tables inside MsgBox and then we cannot guarantee that it would be processed in sequence. But since we are using Continuations, BAM guarantee that the end result after all the events gets processed (no matter what the sequence is), we should see correct result in the BAM Completed table. That’s the magic with BAM.

Now we queried on the four trackingdata tables inside MsgBox DB and found that the table trackingdata_0_0 has more than 50000 rows which keeps moving in the upward fashion and never decreased. It seems that somehow BAM data stored at this table is stuck there and TDDS is not able to move it to BAMPrimaryImport DB. It now clearly makes sense why we sometimes see BAM events from Orch2 inside BAMPrimaryImport DB and not from Orch1 as the events from Orch2 might have been stored at table other than trackingdata_0_0 and got successfully processed by TDDS. It also explains the intermittent nature of problem as other times the BAM events from Orch1 and Orch2 got written to table other than trackingdata_0_0. Also, sometimes both the events from Orch1 and Orch2 could get written to trackingdata_0_0 table and we would see no BAM data getting tracking in BAMPrimaryImport.

Later, on more digging, the issue was found external to Biztalk. The issue was all related to network and firewall between the Biztalk and SQL servers. After those issues were fixed, all data from trackingtable_0_0 table moved to the BAMPrimaryImportDB correctly.