Understanding BAM Partitioning

Whenever you deploy a BAM Activity, the underlying database infrastructure is created for you automatically. Apart from the following five tables (and their associated views), bam_<ActivityName>_Active , bam_<ActivityName>_ActiveRelationships, bam_<ActivityName>_Completed ,bam_<ActivityName>_CompletedRelationships, bam_<ActivityName>_Continuations , you also get the following SSIS packages –

  • BAM_DM_<ActivityName> (this is the package that’s responsible for BAM data partitioning in the BAMPrimaryImport database and the movement of old data to the BAMArchive database)
  • BAM_AN_<ActivityName> (this optional package is created when your BAM views make use of OLAP cubes – not something we will discuss today).

Something that’s very easy to miss is that none of these SSIS packages run by default. You need to create a SQL Agent Job and schedule these packages to execute at well-thought out intervals. More on this later.

So, how does BAM partitioning work? Let’s say you have deployed a BAM activity and your Activity data gets inserted into the activity tables mentioned above. Over time, the bam_<ActivityName>_Completed and bam_<ActivityName>_CompletedRelationships tables would bloat up and as a result data inserts/reads would be slower. That’s where the BAM_DM_<ActivityName> SSIS package comes into the picture. Whenever it runs, it renames the bam_<ActivityName>_Completed table as bam_<ActivityName>_<GUID> and bam_<ActivityName>_CompletedRelationships table as bam_<ActivityName>_<GUID>_Relationships. Following this, it creates new copies of the bam_<ActivityName>_Completed and bam_<ActivityName>_CompletedRelationships tables with 0 rows each, so that insertion of new data is much faster.

Whenever we look at BAM data through the BAM portal, the BAM data is queried through the BAM views (the tables are not queried directly). So, after the creation of these two new partition tables, the BAM view definition is updated by the BAM_DM_<ActivityName> SSIS package, so that all the data that is present in the BAMPrimaryImport database for this activity is viewable/queryable.

So far so good. So when does this data move to the BAMArchive database?

An administrator can specify the time window for archiving activity instance data in the BAMPrimaryImport database. The OnlineWindowTimeUnit (default – MONTH) and OnlineWindowTimeLength (default – 6) properties for a deployed BAM Activity in the BAM_Metadata_Activities table in the BAMPrimaryImport database controls when activity data is archived to the BAMArchive database.

As per the default settings of the above properties, BAM data is retained in the BAMPrimaryImport database for a minimum of 6 months. But, don’t let the above statement mislead you. The movement of data is also dependent on how often you are running the BAM_DM_<ActivityName> SSIS package. Let’s say you have scheduled your BAM_DM_<ActivityName> SSIS package to run once per month. On the sixth month, you should have 6 sets of partition tables for the activity. However, only on the seventh month would these partitions be moved to the BAMArchive database. The reason is that the package maintains the time a partition was created in the bam_Metadata_Partitions table –

 So, after the creation of the new partitions, the BAM_DM_<ActivityName> SSIS package checks in the above table if there are any partitions (belonging to this activity) that can be archived to the BAMArchive database. This check is based on the CreationTime value that was stamped during partition creation time. So, only on the seventh month when our package runs, it would see that the first created partition falls outside the Online Window (i.e., older than 6 months) and hence, the first partition (that was created in the first month) gets archived to the BAMArchive database. This archived data would no longer show up in the BAM Portal.

Please note that manually updating the OnlineWindow for the BAM partitions is not supported. It must be done through the bm.exe tool. For example, to set the OnlineWindow to 6 days you can run this -

bm.exe set-activitywindow -Activity:<ActivityName> -TimeLength:6 -TimeUnit:Day

Ref : https://msdn.microsoft.com/en-us/library/bb246031.aspx

The above command would cause all BAM partition tables for this activity that are older than 6 days to be moved to the BAMArchive database. Note that this movement to BAMArchive would only happen the next time the SSIS package runs – if the package is scheduled to run daily, the old partitions would be moved on the seventh day. If however the package is scheduled monthly, these partitions stay back till the end of the month.

Another question that people have is how often should you be running these packages? It would really depend on the volume of your BAM data. For most cases, it should be good enough to schedule the BAM_DM_<ActivityName> package once a month. But, for high volume scenarios (eg: million rows a day), it would be advisable to schedule the job more frequently – say once every week or so. In the high volume scenario, you should also reduce the OnlineWindow from the default of 6 months to lower (say 3 months or so) – or else, you would have too many partitions for the activity in the BAMPrimaryImport database. This would lead to query timeouts and general slowness when your BAM queries execute.

There’s a catch though. If you are running this SSIS job too often (eg: each day) and the OnlineWindow has not been reduced from the default of 6 months, you will run into issues. The package would error out with the following –

Executing the query "EXEC [dbo].[bam_Metadata_SpawnPartition] N'MyActivity..." failed with the following error: "RegenerateViews_ViewDefinitionTooLong".

Whenever you get this error, it’s an indication that the SSIS package has been scheduled far too frequently (and the OnlineWindow hasn’t been reduced accordingly). If you can recollect, in the earlier part of the article, we discussed that when you query data from the BAM Portal, you don’t query the tables directly, rather you query the views that are a composition over all of the partition tables and the currently active tables for the activity. So, each time the SSIS package creates a new partition, it updates the view definition to include the new partition table (it uses a UNION to do this). When this view definition string gets too big due to the fact that we have accumulated too many partition tables(as the package was running too often) you would run into this issue. So, you would have to be judicious in scheduling this package. Once a month is good enough for normal BAM volume. If you need to run this job every day, you would then have to reduce the OnlineWindow to offset the creation of too many partition tables.

Finally, this package may also fail if your BAM view(s) is using OLAP cubes and you have not scheduled the BAM_AN_<ActivityName> SSIS package. Typically, you should schedule this package to run once a day so that the cubes can be updated on a daily basis. The [bam_Metadata_AnalysisTasks] contains the details per Activity of when the BAM_AN package last ran and which activity rows in the Activity tables have been processed so far by this job. If this job has not been scheduled, then no partitions would again move to the BAMArchive database, since we are yet to update the cube with the data in the Activity tables. You would still get the same RegenerateViews_ViewDefinitionTooLong error (as the view definition would have become very lengthy due to no partitions getting archived), so you need to watch out for this special case.
 

Written By
Arindam Paul Roy

Reviewed By
Jainath V R 

Microsoft India GTSC