Processing Analysis Services: Notes on the Care and Feeding of an OLAP Database

Anyone who has had the opportunity to work with Microsoft Analysis Services in multidimensional mode knows that processing is the series of steps that load data from a relational data source into a model. By default, processing occurs when a solution is deployed or re-deployed to the server from BI Development Studio or SQL Server Data Tools. Processing can also be initiated from SQL Server Management Studio, or automated using SQL Server Integration Services and SQL Server Agent. What actually happens during processing depends on several factors that need to be carefully considered. Structural changes to models, dimension design characteristics, processing options selected, configuration settings, and other executing operations can all impact processing performance and the outcome of processing operations.

Structural Changes to a Model

If there have been structural changes to a model, any objects impacted by the change must be processed. To be clear, a structural change alters the metadata as well as the structure of the stored data and aggregations. The most common cases are listed below; a sketch of a processing batch for one such change follows the list.

  • Adding a dimension to a database requires processing the newly added dimension.
  • Adding a dimension to a cube necessitates processing the dimension and any cubes to which it has been added.
  • Removing a dimension from a cube or database requires reprocessing the cubes in which the dimension was used.
  • Changing the compatibility level of a dimension requires that the dimension and any cubes in which it is used be reprocessed.
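
As an illustration, a batch like the following, sent from an XMLA query window in SQL Server Management Studio, would fully process a changed dimension and the cube that uses it in a single transaction. This is only a sketch: the object IDs (SalesDB, Dim Customer, Sales) are hypothetical placeholders and would need to match the identifiers in your own database.

```xml
<!-- Sketch: fully process a changed dimension and the cube that uses it in one
     transaction. All object IDs here are hypothetical placeholders. -->
<Batch xmlns="http://schemas.microsoft.com/analysisservices/2003/engine" Transaction="true">
  <Process>
    <Object>
      <DatabaseID>SalesDB</DatabaseID>
      <DimensionID>Dim Customer</DimensionID>
    </Object>
    <Type>ProcessFull</Type>
  </Process>
  <Process>
    <Object>
      <DatabaseID>SalesDB</DatabaseID>
      <CubeID>Sales</CubeID>
    </Object>
    <Type>ProcessFull</Type>
  </Process>
</Batch>
```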

Dimension Design Characteristics

The way in which attribute relationships are defined within a dimension has a distinct impact on processing and aggregations. By default, all attributes are related directly to the key (granularity) attribute using flexible attribute relationships. A flexible relationship signals that members are expected to change position, and potentially level, within the structure of the dimension over time. Flexible attribute relationships are fine in situations where members of a dimension are in fact likely to change position, but they can also introduce performance issues. Executing a ProcessUpdate on any dimension designed with flexible attribute relationships causes the flexible aggregations associated with that dimension to be dropped. That results in some performance degradation because, without the flexible aggregations, the server is forced to scan partitions rather than use the functional equivalent of indexes. If it is necessary to use flexible attribute relationships, consider the following options:

  • When executing ProcessUpdate, check the “Process affected objects” checkbox, which processes the affected objects in the same transaction and rebuilds the flexible aggregations. Processing will take somewhat longer to complete.

  • After executing ProcessUpdate, execute a ProcessIndex on the measure group(s) that use the affected dimension (a sketch of this approach follows the list).

  • Set the ProcessingMode property of measure groups that use dimensions with flexible attribute relationships to LazyAggregations.

  • When processing indexes, consider adjusting the AggregationMemoryLimitMin and AggregationMemoryLimitMax server properties, which govern the memory allotted to building aggregations during processing.
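
As a rough sketch of the second option, assuming hypothetical object IDs (SalesDB, Dim Customer, Sales, Internet Sales), the following XMLA batch runs a ProcessUpdate on the dimension and then rebuilds aggregations and indexes on an affected measure group within the same transaction:

```xml
<!-- Sketch: ProcessUpdate on a dimension with flexible attribute relationships,
     followed by ProcessIndexes on an affected measure group to rebuild the
     dropped flexible aggregations. Object IDs are hypothetical placeholders. -->
<Batch xmlns="http://schemas.microsoft.com/analysisservices/2003/engine" Transaction="true">
  <Process>
    <Object>
      <DatabaseID>SalesDB</DatabaseID>
      <DimensionID>Dim Customer</DimensionID>
    </Object>
    <Type>ProcessUpdate</Type>
  </Process>
  <Process>
    <Object>
      <DatabaseID>SalesDB</DatabaseID>
      <CubeID>Sales</CubeID>
      <MeasureGroupID>Internet Sales</MeasureGroupID>
    </Object>
    <Type>ProcessIndexes</Type>
  </Process>
</Batch>
```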

Processing Options

A number of processing options are available, but not every option applies to every object. The table below outlines which processing operations apply to which objects stored in an Analysis Services database.

| Processing Type | Cubes | Databases | Dimensions | Measure Groups | Partitions | Mining Models | Mining Structures |
|-----------------|-------|-----------|------------|----------------|------------|---------------|-------------------|
| Add             |       |           | X          | X              | X          |               |                   |
| Clear           | X     | X         | X          | X              | X          | X             | X                 |
| Data            | X     |           | X          | X              | X          |               |                   |
| Default         | X     | X         | X          | X              | X          | X             | X                 |
| Full            | X     | X         | X          | X              | X          | X             | X                 |
| Index           | X     |           | X          | X              | X          |               |                   |
| Structure       | X     |           |            |                |            |               | X                 |
| Update          |       |           | X          |                |            |               |                   |

Process Add  

  • For dimensions, adds new members and updates dimension attribute captions and descriptions.
  • For measure groups and partitions, adds newly available fact data, processing only the relevant partitions (a sketch follows this list).
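
As a minimal sketch, a ProcessAdd on a dimension looks like the following; the object IDs are hypothetical placeholders. For partitions, ProcessAdd is normally combined with an out-of-line binding or a view that restricts the source query to only the new fact rows, which is omitted here.

```xml
<!-- Minimal ProcessAdd sketch for a dimension; object IDs are hypothetical. -->
<Process xmlns="http://schemas.microsoft.com/analysisservices/2003/engine">
  <Object>
    <DatabaseID>SalesDB</DatabaseID>
    <DimensionID>Dim Customer</DimensionID>
  </Object>
  <Type>ProcessAdd</Type>
</Process>
```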

Process Clear

Drops the data in the object specified and any lower-level objects. Executing a ProcessClear at the database level would drop the data for all dimensions, cubes, measure groups, partitions, and mining models contained within the database. Executing a ProcessClear on a cube would remove all data from the measure groups and partitions contained in the cube.   

Process Data  

Drops the data in partitions and re-populates them with data from the data source. This processing option does not build any aggregations or indexes.
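
Because ProcessData leaves a partition without aggregations or indexes, it is typically followed by a ProcessIndex. A sketch, with hypothetical object IDs:

```xml
<!-- Sketch: load data into a partition, then build its aggregations and indexes.
     Object IDs are hypothetical placeholders. -->
<Batch xmlns="http://schemas.microsoft.com/analysisservices/2003/engine" Transaction="true">
  <Process>
    <Object>
      <DatabaseID>SalesDB</DatabaseID>
      <CubeID>Sales</CubeID>
      <MeasureGroupID>Internet Sales</MeasureGroupID>
      <PartitionID>Internet Sales 2014</PartitionID>
    </Object>
    <Type>ProcessData</Type>
  </Process>
  <Process>
    <Object>
      <DatabaseID>SalesDB</DatabaseID>
      <CubeID>Sales</CubeID>
      <MeasureGroupID>Internet Sales</MeasureGroupID>
      <PartitionID>Internet Sales 2014</PartitionID>
    </Object>
    <Type>ProcessIndexes</Type>
  </Process>
</Batch>
```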

Process Default

This processing option detects the current state of the database object or objects to be processed and performs whatever actions are necessary to bring the object(s) to a fully processed state. If a structural change has been made, or the database, cube, or other object is unprocessed, ProcessDefault executes a ProcessFull on the object. For example, if a partition were changed from a table binding to a query binding, ProcessDefault would execute a ProcessFull on the partition. Likewise, if a ProcessDefault were executed on an unchanged partition, the action would process data and indexes on the partition.

Process Full  

Processes an Analysis Services object and all the objects that it contains. When Process Full is executed against an object that has already been processed, Analysis Services drops all data in the object, and then processes the object. This kind of processing is required when a structural change has been made to an object, for example, when an attribute hierarchy is added, deleted, or renamed.
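
At its simplest, a ProcessFull of an entire database references only the database; the DatabaseID below is a hypothetical placeholder.

```xml
<!-- Sketch: full processing of an entire database. The DatabaseID is a
     hypothetical placeholder. -->
<Process xmlns="http://schemas.microsoft.com/analysisservices/2003/engine">
  <Object>
    <DatabaseID>SalesDB</DatabaseID>
  </Object>
  <Type>ProcessFull</Type>
</Process>
```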

Process Index

This processing option creates or rebuilds indexes and aggregations for all processed partitions contained by the object being processed. When a ProcessIndex is executed on a cube, aggregations and indexes are built for all of the partitions contained within that cube; likewise, when a ProcessIndex is executed on a measure group, aggregations and indexes are built for all of the partitions contained within that measure group. Note that this option is not valid for objects in an unprocessed state, so attempting to execute a ProcessIndex on an unprocessed cube will generate an error. If a cube contains one or more dimensions built with flexible attribute relationships and its measure groups are not configured for lazy aggregations, executing a ProcessIndex should be seriously considered after a ProcessUpdate on a dimension that contains flexible attribute relationships.

Process Update

This processing option forces the server to re-read the source data for the dimension and update dimension attributes. It is important to bear in mind that executing a ProcessUpdate on dimensions that have been built with flexible attribute relationships will cause all of the flexible aggregations and indexes on related partitions to be dropped. This can result in query performance issues, so it is often desirable either to check the “Process affected objects” checkbox or to execute a ProcessIndex on the affected cube(s) after the ProcessUpdate.

Process Structure

ProcessStructure applies only to cubes and mining structures. Applied to a cube that is in an unprocessed state, the server processes the metadata for the cube and all of the dimensions used within the structure of the cube. Applied to a mining structure, ProcessStructure populates the mining structure with source data but does not load data into the mining models.

ErrorConfiguration Settings 

These configuration settings determine the actions taken when errors are encountered during processing. With the default error configuration, processing stops when an error is encountered. One can, however, select the "Use custom error configuration" option, which allows values to be selected for the following settings to control error-handling behavior (a sketch of the equivalent XMLA follows these settings):

CalculationError. This setting determines the action to be taken when an error is encountered in the calculation script. The options are:

  • Ignore Error, which is the default action
  • ReportAndContinue
  • ReportAndStop

KeyDuplicate. This configuration option determines what action is taken when a duplicate key is encountered in a dimension. The options are:

  • Ignore Error. This is an option that I generally consider very undesirable because it tends to mask referential integrity and data quality issues in the underlying relational data source.
  • ReportAndContinue
  • ReportAndStop. This is the default option and is commonly implicated in processing failures, although the underlying problem is typically a data quality issue in the relational source database.

KeyErrorAction. This setting determines what is done with a record whose key triggers a key error. The options are:

  • Convert to unknown. The key is interpreted as an unknown member. This is the default setting.
  • Discard record.

KeyErrorLimit. This is an integer value that sets the maximum number of key errors that can occur before the KeyErrorLimitAction is invoked.

KeyErrorLimitAction. This setting determines the action that is taken when the number of key errors reaches the value of the KeyErrorLimit configuration setting. The options are:

  • StopProcessing
  • StopLogging

KeyErrorLogFile. This is simply the path to a text file to which key errors encountered during processing will be logged.

KeyNotFound. This setting determines the action that is taken when a key is not found. The options are:

  • Ignore Error. This may be acceptable in a development environment, but it is not a good selection for a production environment due to the potential for masking data quality issues.
  • ReportAndContinue
  • ReportAndStop

NullKeyConvertedToUnknown. This setting determines the action taken when a null key is converted to "Unknown". The default is to Ignore the error. The options are:

  • Ignore Error
  • ReportAndContinue
  • ReportAndStop

NullKeyNotAllowed. This setting determines the action taken when a null key is encountered during processing. The default is to Ignore the error. The options are:

  • Ignore Error
  • ReportAndContinue
  • ReportAndStop
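
In XMLA, these settings map to the ErrorConfiguration element, which can be supplied on a processing batch. The sketch below applies a custom error configuration to a ProcessFull; the object IDs, error limit, and log path are hypothetical placeholders.

```xml
<!-- Sketch of a custom ErrorConfiguration applied to a processing batch.
     Object IDs, the limit, and the log path are hypothetical placeholders. -->
<Batch xmlns="http://schemas.microsoft.com/analysisservices/2003/engine" Transaction="true">
  <ErrorConfiguration>
    <KeyErrorLimit>100</KeyErrorLimit>
    <KeyErrorLogFile>C:\Logs\ProcessingKeyErrors.log</KeyErrorLogFile>
    <KeyErrorAction>ConvertToUnknown</KeyErrorAction>
    <KeyErrorLimitAction>StopProcessing</KeyErrorLimitAction>
    <KeyNotFound>ReportAndContinue</KeyNotFound>
    <KeyDuplicate>ReportAndContinue</KeyDuplicate>
    <NullKeyConvertedToUnknown>IgnoreError</NullKeyConvertedToUnknown>
    <NullKeyNotAllowed>ReportAndContinue</NullKeyNotAllowed>
  </ErrorConfiguration>
  <Process>
    <Object>
      <DatabaseID>SalesDB</DatabaseID>
      <DimensionID>Dim Customer</DimensionID>
    </Object>
    <Type>ProcessFull</Type>
  </Process>
</Batch>
```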

 

Other Executing Operations

For one reason or another, operations already executing on a server are frequently overlooked when planning processing. Unfortunately, this can sometimes result in deadlocks and processing failures. Deadlocks can occur in a number of different scenarios, but one of the most common causes of processing failure is an attempt to process objects in two separate but concurrent transactions. The problem in this scenario is that Analysis Services uses a two-phase commit during processing. As part of the commit, the server must update the Master.VMP file, which contains the master version map for all objects on the server. Updating that file requires a server-level lock, so when two objects are processed in parallel and both are ready to commit, both must update the Master.VMP file; one of the processing transactions becomes a deadlock victim and is rolled back. In this scenario, the best options are to check for an active processing command before initiating another processing operation, for example by running the DMV query SELECT * FROM $SYSTEM.DISCOVER_SESSIONS and examining the SESSION_LAST_COMMAND column, and to execute processing operations in sequence rather than in parallel, as sketched below.
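
One way to avoid the commit-time conflict is to submit the objects in a single batch without a Parallel element, so the server processes them sequentially and commits once; the object IDs below are hypothetical placeholders.

```xml
<!-- Sketch: process two partitions sequentially in one transaction instead of
     two concurrent transactions. Object IDs are hypothetical placeholders. -->
<Batch xmlns="http://schemas.microsoft.com/analysisservices/2003/engine" Transaction="true">
  <Process>
    <Object>
      <DatabaseID>SalesDB</DatabaseID>
      <CubeID>Sales</CubeID>
      <MeasureGroupID>Internet Sales</MeasureGroupID>
      <PartitionID>Internet Sales 2013</PartitionID>
    </Object>
    <Type>ProcessFull</Type>
  </Process>
  <Process>
    <Object>
      <DatabaseID>SalesDB</DatabaseID>
      <CubeID>Sales</CubeID>
      <MeasureGroupID>Internet Sales</MeasureGroupID>
      <PartitionID>Internet Sales 2014</PartitionID>
    </Object>
    <Type>ProcessFull</Type>
  </Process>
</Batch>
```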