In this post I will outline the components used in BA v.4. This will enable you to get an overview of the underlying technologies and ultimately give you an idea of what it takes to customize the setup delivered by BA.
The major components of Business Analytics are:
- The Configurator which is the tool that consumes the XML containing the configuration chosen.
- A relational Database in SQL Server 2000
- A Set of DTS Packages
- A Database in Analysis Services 2000
In this post I will describe in short what DTS is and how it can be used. I’ll also point to further reading. I will also give a short introduction to Analysis Services and here too point to further reading on the subject.
Data Transformation Services (DTS)
When creating a small configuration and running it (I did a demo at the Product Conference this week that I’ll use as an example) you get 4 DTS Packages. A DTS Package contains what ever logic that you as the creator have decided to put in it. The four packages are:
- BA for MBS-NAvision 400 – Initialize
- BA for MBS-NAvision 400 – Transfer [CRONUS International Ltd.]
- BA for MBS-NAvision 400 – Post Process
- BA for MBS-NAvision 400 – Process Cubes
This setup is classic in its layout. First some initialization takes place. In this case tables are truncated. Then the actual transfer of the data is taking place. Then some time variables are set, integrity checks are done and finally the cubes are loaded with data.
There are two major online resources for using DTS in BI. One is Best Practices for Using DTS for Business Intelligence Solutions that in detail describes how best practices can be implemented. SQL Server Accelerator for Business Intelligence (SSABI) is more or less best practices implemented. How DTS is used in SSABI is described in this article.
To give a brief overview of what DTS is made up of it can be summed into the following:
Packages are containers where the logic is executed. A package can contain any Connection, Task, Global Variable and Work flow that are a part of DTS or developed as Custom Tasks etc. A Package can contain other Packages that can be executed and variables can be passed from outer to inner packages. In this way a certain amount of portability can created in DTS. Packages can be saved in SQL Server, as files and as VB projects from where they can be compiled into executables.
Global Variables in DTS is probably the one feature that I like the most about DTS in SQL 2000. It enables the making of packages that can be executed on several different Servers (Dev/Test/Prod) and allows packages and tasks to be configured at runtime and put into context at runtime. This allows for a high degree of reusability for Packages, Connections and Tasks. It is, however, also the one feature that I know of that has caused most errors produced by humans that do not set Variables correctly or have the set themselves from the environment where they are executed. Variables can be used to set Server names, Connection strings, SQL statements etc. at runtime and can be picked up from INI-files, repositories etc.
A Connection is just what it sounds like – a connection. DTS doesn’t know if it’s a connection used as a source or target. This means that we from a user perspective can expect the same functionality from all connections since they are all exposed in the same way. Connections can be made to any ODBC or OLE DB provider installed on the machine where the package is running.
Tasks in DTS can be split into two categories. Tasks that move or manipulate data in some way and tasks that perform other things like FTP or file system tasks. The first Task is the Transform Data Task which is the one used by BA. This Task can move data from one connection to another. Doing so it can manipulate data with one of the built-in data transformations or with a custom built one. This article describes how to use the Transform Data Task in detail. I realize that this is from Version 7 but it is still very useful. It’s also worth mentioning that the Transform Data Task will try to run bulk insert. This depends on several things like the transformation used. If ActiveX Scripting is chosen it will evaluate data on a row by row basis. The next task is the Execute SQL task. This task can execute any SQL statement against a Connection. The last task is the Data Driven Query. This is useful compared to the data pump since it can perform different actions depending on the outcome of the transformation. This means that it is useful in handling Changing dimension where in some cases data should be updated and others inserted. It is also worth mentioning that this task is slow since it evaluates data row by row and for each row executes a SQL Statement. This makes it even slower than the Transform Data Task. For detailed information on the speed have a look at this presentation given by Gert Drapers at SQL Pass.
Work flow is what happens between tasks. In other works this is the instrumentation of tasks. There are three different types. On Success, On Error and On Completion. Each of them is self explanatory and I won’t go into detail with each of them.
Scheduling of DTS Packages can be done in any scheduler that can execute a windows command line. This happens through the utility DTSRUN where a series of parameters can execute a Package together with providing values of any Global Variables in the package. To help generate this command line I recommend looking at the utility DTSRUNUI.
When creating the sample database mentioned before I end up by having one Database in Analysis Services with one Cube and some shared Dimensions.
The database contains a connection to my relational Database. This Database contains the tables specified in the Configurator that is run from inside Navision 4. These tables are, by the way, truncated and loaded from scratch each time the packages are run. This also means that the cube is loaded from scratch and aggregated again each time a load takes place.
With regards to Analysis Services there is also good reading to find in this article on SSABI. For a general intro to Analysis Services and the concepts of OLAP databases this article is worth reading. It goes into detail on the differences between OLTP and OLAP databases. For operations of Analysis Services the white paper Analysis Services Operations Guide is the resource to read. For optimizing performance the white paper Analysis Services Performance Guide is the resource to read.
a database is the container where cubes and dimensions reside. The database can be compared to a relational database where tables and other things reside. The same can be said about Cubes and dimensions in an Analysis Services Database.
Connections for analysis services is where data for a cube comes from. Connections can be made to any ODBC or OLE DB source (Please pay attention to the fact that there might be limited availability of drivers for the Itanium 64–bit). One cube can only get data from one connection. This means that it is not possible in Analysis Services 2000 to get data from more than one source for one cube without involving other tools like Linked Servers or DTS.
A cube is a collection of a data model (Star Schema, Snowflake or 3NF) with measures, dimensions, partitions and aggregations. A cube can come from not only one connection but also one fact table. This means that if one would like to combine more than one fact table say Actuals and Budget it would be impossible in a cube. For this we have Virtual Cubes. Virtual Cubes are like Views in a relational database. They are a logical representation of the underlying cubes. To be able to combine two cubes in a Virtual Cube there must be at least one shared dimension. This could be any dimension in the two cubes.
Shared & Private Dimensions
Dimensions can be shared or private. A shared dimension is available to all cubes in the database and is needed to create a virtual cube (see above). Private Dimensions are created in a cube and can only be used in the cube in which they were created. This doesn’t mean that a private dimension can’t be visible in a Virtual Cube but the behavior is different from a Shared Dimension. When selecting a member from Private Dimension values of measures from other cubes are not affected. Dimensions can come from a single table, a Snowflake and a Parent-Child table.
Security in Analysis Services can be created down to Cell level. This is comparable to row level security in a Relational Database. It is a built-in functionality of Analysis Services but there are several things to keep in mind when designing this. All performance considerations are discussed in the paper Analysis Services Performance Guide. For Operational implications of security these are discussed in Guide. On how to implement security Mosha has a great collection of information on this topic.
Analysis Services has some Data Mining functionality. I’m not an expert on Data Mining but the guys at www.sqlserverdatamining.com are.