Data Cubes

The term “Cube” is not new. Most of us would associate it with Rubik cubes, invented by the Hungarian sculptor and professor of architecture in 1974. It was so popular that in 1980 Germany recognized it as “German game of the year”

 

Children or the childlike would associate it with dice. A six faced colored dots which would decide the outcome of a game like snake & ladder, the game of Business, Ludo and the likes.

 

Data Cubes are more like the geometrical cubes (or Cuboids) having multiple dimensions. It contains dimensions and Facts in which data can be analyzed. Cubes provide a number of built-in functions which makes it easier to cater to calculations required to analyze data. Multi-Dimensional eXpressions (MDX) is used to define calculations.

Key Performance Indicators (KPI) are also defined using MDX in which Goals, Achievements and Trends are defined. KPIs are grouped together in a scorecard.

Processing a Cube will pull data from the relational model into OLAP model. How much of data gets stored into the Cube depends on what is chosen to strike a balance between Performance, Processing times and Storage. The alternative are

MOLAP – Multi dimensional OLAP – Data and aggregates are stored within the Cube. This gives the best performance while it uses maximum storage. It can answer queries even when the underlying RDBMS is not available. In order to minimize the downtime of the Cube when data is being refreshed, lazy aggregations was introduced. Cube becomes available faster and continues to build aggregations till it is completely built.  

ROLAP – Relational OLAP relies on the underlying relational database to answer queries. Cube only stores metadata on how to access tables underneath. It can’t answer any queries without the underlying RDBMS and takes the least amount of storage space. At query time it needs to fetch details and aggregate on the fly and thus can take a toll for each query executed.

HOLAP – Hybrid OLAP is a combination of the two where aggregates are stored in the Cube and details needs to be fetched. The storage space and performance of queries is also mid-way between the above two.

A combination of all 3 can be used in a single Cube! A cube can be partitioned just as SQL tables are partitioned. Each partition can have a different aggregation strategy. Generally the lesser queried partitions (lets say data 2 years old) are in ROLAP, the most frequently queried current data is in MOLAP.

Partitions can also be stored remotely providing a way to scale out Cubes. This also helps in load balancing by running distributed queries.

These cubes can be sliced and diced, drilled down and rolled up to analyze data across ‘n’ dimensions.

These cubes can act as a data source for a number of reporting clients like SSRS, Excel and third party components such as Microstrategy and Business Objects.

In SQL Server 2012, Visual Studio is used to create the Cubes. You can find the tutorials on msdn.

Build your own Cube and share your experiences with me. Cubes are a lot of fun and at the same time a very powerful feature of data exploration. Happy building Cubes