Is your DataWarehouse a success?

How do you measure Datawarehouse (DW) effectiveness?

How do you know your DW is a success?

What are metrics to measure a DW?

This post outlines my thoughts around answers to above question:

There are two components to DW or also commonly called as Business Intelligence (BI) teams.

1) DW/BI Platform ( RDBMS, MPP systems, OLAP Cubes servers, Data Pipeline Servers, Reporting)

These are infrastructure or platform on which DW and BI systems are build.

3 possible metrics for Platform could be:

  1. A) Weekly Data processed: To show scale of the platform
  2. B) Weekly cost per TB of Data Processed: To show efficiency of the platform.
  3. C) Availability: Platform is available X % in a week.

2) DW/BI Engineering:

This area focuses on  building actual DW and BI systems and are difficult to measure.

I think there are 3 good metrics for DW/BI:

a) Return on Investment (ROI):

Does customers of DW/BI feel they can make decisions using DW/BI. This one is difficult to measure & usually DW/BI teams take more practical approach of customer engagement, where depending on how many teams in a company are using DW/BI in last 6 months or how many Active users access DW/BI weekly.

Active users are users who do reads, or writes with ADW, RedShift or Cubes or Cognos reporting etc.

For example: -We have X number of distinct teams in company using DW/BI -We have X number of Weekly Active Users

-We have X number of all-time users.

b) Service Level Agreement (SLA):

This is an important measure for customers of DW/BI. Each DW/BI customer like Business teams, Engg. teams, Data Science teams, Experimentation teams want data to be available post transformations by certain time of each day.

-We have 99%  or X % SLA meet in DW/BI in AWS environment. SLA is usually 8 AM PST.

c)Time to Insights (TTI):

Another very critical measure for DW/BI. If data retrieval from DW is not easy or takes hours or days (from Transaction DBs to Actual Facts/Dims in DW) than DW/BI is not performant. - We process each days of data in ~X hours today, so data is available by 8 AM PST - We support more than X hourly refresh cycles in DW for important use cases - Users can get insights from DW in minutes. - Users can get insights from OLAP Cubes (Self-serve) in seconds.​

Hope it helps, do comment and share thoughts.