How do you measure Datawarehouse (DW) effectiveness?
How do you know your DW is a success?
What are metrics to measure a DW?
This post outlines my thoughts around answers to above question:
There are two components to DW or also commonly called as Business Intelligence (BI) teams.
1) DW/BI Platform ( RDBMS, MPP systems, OLAP Cubes servers, Data Pipeline Servers, Reporting)
These are infrastructure or platform on which DW and BI systems are build.
3 possible metrics for Platform could be:
- A) Weekly Data processed: To show scale of the platform
- B) Weekly cost per TB of Data Processed: To show efficiency of the platform.
- C) Availability: Platform is available X % in a week.
2) DW/BI Engineering:
This area focuses on building actual DW and BI systems and are difficult to measure.
I think there are 3 good metrics for DW/BI:
a) Return on Investment (ROI):
Does customers of DW/BI feel they can make decisions using DW/BI. This one is difficult to measure & usually DW/BI teams take more practical approach of customer engagement, where depending on how many teams in a company are using DW/BI in last 6 months or how many Active users access DW/BI weekly.
Active users are users who do reads, or writes with ADW, RedShift or Cubes or Cognos reporting etc.
For example: -We have X number of distinct teams in company using DW/BI -We have X number of Weekly Active Users
-We have X number of all-time users.
b) Service Level Agreement (SLA):
This is an important measure for customers of DW/BI. Each DW/BI customer like Business teams, Engg. teams, Data Science teams, Experimentation teams want data to be available post transformations by certain time of each day.
-We have 99% or X % SLA meet in DW/BI in AWS environment. SLA is usually 8 AM PST.
c)Time to Insights (TTI):
Another very critical measure for DW/BI. If data retrieval from DW is not easy or takes hours or days (from Transaction DBs to Actual Facts/Dims in DW) than DW/BI is not performant. - We process each days of data in ~X hours today, so data is available by 8 AM PST - We support more than X hourly refresh cycles in DW for important use cases - Users can get insights from DW in minutes. - Users can get insights from OLAP Cubes (Self-serve) in seconds.
Hope it helps, do comment and share thoughts.