Today is July 28th 2009 and I am writing this small post to share my understanding on Gemini and Business Intelligence arena and specifically around Analysis Service.
Talk about data in Microsoft and the very first thing that comes in mind is SQL Server. Microsoft hardly released SQL 2008 and we are looking forward for next version of SQL Server. That is very ambitious.
When it comes to Business Intelligence problem, things like Reporting, Analysis, data-mining come automatically forms part of the solution. But there are few patterns that should be considered for developing an efficient BI solution. Before talking about Analysis Service, let’s understand how operations are different in OLTP vs OLAP databases.
In OLTP databases, data is written in form of rows (relational databases). So for example, if I want to store two records of two column structure having (name and email as columns) the storage will go in a fashion:
“Rahul” “email@example.com“, “Sabiha” “firstname.lastname@example.org“.
Eeffectively, data is written over disk one row at a time. This makes typical OLTP transactions efficient since I can quickly read/write a all columns of ONE record somewhere in middle of 50 million rows. But when it comes to reading ALL records (let’s say email of all 50 Million records), relational databases fail miserably in terms of performance.
In OLAP databases, data is stored in a different fashion. It is stored in the form of columns rather that rows. So if I want to store the same set of records in columnar fashion, it would be stored this ways:
“Rahul” “Sabiha”, “email@example.com” “firstname.lastname@example.org“.
The benefit and drawbacks are here.
If I want to read columns of ONE record somewhere in middle of 50 million rows, it will fail miserably in comparison to relational databases. But when it comes to reading a column of ALL records, this will be surprisingly fast (and this requirement is very common in Analysis: to read large number of records rather than one record).
So one thing is clear that columnar databases are better than relational databases for data analysis.
Another thing that need to be considered while designing analysis solution is the storage medium. Databases that use disk as the storage medium will perform very slow for performing analysis algorithms than the databases that use in memory medium as storage. Storing data in memory is fast because of the medium speed as well as saving CPU cycles. Although the durability property of ACID is lost in this process (since memory is volatile) but this still holds a good candidate for analysis solutions. Even hybrid storage mechanisms can be considered depending on the requirement scenario.
Now comes Microsoft’s project codename “Gemini”. Gemini is next version of Analysis Service based on the concepts of using columnar database and in memory storage, making analysis of data hell fast. Gemini will be shipped in the next version of SQL Server (2008 R2/Kilimanjaro).
The good thing is that Excel users can use its capability by installing a plug-in. Hence the term “self service analysis”. Every user in an organization will be able to analyse data and take decisions using simple Excel. More here:
I hope this gives a peek about what Gemini is and how it will work.