In this blog series I will try to explain the concept of Big Data, so here are my views on the concept and the platforms that support it. Before I start talking about the concept and technologies, let's think for a minute about the data around us.
On the professional front, think about the data growing in our company: Gigabyte -> Terabyte -> Petabyte -> Exabyte -> Zettabyte -> Yottabyte. We may have an appropriate policy to handle the data as it grows. For example, once the first year completes, the data moves to a data warehouse and is pushed to a SQL Server Analysis Services cube for analytics. Then, after four or five years, the data is archived, meaning users no longer need it for analytics. But suppose one day someone asks to analyze or mine the whole data set, or wants to see what else we could predict by adding data that we used to ignore to our data mining. How will we handle a situation where no technology available in the company is capable of handling such a huge volume of data?
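To make the jumps in that progression concrete, here is a small Python sketch that lists each unit in bytes. It assumes decimal (SI) units, where each step is 1,000 times the previous one; binary units (1,024 per step) give slightly larger numbers. The function name `unit_sizes` is just for illustration.

```python
# Storage units from the progression above, smallest to largest.
# Assumption: decimal (SI) units, so each step is 1000x the previous one.
UNITS = ["Gigabyte", "Terabyte", "Petabyte", "Exabyte", "Zettabyte", "Yottabyte"]

def unit_sizes(base=1000):
    """Return (unit name, size in bytes) pairs, starting at 1 GB = base**3 bytes."""
    sizes = []
    for i, name in enumerate(UNITS):
        sizes.append((name, base ** (3 + i)))
    return sizes

if __name__ == "__main__":
    for name, size in unit_sizes():
        print(f"1 {name:<9} = {size:.0e} bytes")
```

Passing `base=1024` instead gives the binary sizes. Either way, a yottabyte is 15 orders of magnitude larger than a gigabyte.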
Another scenario: we may not be able to picture such huge data because our job profile doesn't involve dealing with it. One way to picture it is to think about social networking data. In our day-to-day life we log in to Facebook and upload some content. I read somewhere that Facebook hosts approximately 10 billion photos, which take up about one petabyte of storage. Think about the data storage and manageability challenges such a site faces. Bing could be another example.
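A quick back-of-the-envelope check on those figures shows where the difficulty lies. The sketch below assumes decimal units (1 PB = 10^15 bytes, 1 KB = 1,000 bytes) and simply divides the quoted storage by the quoted photo count.

```python
# Average storage per photo, using the approximate figures quoted above.
# Assumption: decimal units, 1 PB = 10**15 bytes and 1 KB = 1000 bytes.
PHOTOS = 10_000_000_000      # ~10 billion photos
STORAGE_BYTES = 10**15       # ~1 petabyte

def average_photo_size_kb(photos, storage_bytes):
    """Average size of one photo in kilobytes."""
    return storage_bytes / photos / 1000

print(f"~{average_photo_size_kb(PHOTOS, STORAGE_BYTES):.0f} KB per photo")
```

That works out to roughly 100 KB per photo. Each file is small; the challenge comes from the sheer number of them.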
Now we have an idea of the volume of data we are talking about. Volume is one aspect of Big Data; the other two are Variety and Velocity. Variety refers to the different types of data the industry generates, which can be structured, semi-structured, or unstructured. Velocity refers to the speed at which data is generated, for example by mobile devices, website clicks, and so on. This is the standard definition, and you will find these three V's everywhere.
However, in my view there should be a fourth V as well, and a very important one: Value. We have to think about what value we are going to get, meaning the outcome or ROI of implementing a Big Data platform. Think about what outcome or benefit we can provide with this platform, how it is going to be different, and what challenges we are trying to solve.
OK. Now that we understand the Big Data concept, the next question is how to handle it. Hadoop is a technology that helps not only with storing the data but also with querying it. At a high level, Hadoop has two layers: 1) a Storage Layer and 2) a Query Layer.
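In classic Hadoop, the storage layer is HDFS and the query (processing) layer is MapReduce. The snippet below is not Hadoop code and does not use any Hadoop API. It is a minimal, single-process Python sketch of the MapReduce model (map, shuffle, reduce) applied to counting words, assuming an in-memory list of text lines stands in for the file blocks HDFS would store. All function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Mapper: emit (word, 1) for every lowercased, whitespace-separated word."""
    for word in line.lower().split():
        yield word, 1

def shuffle_phase(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle: group the emitted values by key, as happens between map and reduce."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reducer: sum the counts collected for one word."""
    return key, sum(values)

def word_count(lines: Iterable[str]) -> Dict[str, int]:
    # Each line plays the role of an input split. In a real cluster, mappers
    # would run in parallel on the machines that hold the data blocks.
    mapped = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(k, v) for k, v in grouped.items())

if __name__ == "__main__":
    docs = ["big data is big", "hadoop handles big data"]
    print(word_count(docs))
```

The structure is the point here: because a mapper sees only one record and a reducer sees only one key at a time, the work can be spread across many machines, which is what lets this model process data too large for a single server.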
*Disclaimer: The opinions expressed herein are my own personal opinions and do not represent my employer's view in any way.