Big data, Big deal

By some estimates, more data is now produced every year than in the entire history of human civilization up to 2000. How does one analyze and visualize this data? Apart from the volume, the variety of the data and the velocity with which it flows in are paramount. This “VVV” data has to be managed and enriched, and insight has to be gained from it.

ERP, SCM and CRM systems are classic examples of highly structured data stored in an RDBMS such as SQL Server. Web logs, social interactions and feeds moved data volumes from gigabytes to terabytes. RFID, GPS navigation and aircraft information have moved them on to petabytes.

The debate over whether a relational or a non-relational store is the right choice for a data store is gradually dying down. Each is designed to meet particular needs, and each is required in different scenarios. When data is structured, relational data stores are ideal for storing and retrieving it with a simple language such as SQL. In contrast, non-relational stores are suited to unstructured data, where analysis is carried out programmatically. Modern data platforms must support both types of data equally well.
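The contrast between the two styles can be sketched in a few lines. This is only an illustration, assuming Python's built-in sqlite3 module as a stand-in for a relational store and a plain map/reduce-style function pair as a stand-in for programmatic analysis. The table, the log lines and the helper names (map_line, reduce_pairs) are all made up for the example and are not taken from SQL Server or Hadoop.

```python
import sqlite3
from collections import Counter

# --- Structured data: a declarative SQL query over a relational table ---
# The table and its rows are made-up illustration data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("alice", 120.0), ("bob", 80.0), ("alice", 30.0)],
)
# Each result row is a (customer, total) pair, so dict() builds a lookup table.
totals = dict(
    conn.execute("SELECT customer, SUM(amount) FROM orders GROUP BY customer")
)
conn.close()

# --- Unstructured data: programmatic, map/reduce-style analysis ---
# Free-form web log lines (also made up); there is no schema to query.
log_lines = [
    "2012-10-01 GET /home 200",
    "2012-10-01 GET /cart 500",
    "2012-10-02 GET /home 200",
]


def map_line(line):
    """Map step (hypothetical helper): emit a (status_code, 1) pair."""
    status = line.split()[-1]
    return status, 1


def reduce_pairs(pairs):
    """Reduce step (hypothetical helper): sum the counts for each key."""
    counts = Counter()
    for key, value in pairs:
        counts[key] += value
    return dict(counts)


status_counts = reduce_pairs(map_line(line) for line in log_lines)

print(totals)
print(status_counts)
```

In the first half the store does the grouping and the question is a single SQL statement. In the second half the structure has to be imposed in code, which is the kind of work a Hadoop job distributes across a cluster.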

To make use of the huge volume of data now available, big data needs to be managed and enriched, and insight needs to be gained from it.

Data Management

The need is to monitor and manage relational, non-relational and streaming data without having to worry about scale, performance, security and availability. The SQL Server engine supports structured data (including PDW for scale). Unstructured data is supported in HDInsight, which is a 100% open source implementation of Apache Hadoop with a commitment to give back to the Hadoop community.

Advantages of Hadoop on Windows

  • HDInsight will bring the robustness, manageability and simplicity of Windows to the big data world.
  • Security is provided through Active Directory.
  • System Center integration simplifies manageability, which reduces setup and deployment time.
  • HDInsight on Azure will further lower the barrier to deployment by offering web portals to manage and configure Hadoop clusters.
  • It can be deployed in the cloud or on premises.
  • More applications can be built quickly using the SQL language along with JavaScript and .NET.

Third-party providers are opening up further possibilities with big data

  • Karmasphere – a graphical environment on Hadoop for spotting trends and patterns
  • Datameer – a BI platform for Hadoop
  • HStreaming – complex event processing and real-time analytics

Data Enrichment

Data discovery is made easy through data recommendations. The Azure data marketplace recommends data sets that add value to the choices being made. For example, if an analyst picks a customer dataset, Dun & Bradstreet, which has credit information, is recommended. You can connect to and combine data from hundreds of trusted data providers, such as the US Census Bureau.

This helps in combining personal data with organizational data, community data and finally world data to enhance its value.
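The idea of enriching in-house records with an external data set can be sketched as a simple keyed lookup. This is a minimal illustration in plain Python, not the marketplace's API: the customer records, the credit ratings, the field names and the enrich helper are all hypothetical.

```python
# Hypothetical in-house customer records (organizational data).
customers = [
    {"customer_id": 1, "name": "Contoso Ltd"},
    {"customer_id": 2, "name": "Fabrikam Inc"},
]

# Hypothetical external credit data, keyed by the same customer_id.
credit_ratings = {1: "A", 3: "B"}


def enrich(records, external, key, new_field):
    """Return copies of records with new_field looked up in external data.

    Records with no match get None, so the gap stays visible
    instead of the record being silently dropped.
    """
    return [{**rec, new_field: external.get(rec[key])} for rec in records]


enriched = enrich(customers, credit_ratings, "customer_id", "credit_rating")

for row in enriched:
    print(row)
```

Keeping unmatched records, rather than dropping them as an inner join would, is a deliberate choice in this sketch: it shows the analyst which customers the external provider does not cover.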

Data cleansing is offered through SQL Server Integration Services (SSIS) and Data Quality Services (DQS), and data governance through Master Data Services (MDS). Predictive analysis is possible through the data mining algorithms in SQL Server Analysis Services (SSAS).

For advanced analytics, integration is provided with Mahout and R.

Data Insights

Visualization and analysis are offered by Power View and PowerPivot, along with SSRS, PPS and Microsoft Office, and the results can be shared through SharePoint collaboration.

With big data gaining ground in a big way, offerings are becoming available with rapidly growing feature sets.