Hello and welcome to this new tutorial series on Azure Data Services. Over the course of this series, I will take you through what Azure has to offer in Data Services. I will start with the basics of relational databases, dig into the goodness of non-relational offerings, and end with what is new and exciting in Azure Data Services.
Relational vs. Non-Relational
As developers, most of the time we look to relational databases and what they have to offer. What we often overlook are non-relational (NoSQL) databases, which can handle certain tasks more efficiently than their relational counterparts. However, certain operations are definitely better suited to relational databases, which offer a query-rich environment. In essence, most real-world cloud apps may need a hybrid data storage strategy.
The table above gives an overview of the data storage options available with Azure. As you can see, one column lists the relational data store, while the four other columns list the non-relational data stores.
Key/value databases store a single serialized object for each key, which is great for storing large volumes of data. Azure Table is of this type, whereas Azure Blob is a key/value database in which a key corresponds to a file or folder. Azure Tables are generally used to store large volumes of data (for example, user profiles) and are more scalable and less expensive than relational databases; however, they do not support complex joins and queries.
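To make the trade-off concrete, here is a minimal in-memory sketch of a key/value store. The class KeyValueStore and its methods are hypothetical names made up for illustration, not the Azure Table API. It shows why a lookup by key is cheap while a query on any other field means scanning every stored object.

```python
import json


class KeyValueStore:
    """Hypothetical in-memory stand-in for a key/value store (not an Azure API)."""

    def __init__(self):
        self._items = {}

    def put(self, key, obj):
        # Each key maps to exactly one serialized object.
        self._items[key] = json.dumps(obj)

    def get(self, key):
        # Direct lookup by key: no scan needed.
        raw = self._items.get(key)
        return None if raw is None else json.loads(raw)

    def find_by_field(self, field, value):
        # There is no index on non-key fields, so every stored value
        # has to be deserialized and checked.
        return [
            key
            for key, raw in self._items.items()
            if json.loads(raw).get(field) == value
        ]


store = KeyValueStore()
store.put("user:1", {"name": "Asha", "city": "Pune"})
store.put("user:2", {"name": "Ben", "city": "Oslo"})

profile = store.get("user:1")                    # lookup by key
oslo_keys = store.find_by_field("city", "Oslo")  # full scan over all values
```

In this sketch the store treats each value as an opaque blob, which is what keeps it simple to scale and also what makes joins and rich queries impractical.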
Document databases, on the other hand, are key/value databases where the values are “documents”, each essentially a collection of named fields and values. Document DBs have the additional capability of querying on non-key fields. MongoDB is a popular document DB.
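One way to picture querying on non-key fields is a store that keeps a secondary index per field. The sketch below is an assumption-laden illustration in plain Python: DocumentStore, insert and find are hypothetical names, and real document databases implement indexing very differently.

```python
from collections import defaultdict


class DocumentStore:
    """Hypothetical document store with simple secondary indexes."""

    def __init__(self, indexed_fields):
        self._docs = {}
        # One index per chosen field: field value -> set of document ids.
        self._indexes = {field: defaultdict(set) for field in indexed_fields}

    def insert(self, doc_id, doc):
        # Note: re-inserting an existing id is not handled in this sketch.
        self._docs[doc_id] = doc
        for field, index in self._indexes.items():
            if field in doc:
                index[doc[field]].add(doc_id)

    def find(self, field, value):
        if field in self._indexes:
            # Indexed field: jump straight to the matching ids.
            ids = self._indexes[field].get(value, set())
        else:
            # Unindexed field: fall back to scanning every document.
            ids = {i for i, d in self._docs.items() if d.get(field) == value}
        return [self._docs[i] for i in sorted(ids)]


docs = DocumentStore(indexed_fields=["city"])
docs.insert("u1", {"name": "Asha", "city": "Pune", "plan": "free"})
docs.insert("u2", {"name": "Ben", "city": "Oslo", "plan": "pro"})
docs.insert("u3", {"name": "Chen", "city": "Pune"})

in_pune = docs.find("city", "Pune")   # served from the index
pro_users = docs.find("plan", "pro")  # served by a scan
```

Documents need not share the same fields (u3 has no plan), which is the flexibility the "collection of named fields and values" description refers to.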
A column family database is again a key/value DB, one that allows you to structure data storage into collections of related columns called column families. Cassandra is a popular column family database.
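As a rough mental model, a column family store can be pictured as a nested mapping: row key, then column family, then column name. The sketch below uses hypothetical names (ColumnFamilyStore, put, get_family) and is not Cassandra's API or data model in full.

```python
class ColumnFamilyStore:
    """Hypothetical store: row key -> column family -> column -> value."""

    def __init__(self, families):
        # Column families are fixed up front; columns inside them are free-form.
        self._families = tuple(families)
        self._rows = {}

    def put(self, row_key, family, column, value):
        if family not in self._families:
            raise KeyError(f"unknown column family: {family}")
        row = self._rows.setdefault(row_key, {f: {} for f in self._families})
        row[family][column] = value

    def get_family(self, row_key, family):
        # Read only the related columns, without touching other families.
        row = self._rows.get(row_key, {})
        return dict(row.get(family, {}))


users = ColumnFamilyStore(families=["profile", "activity"])
users.put("user:1", "profile", "name", "Asha")
users.put("user:1", "profile", "city", "Pune")
users.put("user:1", "activity", "last_login", "2014-06-01")

profile_columns = users.get_family("user:1", "profile")
```

Grouping related columns this way means a read that only needs the profile data never has to load the activity data stored under the same key.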
Finally, graph databases enable efficient queries that traverse a network of objects and the relationships between them. Neo4j is a popular graph database.
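A typical traversal query is "who can I reach within N hops?". The sketch below answers it with a breadth-first search over a plain adjacency list. The function within_hops and the follows data are made up for illustration; a graph database such as Neo4j would express this in its own query language.

```python
from collections import deque


def within_hops(graph, start, max_hops):
    """Return {node: hop count} for every node reachable within max_hops."""
    hops = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if hops[node] == max_hops:
            continue  # do not expand beyond the hop limit
        for neighbor in graph.get(node, ()):
            if neighbor not in hops:
                hops[neighbor] = hops[node] + 1
                queue.append(neighbor)
    del hops[start]  # the start node is not part of the answer
    return hops


# Hypothetical "follows" relationships between users.
follows = {
    "ann": ["bob", "cat"],
    "bob": ["dan"],
    "cat": ["dan", "eve"],
    "dan": ["fay"],
}

reachable = within_hops(follows, "ann", 2)
```

Here fay is three hops away from ann, so she is left out of the two-hop result. The equivalent query in a relational database would need one self-join per hop, which is where a graph model tends to pay off.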
IaaS vs. PaaS
The other important aspect to note is that the offerings mentioned above include both Platform-as-a-Service (PaaS) and Infrastructure-as-a-Service (IaaS).
PaaS includes Azure SQL (relational), Azure Table (NoSQL) and Azure Blob (file storage in the cloud), whereas IaaS allows you to simply spin up a VM with an OS image and install the DBMS of your choice, such as SQL Server, Oracle, MySQL, SQL Compact, SQLite, Postgres, etc.
The IaaS option gives you unlimited data storage options, as you can just spin up a VM and install the software of your choice. Having said that, Azure gives you quite a large number of preconfigured images that have data management software installed, such as MongoDB, Neo4j, Cassandra or CouchDB. To check out the preconfigured images available, go to the management portal and click Virtual Machines –> Images –> Browse VM Depot.
It is important to understand the advantages of using an IaaS option over PaaS and vice versa. Yes, IaaS gives you a lot of options and the flexibility to manage your system as you need, but you then have to configure and manage your VMs end to end. With PaaS, you do not have to do that. The service provider (Microsoft in this case) manages the hardware and software for you, which includes setup, updates, software patches, scaling up, licenses and every other aspect of administration. This essentially abstracts the administration away from the end user, who just consumes the database service provided.
Is there more?
Yes, there is. Some of the biggest buzzwords in the industry today are the four Vs of data: volume, variety, veracity and velocity, which together have given rise to Big Data. The high volumes stored in NoSQL databases may be difficult to analyze in a timely and efficient manner. To analyze them, techniques and frameworks such as MapReduce are used. MapReduce breaks the data into chunks and sends them to different computers for processing. These computers together form a cluster. Hadoop incorporates this framework and calls these clusters Hadoop clusters. On Azure, HDInsight enables you to process such data using the power of Hadoop. A possible use case is storing data in Azure Storage and then using HDInsight, which integrates seamlessly with it, to derive insights from that data.
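The MapReduce idea can be sketched in a few lines with a word count, the customary first example. This is a single-process illustration with made-up function names (map_phase, shuffle, reduce_phase), not Hadoop or HDInsight code; in a real cluster each chunk would be mapped on a different machine.

```python
from collections import defaultdict


def map_phase(chunk):
    # Map: turn each word in a chunk of lines into a (word, 1) pair.
    return [(word.lower(), 1) for line in chunk for word in line.split()]


def shuffle(mapped_chunks):
    # Shuffle: group all values emitted for the same key together.
    groups = defaultdict(list)
    for pairs in mapped_chunks:
        for key, value in pairs:
            groups[key].append(value)
    return groups


def reduce_phase(groups):
    # Reduce: combine the grouped values for each key into one result.
    return {key: sum(values) for key, values in groups.items()}


# The data set is split into chunks; each chunk could go to a different node.
chunks = [
    ["azure table stores data", "azure blob stores files"],
    ["hadoop processes data"],
]

mapped = [map_phase(chunk) for chunk in chunks]
counts = reduce_phase(shuffle(mapped))
```

Because each call to map_phase only looks at its own chunk, the map step can run in parallel on as many machines as there are chunks, which is what makes the approach suitable for very large data sets.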
One of the latest additions to the gamut of Data Services from Azure is Redis Cache. It is based on the open-source Redis cache and is managed by Microsoft, and any application within Azure can access it. Data Factory, on the other hand, is a fully managed service for composing data storage, processing and movement services into streamlined, scalable and reliable data production pipelines. The information produced by these pipelines can be easily consumed using BI and analytical tools to derive key business insights. Note that these last two services were recently introduced and are available in the new Azure management portal.
In this part, I have introduced you to the wide array of data services available in Azure, starting with relational versus non-relational offerings and moving on to which services are IaaS and which are PaaS. We then looked into analytical offerings such as HDInsight, followed by a few new services such as Redis Cache and Data Factory. It is also important to note that Azure is a growing platform with services being added rapidly, so what I described today is by no means exhaustive or static in the dynamic world of Azure.
In the next part, I will cover the relational offering from Azure: Azure SQL. Stay tuned and follow me @AdarshaDatta to share your experiences with Azure Data Services.