Windows Azure provides several options for storing, processing and retrieving data in structured, semi-structured and unstructured layouts. In this series, I'll cover various options for implementing these options.
In this entry, I'll cover the SQL Server Fast-Track Data Warehouse, and show an example of it's implementation. This is a Relational Database Management System (RDBMS), which provides full ACID (http://en.wikipedia.org/wiki/ACID) compliance for when you need transaction-level consistency at the highest level.
Within the RDBMs family, there are various implementation options to allow the system to perform at peak performance. RDBMS servers excel at On-Line Transaction Processing (OLTP) (http://en.wikipedia.org/wiki/OLTP) workloads using a highly normalized data structure. They are also often used in On-Line Analytical Processing (OLAP, or more commonly called Business Intelligence) (http://en.wikipedia.org/wiki/OLAP) applications where the data structure is more de-normalized, with more repeated data, more aggregations and performance tuned for real-time queries rather than more constant updates.
OLAP implementations often further segment into the type of audience they serve. At the highest aggregation level sits the "Enterprise Data warehouse" which usually contains all (or most all) of the data within an organization, but at a fairly rough grain of aggregation. For instance, it might contain all sales from all regions for all time, but in a yearly or quarterly breakdown. The next level of granularity is the Data Warehouse. In this implementation, there is often more detail within the aggregates, and the Data Warehouse most often serves a more specific audience. Note that in some companies, the terms "Data Warehouse" and "Enterprise Data Warehouse" are used interchangeably.
Microsoft SQL Server contains engines, components and features to be used as either an OLTP or an OLAP system (or both, which is not advisable) but there are settings and configurations and hardware that lend themselves best to one mode or another. In the past Microsoft has teamed up with hardware vendors to create a Data Warehouse Appliance, a blend of tuned hardware and specific settings in SQL Server to run a Data Warehouse workload for high performance. You can now download a whitepaper and instructions to create your own high-performance Data Warehouse system, called a "Fast Track Data Warehouse" or FTDW.
Microsoft has now introduced an image into the Windows Azure Virtual Machine Gallery that is pre-optimized for a FTDW workload. In three steps and in less than 20 minutes, you can have a system that is a charged-by-the-hour license for development and testing or full production.
To begin, you'll need either a production account, a free account, or your MSDN benefits. See the http://windowsazure.com site for more information on using one or all of these types of accounts. From there, the process has three steps:
- Access the Gallery and select the Data Warehouse image
- Attach the proper number of disks to the image
- Log into the image using the RDP protocol (Remote Desktop) whereupon the system will configure the image with the proper settings and disks
To start, I've logged in to my production account on the Windows Azure Portal site.
From there I select the large "plus-sign" icon at the bottom right of the page, then "New virtual Machine", then "From Gallery". I select "SQL Server 2012 SP1 for data warehousing on WS 2012 R2" from the menu that appears.
Clicking the arrow at the bottom of that panel allows me to assign a name, a size, and an administrative account and password to the image. Microsoft recommends an "A6" VM size for production workloads, but I'll select "Small" here for my test system. It's important to note the size - there is a chart located here (http://i.msdn.microsoft.com/dynimg/IC686301.gif)
Clicking the arrow brings me to the screen where I can select the Cloud Service, Storage Account, and Availability Group (more on those here ) for the deployment.
The next panel lets me set the RDP port redirect so that I can log into the system, and also the port for PowerShell for remoting. I take the defaults.
Now the system starts the build process for the Virtual Machine.
At the bottom of the screen I click the "Attach" button.
And that's it. In less than 20 minutes I have a full data warehouse system ready for use.
Fast-Track Data Warehouse reference document: http://www.microsoft.com/en-us/sqlserver/solutions-technologies/data-warehousing/reference-architecture.aspx
FTDW document for Windows Azure: http://msdn.microsoft.com/en-us/library/dn387396.aspx