Exploring Big Data: Course 5 – Delivering a Data Warehouse in the Cloud

(This is course #5 of my review of the Microsoft Professional Program in Big Data)

Course #5 of 10 – Delivering a Data Warehouse in the Cloud

Overview: The Big Data course titled “Delivering a Data Warehouse in the Cloud” walks you through the key concepts of a SQL Data Warehouse (DW) in Azure. It starts with the steps to provision a DW, continues with lectures on designing tables and loading data, and concludes with big data integration with Hadoop using PolyBase. The four lab exercises require you to install numerous software applications used in a data warehouse environment, including SQL Server Management Studio (SSMS), SQL Server Data Tools (SSDT), Visual Studio Community, Visual Studio Code, Azure Feature Pack for SSIS, and Azure Storage Explorer. Downloading and installing these tools is part of the lab exercises; the course does not provide a pre-built VM with the required software. As a result, you can’t complete labs #2-4 without going through this setup. I would have preferred an image with the required software since I’m already very familiar with installing and configuring each of these applications. For those who don’t have experience with some of these tools, the course is a great way to walk through the installation and basic functionality of data warehouse tools in addition to the Azure DW content.

Time Spent / Level of Effort: I spent about 10 hours total on this course. I watched the part 1 videos at double speed and then finished the quiz for that section. At that point, I decided to do all four labs consecutively. This took me around 2.5-3 hours, but it was a great use of time to do them back-to-back since I could focus on everything in Azure, including the numerous tools introduced. After completing the labs, I went back to watch the videos for parts 2-4, took the quizzes, and then took the final exam.

By completing the labs all at one time, I was able to minimize Azure costs by shutting down the VM when finished. Here is the resource usage for the SQL Data Warehouse VM while working through the labs.

[Image: Azure Data Warehouse resource usage]

Course Highlight: Completing the labs felt like it took an exceptionally long time because of the required installation steps, but the payoff after finishing the course is a great Azure DW environment that I can keep using for demos and presentations. I also enjoyed the videos on PolyBase since I haven’t had a chance to explore it for a customer-related project to date. To me, this chance to watch videos and build solutions around popular topics (such as PolyBase) while going through the program is a huge benefit that helps me stay current in so many interesting areas of the data industry. The other aspect of the labs that I enjoyed was executing the same step through two different tools: for example, uploading data through both bcp and Azure Storage Explorer, and running T-SQL through both Visual Studio and SSMS.

Suggestions: The final exam for this course was harder than any of the previous 15 edX courses I have taken because the videos and labs didn’t seem to prepare you directly for the questions. I found most of the answers by reading documentation and trying out the scenarios in the lab environment. Don’t forget to fill out the final survey / question after completing the course, as this contributes to your score. Also notice that I took course #5 directly after completing course #1; there is no required order for the classes as long as you can stay within the required schedule.

I also encountered a few small items in the labs where the documentation is out of date because the product functionality has since been updated. One example is populating a table using Azure Data Factory in Lab 3. The default Data Factory version is now V2, but the lab instructions for the Copy Data functionality were written for version 1 (V1). Creating the Data Factory as Version 1 allows you to continue with the lab as documented.

[Image: Azure Data Factory]

There is another small issue when using T-SQL with PolyBase to load data. The provided T-SQL code begins with the line "CREATE MASTER KEY;". Since the master key already exists at that point in the lab, the script fails. You can work around this by removing that single line and continuing with the lab.
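If you would rather not edit the lab script by hand, the workaround can be sketched in a few lines of Python. This is only an illustration and not part of the course material: the function names are mine, the sample script in the usage note is a placeholder rather than the actual lab code, and the guarded variant assumes the database master key appears in sys.symmetric_keys under the name ##MS_DatabaseMasterKey##.

```python
import re

# Matches a bare "CREATE MASTER KEY;" statement, case-insensitively.
_MASTER_KEY = re.compile(r"CREATE\s+MASTER\s+KEY\s*;", re.IGNORECASE)

# Assumption: a guard so the statement only runs when no master key exists yet.
GUARDED_MASTER_KEY = (
    "IF NOT EXISTS (SELECT * FROM sys.symmetric_keys "
    "WHERE name = '##MS_DatabaseMasterKey##')\n"
    "    CREATE MASTER KEY;"
)


def _is_master_key_line(line: str) -> bool:
    """True if the line holds only the CREATE MASTER KEY statement."""
    return _MASTER_KEY.fullmatch(line.strip()) is not None


def remove_master_key(script: str) -> str:
    """Drop the CREATE MASTER KEY line, as in the manual workaround."""
    lines = script.splitlines(keepends=True)
    return "".join(line for line in lines if not _is_master_key_line(line))


def guard_master_key(script: str) -> str:
    """Replace the CREATE MASTER KEY line with a guarded version instead."""
    out = []
    for line in script.splitlines(keepends=True):
        if _is_master_key_line(line):
            # Keep whatever line ending the original line used.
            ending = line[len(line.rstrip("\r\n")):]
            out.append(GUARDED_MASTER_KEY + ending)
        else:
            out.append(line)
    return "".join(out)
```

For a placeholder script such as "CREATE MASTER KEY;" followed by "SELECT 1;", remove_master_key keeps only the second statement, while guard_master_key keeps both but wraps the first in the existence check so the script can be rerun safely.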

Overall, it was a very educational course that covered a lot of material and introduced several software applications used in a data warehouse environment. If you have taken this course in the past or are going through it now, please leave a comment and share your experience.

Thanks,
Sam Lester (MSFT)