Exploring Big Data: Course 10 – Microsoft Professional Capstone: Big Data

Big Data with Sam Lester

(This is course #10 of my review of the Microsoft Professional Program in Big Data)

Course #10 of 10 – Microsoft Professional Capstone: Big Data

Overview: Course #10 brings the Microsoft Professional Program in Big Data to a close. The capstone project consists of three sections, each making up roughly a third (33%) of the overall grade, even though the complexity and number of questions vary by section. The project is based on one year’s worth of weekly sales data for roughly 300 stores, which makes the dataset a collection of over 15,000 files. As a result, we need to rely on the techniques learned in previous courses to set up blob storage, use Azure Storage Explorer to copy and store files, author U-SQL jobs to process files, set up the data warehouse using the supplied T-SQL script, create linked services to establish the dataset connections and pipelines, and finally load the data into the data warehouse to query for the final answers.
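As a quick sanity check on the dataset size, here is a minimal Python sketch of the arithmetic. It assumes one file per store per week, which is my reading of the numbers above rather than something the lab states, and the function name is purely illustrative.

```python
# Rough dataset-size estimate.
# Assumption: each store produces exactly one file per week.
STORES = 300          # "roughly 300 stores" from the overview
WEEKS_PER_YEAR = 52   # one year of weekly sales data


def estimated_file_count(stores: int, weeks: int) -> int:
    """Return the number of files if every store has one file per week."""
    return stores * weeks


print(estimated_file_count(STORES, WEEKS_PER_YEAR))  # 15600
```

Under that assumption the estimate lands at 15,600 files, which lines up with the "over 15,000" figure and explains why manual handling of the files is not an option.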

Time Spent / Level of Effort: When I read the first lab exercise instructions, I was a bit nervous about tackling this project. Over the previous nine courses, we’ve covered a LOT of material, much of which is challenging to keep straight. I had completed the 9th course about two weeks before the capstone opened, so I went back to the previous, archived capstone and spent about 12 hours working through it for practice. As it turned out, this was time very well spent. In total, I spent roughly 15 hours on this capstone project.

Course Highlight: The highlight of the capstone course was going from reading the original instructions with a hefty dose of confusion to ultimately completing the labs and projects to wrap up the class. Getting there required going back through several of the previous courses, rereading the lab instructions, and watching some videos for a second and third time.

Suggestions: My biggest suggestion for completing the capstone course is to view the previous capstone course to better understand the tasks. For example, I officially completed the capstone course that opened on January 1, 2018, but I spent several hours working on the previous capstone (October 2017). The time I spent on the October course was extremely helpful when the new course opened since I had already completed most of the exercises.

An additional suggestion would be to focus on obtaining the required passing score of 70% prior to the second part of lab 3, where all 15,000+ files need to be loaded and queried. I found this part of the course to be the most challenging, but since I had already scored above 70%, I wasn’t as concerned about getting these questions correct. However, if I’d been below the required 70% at this point, there would have been a lot more pressure to get these questions correct.
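To make the 70% target concrete, here is a small Python sketch of how the overall grade could be tallied. It assumes the three sections are weighted equally and simply averaged, as the overview suggests. The section scores are hypothetical examples, not my actual results, and the real grading may work differently.

```python
PASSING_SCORE = 70.0  # required passing percentage


def overall_grade(section_percentages):
    """Average the section scores (each 0-100), assuming equal weights."""
    return sum(section_percentages) / len(section_percentages)


def passes(section_percentages, threshold=PASSING_SCORE):
    """Return True if the equally weighted average meets the threshold."""
    return overall_grade(section_percentages) >= threshold


# Hypothetical scores: perfect on two sections, nothing on the third.
print(passes([100, 100, 0]))   # False (average is about 66.7)

# Hypothetical scores: perfect on two sections, a few points on the third.
print(passes([100, 100, 20]))  # True (average is about 73.3)
```

With equal weights, two perfect sections alone fall just short of 70%, so at least some points from the third section are needed. That is why banking as many points as possible before the hardest questions takes the pressure off.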

Finally, use the tips in the lab exercise notes that tell you which course originally introduced each piece of material. As mentioned above, the courses cover a lot of material, and some of it overlaps, so knowing which course to revisit saved a lot of time. Once I went back to the specific courses for review, I found the lab instructions for those courses to be extremely helpful.

Summary: The 10 courses that make up the Microsoft Professional Program in Big Data are an outstanding way to improve your knowledge about the concepts of processing Big Data. Prior to completing this MPP, several of these topics were vaguely familiar to me, but not well enough to teach/explain to others. After going through this program, I have a much better understanding and will continue to work with these technologies.

I hope this blog series has helped you on your journey to improve your Big Data skills. It certainly has helped mine!

Thanks,
Sam Lester (MSFT)