This Cloud Computing course at SQL University explains the Distributed Computing paradigms used by major vendors, and covers information useful to the data professional for implementing proper architecture designs.
Prerequisites:
General computer programming or data development terminology, and industry experience in at least one of those disciplines
Instructor and Bio:
Buck Woody – Bio available at: http://buckwoody.com
In this class we’ll focus on:
· What cloud computing is
· Where it can be used
· How it applies to you and your organization
Each day there will be a lecture, along with homework for the next class session. There will be a comprehensive final exam – it’s contained primarily in your work environment!
Class 1 – Cloud Computing Defined
Welcome to the first class in SQL University on “Cloud” computing – although we’ll soon dispense with that term. First, why have a class in this discipline in a data-centric location like SQL University? Isn’t this class series dedicated to data?
Yes it is – and in fact, this is a perfect place to talk about distributed computing. At the end of the day, whether we’re talking about SQL Server or Windows Azure, it’s all about data – taking it in, processing it, storing it, computing against it, and returning it to the user. And of course we’ll spend some time on SQL Azure, Microsoft’s “RDBMS in the Cloud” along the way.
Let’s start out by defining a few terms. “Cloud” has become such a diluted term that it’s now treated as synonymous with “Internet” or “Web”, but I’ll still use it throughout this series so that we stay consistent. To be more technically accurate, however, I’ll use the term “Distributed Computing” as a synonym. What that means is that the computing functions – at a high level: input, processing, storage, output – are performed on multiple, intermittently connected systems. This brings up the concept of “Stateless” coding – something that might require a little explanation.
Take a single system running code as an example. Assume that on your laptop you run a game program. Your system marshals together the CPU, disk, network and, most importantly, memory to do the work. As your game turns left, the CPU directs the graphics to show that result, and the memory stores where you are at that microsecond. Freezing that moment in time, you’re looking at the “state” of the program. It is said to be a “state-ful” program, because if you turned off the computer, the memory would be cleared and you would lose your place in the game. In fact, you would lose the fact that a game was running at all. You’d have to start over.
Contrast that with a “state-less” program. You might even have that game again – but in this case, the code that runs might exist on more than one computer. And instead of saving where you are in the game just in memory, the “state” could be stored in a file that all of the computers can access. So now if one goes down or is too busy to respond, another identical copy of the program itself can reference the file and continue working.
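To make the game example a little more concrete, here is a minimal sketch in Python of the idea of keeping the “state” in a shared file rather than in memory. Everything in it is an assumption for illustration: the file location, the shape of the state and the function names are hypothetical, and a local temp file stands in for storage that every computer could reach.

```python
import json
import os
import tempfile

# Hypothetical shared location. In a real deployment this would be storage
# that every machine can reach, not a local temp directory.
STATE_FILE = os.path.join(tempfile.gettempdir(), "game_state_demo.json")


def load_state(path=STATE_FILE):
    """Return the saved game state, or a fresh one if nothing is saved yet."""
    if not os.path.exists(path):
        return {"x": 0, "y": 0, "moves": 0}
    with open(path, "r", encoding="utf-8") as f:
        return json.load(f)


def save_state(state, path=STATE_FILE):
    """Write the state out so any other copy of the program can pick it up."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(state, f)


def handle_move(direction, path=STATE_FILE):
    """Process one move. Nothing is kept in memory between calls."""
    steps = {"left": (-1, 0), "right": (1, 0), "up": (0, 1), "down": (0, -1)}
    dx, dy = steps[direction]
    state = load_state(path)
    state["x"] += dx
    state["y"] += dy
    state["moves"] += 1
    save_state(state, path)
    return state


if __name__ == "__main__":
    if os.path.exists(STATE_FILE):
        os.remove(STATE_FILE)
    handle_move("left")        # imagine machine A answers this request
    print(handle_move("up"))   # and machine B answers this one
```

Because handle_move reads the file, changes the state and writes it back on every call, it does not matter which copy of the program answers the next request. The price is a file read and a file write on every move.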
Of course, for this to work, there are two further areas to consider. One is that you have to handle the fact that the user needs to be re-directed to another computer. This can be handled by placing a load-balancing redirector in front of the computers to answer the user; it decides where each packet of data goes. In Windows Azure, this is handled for you. The other is that you need to deal with the latency introduced by storing state in a file instead of in memory. So as you can see, not everything goes to the cloud. But there are workarounds, mostly involving thinking in queues or messages. The combination of these and other computing strategies is a process called “Stateless Computing”, and it involves hardware and code. I’ll include references to this in the “Homework and Reading Assignments” section below.
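The redirector idea can be sketched in a few lines of Python. This is only an illustration of the concept, not a description of how Windows Azure does it: the round-robin policy, the class name and the server names are all assumptions made up for the example.

```python
from itertools import cycle


class Redirector:
    """Round-robin redirector that skips machines marked as unavailable."""

    def __init__(self, servers):
        self.servers = list(servers)
        self.healthy = set(self.servers)
        self._ring = cycle(self.servers)

    def mark_down(self, server):
        """Record that a machine is down or too busy to respond."""
        self.healthy.discard(server)

    def mark_up(self, server):
        """Record that a known machine is available again."""
        if server in self.servers:
            self.healthy.add(server)

    def route(self, request):
        """Return (server, request) for the next available server."""
        for _ in range(len(self.servers)):
            server = next(self._ring)
            if server in self.healthy:
                return server, request
        raise RuntimeError("no healthy servers available")


if __name__ == "__main__":
    lb = Redirector(["node-a", "node-b", "node-c"])  # hypothetical names
    print(lb.route("move left"))
    lb.mark_down("node-b")
    print(lb.route("move up"))  # node-b is skipped
```

The redirector only works because the programs behind it are stateless: any of them can take the next request, so the choice of machine is free.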
The next part of the definition of cloud computing is the way it is used. While many vendors (and even open-source projects) offer “cloud” computing, it’s important to understand exactly what they are talking about. The industry has created three broad categories that we agree to use for discussing a distributed computing environment.
Infrastructure as a Service (IaaS) – Abstracts the hardware layer, most often by hosting a Virtual Machine (VM) or drive storage.
This paradigm is useful for “canned” or pre-packaged applications that require a “setup.exe” process. Scale, High Availability, and operating system licensing and maintenance are your responsibility. The vendor handles the hardware for you. Microsoft offers Windows Server, Hyper-V, and System Center as our IaaS offering; your responsibility is obtaining a hardware hosting service for those software platforms.
Software as a Service (SaaS) – Abstracts away everything. It’s a software package you access without installing anything.
This paradigm is useful when there is a specific need that the software provides, and you do not have a high customization need. Microsoft offers Office 365, Exchange and other services as a SaaS paradigm.
Platform as a Service (PaaS) – Abstracts away the hardware, Operating System, Scale, and to some degree High Availability and Disaster Recovery.
In this paradigm you write or re-write portions of your code to simply run somewhere else. Conceptually your code runs on a single, massive, adaptable machine, although there is of course hardware, software and management happening underneath. Windows Azure is the offering we use to allow you to port or write code in .NET and other languages (such as Java and C++) and then simply deploy that code.
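One way to keep the three categories straight is to write down who looks after each layer. The table below is a rough sketch of my reading of the definitions above, expressed as a Python dictionary. The layer names and the “shared” label are assumptions for illustration, and real offerings will vary in the details.

```python
# Who looks after each layer, following the three definitions above.
# "shared" reflects the "to some degree" wording for PaaS high availability.
# The "application" row is an interpretation: in IaaS you install the
# packaged application, in PaaS you write or port the code yourself.
RESPONSIBILITY = {
    "IaaS": {
        "hardware": "vendor",
        "operating_system": "you",
        "scale": "you",
        "high_availability": "you",
        "application": "you",
    },
    "PaaS": {
        "hardware": "vendor",
        "operating_system": "vendor",
        "scale": "vendor",
        "high_availability": "shared",
        "application": "you",
    },
    "SaaS": {
        "hardware": "vendor",
        "operating_system": "vendor",
        "scale": "vendor",
        "high_availability": "vendor",
        "application": "vendor",
    },
}


def layers_you_manage(model):
    """Return the layers that stay with you, fully or partly, for a model."""
    return [
        layer
        for layer, owner in RESPONSIBILITY[model].items()
        if owner != "vendor"
    ]


if __name__ == "__main__":
    for model in RESPONSIBILITY:
        print(model, layers_you_manage(model))
```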
Using a PaaS has some unique benefits – such as programmatic scale (you can seamlessly add more power or storage based on counters you watch or create), resiliency, global reach and so on. Since you pay for what you use, the cost structure is completely different – and can even be shifted to the business unit that asks for the increase in power or speed. Code does not have to change based on scale. Microsoft has resiliency built into the platform, so no management of the operating system, patching or licensing is required.
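“Programmatic scale” simply means that a piece of code watches a counter and decides how much capacity to ask for. Here is a minimal sketch of that decision in Python. The counter (average CPU), the thresholds, the limits and the function name are all hypothetical, and the step of actually submitting the new instance count to the platform is not shown.

```python
def desired_instances(current, avg_cpu_percent,
                      scale_out_at=75.0, scale_in_at=25.0,
                      minimum=2, maximum=20):
    """Decide how many instances to run, based on one watched counter.

    All thresholds and limits here are illustrative assumptions.
    """
    if avg_cpu_percent >= scale_out_at:
        target = current + 1      # busy: add one instance
    elif avg_cpu_percent <= scale_in_at:
        target = current - 1      # idle: remove one instance
    else:
        target = current          # within the comfortable band
    # Never go below the minimum or above the maximum.
    return max(minimum, min(maximum, target))


if __name__ == "__main__":
    print(desired_instances(current=4, avg_cpu_percent=90.0))
    print(desired_instances(current=4, avg_cpu_percent=10.0))
```

Note that the application code itself does not appear anywhere in this decision, which is the point of the sentence above: the code does not have to change based on scale.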
Another benefit is that the compute, storage and even a service-bus offering can be used together or even separately. This allows a great deal of flexibility, and even has the potential to “federate” your on-premise systems into the cloud, or even from company to company, all without having to set up a VPN or compromising your security structure.
The end result is that a PaaS solution, while requiring some re-architecture, provides benefits such as scale and adaptive billing that go beyond immediate needs, depending on your ability to change the code in your solution. It should also be noted that a PaaS solution is one of the few that allows you to migrate to the cloud in an iterative fashion – parts of the application can be migrated as time, resources and other factors dictate.
SQL Azure is a PaaS solution as well. It’s simply a SQL Server system running in the same datacenters as Windows Azure. Under the covers it maintains the hardware, Operating System, patches and so on, and runs an Instance of SQL Server. You have a longer connection string, and it uses SQL Server authentication only, but other than that you connect to it and use it as you would an on-premise database server. There are differences – some of the features found in SQL Server aren’t available in SQL Azure, the sizes of the databases are more limited, and you start at the database level – not the instance level. In the reading assignments below I’ll show you more information about that.
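To show what “a longer connection string” looks like in practice, here is a small Python sketch that builds one. The layout follows the ADO.NET-style format commonly shown for SQL Azure as I understand it, so treat it as an assumption and copy the exact string from your own portal. The server, database, user and password values are placeholders.

```python
def sql_azure_connection_string(server, database, user, password):
    """Build an ADO.NET-style connection string for a SQL Azure database.

    Assumed format for illustration; verify against your own portal.
    SQL Server authentication is used, so a user name and password
    are always part of the string.
    """
    return (
        f"Server=tcp:{server}.database.windows.net,1433;"
        f"Database={database};"
        f"User ID={user}@{server};"
        f"Password={password};"
        "Encrypt=True;"
    )


# For comparison, a typical on-premise string using Windows authentication.
ON_PREMISE_EXAMPLE = "Server=myserver;Database=mydb;Integrated Security=True;"


if __name__ == "__main__":
    # Placeholder values only.
    print(sql_azure_connection_string("myserver", "mydb", "myuser", "<password>"))
    print(ON_PREMISE_EXAMPLE)
```

Notice that the string names a database directly, which matches the point above that you start at the database level rather than the instance level.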
In the next class, we’ll cover the use-cases for Distributed Computing. It’s not for all situations, but does fit others extremely well.
Homework and Reading Assignments:
For a general overview of cloud computing architectures using Windows Azure, check this link: http://blogs.msdn.com/b/buckwoody/archive/2010/12/21/windows-azure-learning-plan-architecture.aspx
For more information on SQL Azure, check this link: http://blogs.msdn.com/b/buckwoody/archive/2010/12/13/windows-azure-learning-plan-sql-azure.aspx