This Cloud Computing course at SQL University explains the Distributed Computing paradigms used by major vendors, and covers information useful to the data professional for implementing proper architecture designs.
Prerequisites:
General computer programming or data development terminology, and industry experience in at least one of those disciplines
Instructor and Bio:
Buck Woody – Bio available at: http://buckwoody.com
In this class we’ll focus on:
· What cloud computing is
· Where it can be used
· How it applies to you and your organization
Each day there will be a lecture, along with homework for the next class session. There will be a comprehensive final exam – it’s contained primarily in your work environment!
Class 2 – Cloud Computing: Use-Cases
Welcome to the second class in SQL University on “Cloud” computing. If you haven’t had a chance to take a look at class one yet, you may want to switch to that post and learn about the definition of cloud computing, since I’ll be using those terms today.
When many technical professionals hear about distributed computing, they begin listing the reasons it won’t work in their environment. This comes from comparing a certain feature or process that they currently use against a particular cloud offering and not finding a counterpart. The misunderstanding comes from a perception that a cloud vendor expects an organization to switch wholesale from one technology to another, but this is not the case. To help with this issue, the first place to start is with where a new technology fits. For instance, when you introduce an RDBMS into your environment, no one expects that you will stop storing data anywhere other than an RDBMS database. It’s simply another option that is used only where it fits.
A distributed computing environment is no different. It is not intended (nor should it be) to replace an entire on-premise environment. Latency, private data, and many other issues preclude this from being a good choice. There are, however, instances where a distributed computing system works well. In general, these use-cases involve the following broad advantages:
· Scalability (up and back down)
· Defined Billing
· Abstraction from the platform and below
Note: I’m describing only a PaaS solution here (see the previous class for more information on this term) and not SaaS or IaaS. Those have different use cases.
Let’s take a quick look at each of these, and then define the particular use-cases that seem to make the most sense in a PaaS distributed computing environment.
Scalability (up and back down)
In a PaaS solution (such as Microsoft’s Windows Azure) the system adds or subtracts computing power on demand. In Windows Azure, you can write code that watches counters of your choice (computing power, logons, sessions, etc.) and adds more computing power or storage in response. Your code must use a “Stateless Computing Model” for this to scale seamlessly, but this paradigm is not vendor-specific or even specific to the cloud, so the code remains portable back to your on-premise systems if you later move away from a cloud provider. Conversely, you could code this way on-site and either move to the cloud or expand the on-premise footprint of your application into the cloud as needed – this is called “bursting”.
An important consideration is the ability to scale back down after you scale up. Some vendors require that you retain the resources you request for a certain period of time, so make sure you vet those conditions with your particular cloud vendor.
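To make the "watch counters, then scale up and back down" idea concrete, here is a minimal sketch in Python. It is not Windows Azure's API: the `ScalePolicy` class, the `desired_instances` function and every threshold in it are hypothetical names and values chosen only for illustration. The sketch just shows the decision logic you would wrap around whatever scaling call your vendor actually provides.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class ScalePolicy:
    # All values here are illustrative assumptions, not vendor defaults.
    scale_up_at: float = 0.75    # average utilization that triggers adding an instance
    scale_down_at: float = 0.30  # average utilization that triggers removing an instance
    min_instances: int = 2       # floor kept running even when idle
    max_instances: int = 10      # ceiling that caps the bill


def desired_instances(current: int, utilization: Sequence[float],
                      policy: ScalePolicy) -> int:
    """Return the instance count to request, given recent counter samples (0.0 to 1.0)."""
    if not utilization:
        return current  # no samples, so make no change
    average = sum(utilization) / len(utilization)
    if average >= policy.scale_up_at:
        target = current + 1
    elif average <= policy.scale_down_at:
        target = current - 1  # scaling back down matters as much as scaling up
    else:
        target = current
    # Clamp the result between the configured floor and ceiling.
    return max(policy.min_instances, min(policy.max_instances, target))
```

The counter could be anything you can sample (CPU, logons, sessions). If your vendor enforces a minimum retention period for resources, that rule would need to be added before the scale-down branch.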
Defined Billing
A PaaS distributed computing paradigm is “pay as you go”, meaning you will be charged for a mixture of compute, bandwidth and storage units. There are methods of estimating these costs (see the reading below), and you should use them to ensure that the costs meet your objectives. From a use-case standpoint, defined billing is ideal where the cost burden can be directly applied to the business unit that wants the application (meaning IT does not have to carry that budget), or in situations where user activity generates revenue, such as a sales website. In other words, you pay more when you’re using it more, but you’re (theoretically at least) making more money because it is being used.
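As a rough illustration of how a pay-as-you-go bill breaks down into compute, bandwidth and storage, here is a small Python sketch. The `Rates` class, the `estimate_monthly_cost` function and all of the prices are made up for the example. Real rate cards differ by vendor and change over time, so treat this as the shape of the arithmetic rather than a pricing tool.

```python
from dataclasses import dataclass


@dataclass
class Rates:
    # Hypothetical prices for illustration only; check your vendor's current rate card.
    per_instance_hour: float
    per_gb_transferred: float
    per_gb_stored_month: float


def estimate_monthly_cost(instances: int, hours: float, gb_transferred: float,
                          gb_stored: float, rates: Rates) -> dict:
    """Break one month's bill into compute, bandwidth and storage charges."""
    compute = instances * hours * rates.per_instance_hour
    bandwidth = gb_transferred * rates.per_gb_transferred
    storage = gb_stored * rates.per_gb_stored_month
    return {
        "compute": compute,
        "bandwidth": bandwidth,
        "storage": storage,
        "total": compute + bandwidth + storage,
    }


if __name__ == "__main__":
    rates = Rates(per_instance_hour=0.125, per_gb_transferred=0.25,
                  per_gb_stored_month=0.5)
    # Compare a quiet month with a busy one to see how cost tracks usage.
    quiet = estimate_monthly_cost(2, 720, 100, 50, rates)
    busy = estimate_monthly_cost(6, 720, 400, 50, rates)
    print(f"quiet month: {quiet['total']:.2f}")
    print(f"busy month:  {busy['total']:.2f}")
```

Running the two scenarios side by side shows that the bill rises with activity. That is what lets you charge it back to the business unit or weigh it against the revenue the activity brings in.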
Abstraction from the platform and below
This use-case actually defines PaaS. In other distributed computing paradigms such as IaaS, you have to install and patch an operating system and platform, create and maintain your own scale-out architectures, and determine your own High-Availability strategies. In a SaaS environment you simply use the service provided by the cloud vendor. In PaaS, however, you write applications and store data, and do not control the operating system, platform, runtimes and so on. You simply focus on the application code and the data.
To put this more simply, if you type “setup.exe” or “./setup” then you’re looking for IaaS – if you open Visual Studio or Eclipse, you’re looking for PaaS – if you log on, do some work and then log off, you’re looking for SaaS.
I’ve described these use-cases in more depth in the links below. The following is a general list of where distributed computing works well:
Windows Azure Use-Cases
· Elastic Scale – Bursting workloads up and down in use patterns
· Agility – The ability to quickly develop and deploy an application
· New Development – Code option for new applications
· Web-Centric Applications – Applications that are developed for a web paradigm
· Hybrid Applications and Data – Applications and data that need to be both on-premise and in a distributed environment
· High-Performance Computing – Applications that require multiple processing nodes, such as scientific, research or financial data (also known as Technical Computing)
· Infrastructure Limits – Inability or unwillingness to add more physical computers to the environment
· Fast Acquisitions – The ability to quickly migrate a newly acquired business to the current computing environment
The key is to examine what you do today in your computing environment, and decide which of your applications fit any of these patterns. For those that match the pattern, begin an Architectural Design Session (ADS) that defines how that application would be architected in a distributed computing environment, and whether the costs and benefits merit a closer investigation. Most of the cloud vendors have the ability to help you perform an ADS, either through a consultation or whitepapers and the like.
One of the core tenets for developing in a distributed computing environment is to use Stateless Programming. In effect, using an HTTP server involves stateless programming, but over time people have added session state, ActiveX programming and the like, and have gotten away from it. For scale, however, it’s critical that you code in this manner. This is a very old reference, but a good starting place to discuss this topic: http://www.adiscon.com/iis/isapi005.htm
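Here is a minimal Python sketch of what "stateless" means in practice. The `SharedStore` class is a hypothetical in-memory stand-in for whatever external storage (database, cache, queue) your instances would share, and `handle_add_to_cart` is an invented example handler. The point is that the handler keeps nothing in its own process between calls, so any instance can serve any request.

```python
class SharedStore:
    """Stand-in for an external store (database, cache, queue) shared by all instances."""

    def __init__(self):
        self._data = {}

    def get(self, key, default=None):
        return self._data.get(key, default)

    def put(self, key, value):
        self._data[key] = value


def handle_add_to_cart(request: dict, store: SharedStore) -> dict:
    """Stateless handler: all it needs arrives in the request or comes from the store."""
    cart_key = f"cart:{request['user_id']}"
    cart = list(store.get(cart_key, []))  # copy so we never mutate the stored list in place
    cart.append(request["item"])
    store.put(cart_key, cart)             # write the new state back out of the process
    return {"user_id": request["user_id"], "items": cart}


if __name__ == "__main__":
    store = SharedStore()
    # Picture these two calls landing on two different compute instances.
    handle_add_to_cart({"user_id": "u1", "item": "book"}, store)
    print(handle_add_to_cart({"user_id": "u1", "item": "pen"}, store))
```

Because the cart lives in the store and not in a module-level variable or an in-process session object, adding or removing instances does not lose anyone's data.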