From Datacenters to Digital Plants - Updated

Hi,

As you may have read in some of my previous posts, I spent the first 7 months of 2010 contributing to the SharePoint Online Standard 2010 infrastructure. This infrastructure will provide SharePoint 2010 based services to tens of millions of people worldwide. In effect, it makes SharePoint fly in the Cloud, using Microsoft's "boxed", off-the-shelf products.

There are a few key points to keep in mind when thinking about this.

If you follow my blog, you will know that SharePoint Automation is a whole topic in itself, and one that I am passionate about.

SharePoint & Cloud computing

I have developed the following broad principles for automating and optimizing these services:

  • A well-structured model is essential here. It should capture the core concepts as they are implemented in the processes, and it can then be used to derive the arguments for each task.
  • In a cloud(y) environment, requested operations frequently fail to complete => expect failure, and plan on that basis.
  • Here are a few key concepts, features and characteristics of a large automated SharePoint system:
    • Reliability and resiliency - the service itself must be able to withstand whatever may happen in the layers below it.
    • Idempotency - the property of operational tasks that can be replayed any number of times without creating any discrepancy between the desired and actual states of the system (see the sketch after this list).
    • Scale units - plan capacity in coherent, repeatable sets of resources, called scale units.
    • Virtualization - from Hyper-V to “hypervisor on chip”; this could also work with VMware, Xen or VirtualBox, as long as the necessary tasks are exposed through APIs.
    • Utility computing - grab resources when needed, compute, then drop or re-assign them.
    • Autonomic systems - systems that manage themselves to serve their purpose.
    • Continuous optimization - always optimize the resources, on an ongoing basis.
    • Elasticity - ability to absorb unexpected demand fluctuations
    • Agility - rapid provisioning of additional resources or services
    • System metering (monitoring and telemetry) - critical for knowing what is happening.
    • Trust - necessary between the components of a distributed system, which share data and security context.
    • Continuous deployment - cut each service into smaller parts linked by interface contracts, so that each part can be continuously improved.
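To make the idempotency point above concrete, here is a minimal sketch in Python (not taken from the original post; the `ensure_app_pool` task and its state dictionaries are purely hypothetical) of an operational task written so that replaying it any number of times converges on the desired state instead of accumulating side effects:

```python
# Minimal sketch of an idempotent operational task (hypothetical example).
# The task describes the DESIRED state and only acts on the difference,
# so replaying it any number of times leaves the system in the same state.

desired_state = {"AppPool-Contoso": {"cpu_limit": 50}}   # what we want
actual_state = {}                                         # what the farm currently reports

def ensure_app_pool(name: str, cpu_limit: int) -> None:
    """Create or fix an application pool only if it deviates from the desired state."""
    if actual_state.get(name) == {"cpu_limit": cpu_limit}:
        return  # already compliant: replaying the task is a no-op
    actual_state[name] = {"cpu_limit": cpu_limit}  # converge toward the desired state

# Replaying the same task several times never creates a discrepancy.
for _ in range(3):
    for name, settings in desired_state.items():
        ensure_app_pool(name, **settings)

assert actual_state == desired_state
```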

I spent a lot of time studying these concepts from two angles:

  • Theoretically: through various internal papers and lectures, partners' and competitors' white papers, books and keynotes
  • Practically: by learning how Microsoft uses them in its Datacenters for its Cloud services (ranging from Live Messenger/Hotmail and Bing to Office 365 and the Azure platform).

It led me to realize that Cloud computing is the "Industrial age" of Datacenters!

Industrial Engineering in Cloud computing

My academic background is in Industrial Engineering, with a (French equivalent of a) Master of Science degree in Industrial Engineering and several advanced specializations.

Cloud computing terminology is very close to that of Industrial Engineering (sometimes closer to it than to Computer Science). Check for yourself the terms you’ll find in the references and books at the end of this post, such as:

  • Resource management
  • Resiliency
  • Elasticity
  • Common environmental requirements
  • Power
  • Efficiency
  • Productivity
  • Cost optimization
  • Lifecycles
  • Workloads

ALL these words come from mechanics, physics or industrial engineering, disciplines that have been developed for more than a century, ever since “factories” were invented.
When you look at a “modern, Internet-scale” Datacenter, what do you see? A huge building that looks more like a car supplier's factory than a “data processing” facility, doesn't it? So I started to analyze the core similarities between the “factories” I used to work with in the automotive industry and the Cloud computing concepts. This is how I came to the “Digital Plant” concept.

From Datacenters to Digital Plants ...

While studying which industrial approaches and tools could be transferred to the design of Digital Plants, I found common ground between Computer and Industrial science:

  • Lifetime and workload management involve the same issues and drivers
  • Discrete manufacturing (cars, for example) can bring a lot to planning and resource/asset optimization
  • Process manufacturing (glass or chemicals, for example) can bring a lot to optimizing the "recipes" used to deliver a product (a service, in the case of the Digital Plant)
  • Utility industries, of course (electricity or water, for example), can bring a lot through the way they plan and optimize resource generation, distribution and final consumption, since almost no stock is possible. And you cannot "store" an online service either.

Based on these roots, I searched for techniques and mathematical tooling used in industry that could be applied to Cloud computing. I found several of them very useful and interesting when transferred into a Digital Plant. They include:

  • Multiple Products (and Services) Lifecycle Management in parallel:
    • The Digital Plant building and foundation have a "long" lifetime (around 10 years)
    • The Digital Plant productive components (such as servers, network hardware, etc.) have a "medium" lifetime (around 3 years)
    • The Digital Plant's sold service (the Online service which is sold) has a "short" lifetime, as it is software based and evolves in a very competitive environment (it can be a few weeks)
    • More info here: https://en.wikipedia.org/wiki/Product_lifecycle_management
  • The Pareto distribution applies: a small share of tenants, files or workloads typically accounts for most of the resource consumption (the classic 80/20 pattern), as the storage examples below illustrate.
  • Resources optimization:
    • Workload demand fluctuates constantly
    • So the available resources are always either a little too much or not quite enough
    • This calls for an analytical - in fact, holistic - approach to planning and optimization:
      • it is impossible to compute, from all the inputs, exactly the expected outputs with the assets in use.
      • It means constant tradeoffs in the design and operation of the resources (a minimal sketch follows below).

=> Resource optimization is the essence of Industrial Engineering.
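As a toy illustration of that constant tradeoff (the hourly demand figures and cost weights below are invented, not from the original post), this small Python sketch picks the fixed capacity level that minimizes the combined cost of over-provisioning (idle resources) and under-provisioning (unserved demand) against a fluctuating workload:

```python
# Toy capacity-planning tradeoff: demand fluctuates, so any fixed capacity is
# either a bit too large (idle cost) or a bit too small (unserved-demand cost).
# All numbers below are invented for illustration.

hourly_demand = [40, 55, 70, 90, 120, 150, 140, 110, 80, 60, 50, 45]  # e.g. VM requests per hour

IDLE_COST = 1.0      # cost of one unused unit of capacity for one hour
SHORTAGE_COST = 4.0  # cost of one unit of unserved demand for one hour

def total_cost(capacity: int) -> float:
    """Combined over- and under-provisioning cost for a fixed capacity level."""
    cost = 0.0
    for demand in hourly_demand:
        cost += IDLE_COST * max(capacity - demand, 0)      # wasted capacity
        cost += SHORTAGE_COST * max(demand - capacity, 0)  # unmet demand
    return cost

best = min(range(min(hourly_demand), max(hourly_demand) + 1), key=total_cost)
print(f"Best fixed capacity: {best} units, total cost {total_cost(best):.0f}")
```

Whatever capacity is chosen, some hours waste resources and others fall short; the optimizer can only balance the two, which is exactly the tradeoff described above.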

Here are a few examples of what should be considered relevant models:

  • Look at the CPU usage of a virtual machine over a given period of time:

If you transform this measure into time spent in "ranges" (0 to 9%, 10 to 19%, etc.), you'll obtain a CPU resource distribution close to this model.
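As a minimal sketch of that transformation (the CPU samples below are invented), this is how a series of per-minute CPU measurements can be turned into time spent per 10% range:

```python
# Turn a series of CPU-usage samples (one per minute, values invented)
# into "time spent per 10% range", i.e. the distribution described above.
from collections import Counter

cpu_samples = [3, 5, 12, 8, 45, 62, 7, 4, 15, 9, 88, 6, 2, 11, 5]  # % CPU, one sample per minute

buckets = Counter(min(sample // 10, 9) for sample in cpu_samples)   # bucket index 0..9

for bucket in range(10):
    low, high = bucket * 10, bucket * 10 + 9
    print(f"{low:>2}-{high:>2}% : {buckets.get(bucket, 0)} min")
```

On real telemetry, most of the time usually falls into the lowest ranges, which is what gives the distribution the shape shown above.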

  • Another pattern you will often find is the "parts distribution", like these two charts extracted from storage data analysis of a disk (or LUN): they represent the total size stored on the LUN per file extension:

 

Now let's zoom in on the first 10%, and the trend is clearer: this is an exponential distribution.

And the funny part: the same LUN, now seen through the item counts per file extension, shows the same pattern!
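For reference, here is a minimal sketch of how such a "parts distribution" can be produced: walk a directory tree (the path below is only a placeholder) and aggregate both total size and item count per file extension, then sort by size to see the Pareto-like tail:

```python
# Aggregate total size and item count per file extension on a volume:
# the raw material behind the two "parts distribution" charts above.
import os
from collections import defaultdict

root = r"D:\ContentShare"               # placeholder path for the LUN / share to analyze
size_per_ext = defaultdict(int)
count_per_ext = defaultdict(int)

for dirpath, _dirnames, filenames in os.walk(root):
    for name in filenames:
        ext = os.path.splitext(name)[1].lower() or "<none>"
        try:
            size_per_ext[ext] += os.path.getsize(os.path.join(dirpath, name))
            count_per_ext[ext] += 1
        except OSError:
            pass  # file moved or inaccessible; skip it

# A handful of extensions usually dominates, whether measured by size or by count.
for ext, size in sorted(size_per_ext.items(), key=lambda kv: kv[1], reverse=True)[:10]:
    print(f"{ext:>8} : {size / 2**20:>10.1f} MB in {count_per_ext[ext]} items")
```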


  • For demand fluctuations and constant change, stock analysis models may be very useful. One to consider would be the Elliott wave.

  • These models are the key ones for the resource optimizer of a Digital Plant to implement and continuously tune. They add the idea that the past helps to prepare for the future.
  • Another important point is to find the correlations between the relevant inputs and the result/product/service to be provided efficiently. Here again, Industrial Engineering methodologies, especially the Taguchi-related ones, should be of great help.
  • My last point here is about sampling rates. All of this relies on large-scale data collection, but that collection must not impact the service. It has to be tuned to yield accurate correlations without disturbing the system being measured: another endless tradeoff scenario well known in Industrial Engineering (see the sketch after this list).
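To make that last tradeoff concrete, here is a small sketch on a synthetic CPU signal (all values invented): coarser sampling collects far less data, but the short spikes that matter for accurate correlations simply disappear from the collected series:

```python
# Sampling-rate tradeoff on a synthetic CPU signal: fewer samples mean less
# collection overhead, but short spikes vanish from the collected data.
import math

spikes = {97, 233, 418}                                   # seconds at which short spikes occur
true_signal = [20 + 10 * math.sin(t / 60) + (60 if t in spikes else 0)
               for t in range(600)]                       # one "true" value per second, 10 minutes

for interval in (1, 15, 60):                              # sampling period in seconds
    sampled = true_signal[::interval]
    print(f"every {interval:>2}s -> {len(sampled):>3} samples, "
          f"max seen = {max(sampled):5.1f} (true max = {max(true_signal):5.1f})")
```

With one sample per second the spikes are captured; at one sample per minute they are missed entirely, even though the collection cost drops by a factor of 60.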

As I sorted out all these ideas, I realized several things:

  • There isn't that much research in this area - current approaches are still analytical and aim to reach an exact target state (which is not particularly useful)
  • Many ideas are emerging from this; there is probably room to study them, and even to write a book about them
  • All this is possible thanks to the huge improvements in network bandwidth and availability over the years. Think back to the BNC Ethernet networks of 20 years ago: they were far slower than my current personal ADSL connection to the Internet...

To end this call for a story yet to be written, I'd like to emphasize two things:

  1. Have a look at what an Industrial Plant is (https://en.wikipedia.org/wiki/Industrial_plant ), then watch and browse the references below on current Datacenters. You'll be struck by the similarities.
  2. I want to contribute to this cross-discipline effort, where the best of two worlds (Computer and Industrial sciences) comes together to create a new environment for innovation.

Contact me if you're interested too :-)

Thanks for reading this long post, which goes well beyond SharePoint automation alone.

< Emmanuel />

References:

The evolution of Cloud Computing and the changing anatomy of a Microsoft data center: https://sharepointineducation.com/the-evolution-of-cloud-computing-and-the-changing-anatomy-of-a-microsoft-data-centre

Data Center knowledge: https://www.datacenterknowledge.com/

Microsoft Datacenters Blog: https://blogs.technet.com/b/msdatacenters/

Automation definition: https://en.wikipedia.org/wiki/Automation

Automation Outline: https://en.wikipedia.org/wiki/Outline_of_automation

Principles & Theory of Automation: https://www.britannica.com/EBchecked/topic/44912/automation/24841/Principles-and-theory-of-automation

The Cloud: Battle of the Tech Titans: https://www.businessweek.com/magazine/content/11_11/b4219052599182.htm

Books:

The Datacenter as a Computer, Luiz Barroso, Urs Hölzle (Google Inc.), Morgan & Claypool, https://www.morganclaypool.com/doi/pdf/10.2200/S00193ED1V01Y200905CAC006

Monitoring the Data Center, Virtual Environments, and the Cloud, Don Jones, https://nexus.realtimepublishers.com/dgmdv.php

The Big Switch, Nicholas Carr, W. W. Norton & Company, https://www.nicholasgcarr.com/bigswitch/