On Gardens and Governance (2nd post)

This is the second in a series of posts that plays around with the idea that looking after SharePoint in your organization is a little like looking after a garden. Without some form of control over it, it will quickly deteriorate and end up as something that is neither a pleasure to use nor something from which you derive benefit.

Without being expert gardeners or professional horticulturalists, we've simply stated that maintaining a garden consists of three core elements:

  • Maintaining what's already there
  • Planning for the introduction of new elements
  • Removing that which is no longer needed or wanted

The first post, which provides the introduction, can be found here: On Gardens and Governance (Post 1)

The second post, which discusses the need to maintain the existing state, can be found here: On Gardens and Governance (Post 2)

The third post, which covers the mechanisms for introducing new elements, can be found here: On Gardens and Governance (Post 3)

The last post, which discusses the removal of content, sites, and people from the SharePoint farm(s), can be found here: On Gardens and Governance (Post 4)

In this article, we're going to look at the first element: maintaining what's already there. This covers the more tedious tasks: cutting the grass, edging the lawn, deadheading the flowers, weeding the beds, sweeping the patio, and maintaining the chemical balance in the pond.

One part of the Governance Plan is to dictate what will be maintained and how it will be maintained. So, the governance of our garden dictates that the patio is swept once a day, the grass is cut once a week between April and September, and the pond is balanced once every three months. It's interesting to note that there's a balance and a tradeoff to be had. The more tasks there are and the higher their frequency, the more your garden will look like something from a magazine, but the higher the cost of maintenance (either in your own time and effort, if you have a small garden that you tend yourself, or in the cost of your gardening team, if you manage a large estate).

And so it is with SharePoint, except it's more insidious than a garden. You can look at a garden that hasn't had its lawn cut for a week or two and it'll let you know; it'll look messy and unkempt and it'll scream at you that something needs to be done. This is true, of course, unless you are an adolescent charged with cutting the lawn once a week. When I was fifteen, the lawn looked just fine and I couldn't see what the fuss was about! However, management of the people whom you employ to implement your governance is outside the scope of this article (as is the parenting of adolescents).

Why is SharePoint more insidious than a garden? I call it the "silence before the explosion". SharePoint won't scream and shout at you that something needs to be done unless you are paying close attention. If you simply install it and use it but don't maintain it, everything will seem to work okay and you won't notice the underlying problems because you won't be looking at the vital signs. You'll come to the corporate intranet and you'll be able to use it. You'll come to your team sites and your documents will still be there. And so it will go on until one day it blows up, stops working, and you suddenly have the senior executives screaming at you. What calm and silence there was before the explosion.

So, how can you prevent the execs from screaming at you? Your governance needs to dictate an operations plan and then ensure that it is being executed. The first step is to make a list of all the tasks that need to be done to maintain your SharePoint environment and then decide on their frequency. As you do so, you should be aware of the cost/effort balance. Now, I'm not going to run through a list of all the daily, weekly, and monthly operations tasks. You can find those in SharePoint 2010 Operations.

When you make your list, prune out all the elements that you don't use in your installation. Your governance plan should be unique to you, your business, and your organization. All the major task headings should almost certainly be present: Backup and Recovery, Database Management, Security and Permissions, etc. However, as you delve into the detail of each, you will find that you need to customize and tailor them to your particular instance of SharePoint. This is because not all organizations use the same backup utility, not all organizations use the same suite of service applications, and so on. Also, please don't forget to augment your list with any custom elements that you have installed and deployed on top of SharePoint. These will also need ongoing maintenance.
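If it helps to think of the list as data, here is a minimal Python sketch of the pruning and augmenting step. It assumes you record each task with its heading, its frequency and, optionally, the component it depends on; the task names, component labels, and frequencies below are hypothetical placeholders rather than a recommended schedule.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Task:
    area: str                       # major heading, e.g. "Backup and Recovery"
    name: str                       # what needs to be done
    frequency: str                  # "daily", "weekly", "monthly", ...
    requires: Optional[str] = None  # component the task depends on, if any


# Hypothetical generic catalogue; illustrative entries only.
GENERIC_TASKS = [
    Task("Backup and Recovery", "Verify that the farm backup completed", "daily"),
    Task("Database Management", "Check database and log file sizes", "weekly"),
    Task("Security and Permissions", "Review farm administrator membership", "monthly"),
    Task("Service Applications", "Review Excel Services usage", "weekly", requires="Excel Services"),
    Task("Service Applications", "Check search crawl logs", "daily", requires="Search"),
]


def tailor(catalogue, installed, custom):
    """Prune tasks for components that are not installed, then append custom tasks."""
    kept = [t for t in catalogue if t.requires is None or t.requires in installed]
    return kept + list(custom)


# Example: a farm with Search but no Excel Services, plus one hypothetical custom solution.
schedule = tailor(
    GENERIC_TASKS,
    installed={"Search"},
    custom=[Task("Custom Solutions", "Check the custom timer job's log", "daily")],
)
for task in schedule:
    print(f"{task.frequency:>8}  {task.area}: {task.name}")
```

Keeping the dependency on each task means that when a service application is switched off, its tasks drop out of the schedule rather than lingering as items nobody can perform.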

I'm afraid this isn't an easy task. There is going to be some work and effort that you'll need to undertake. However, the effort is going to be worth it. It really is. It's the difference between knowing and understanding your SharePoint deployment and being surprised when something goes wrong: you suddenly have thousands and thousands of uncontrolled sites, or the database seizes because the disks are full, or the web servers respond slowly because memory is exhausted. It's the difference between having something that is a pleasure to use and helps to drive your business forward and something that users hate because it takes so long to upload a document or render a web page.

Keeping track of the vital signs

Unfortunately, having a schedule of operational tasks is only the first step towards a well-governed instance of SharePoint. We need to make sure that the adolescent actually cuts the grass each week (I usually did), and we also need to monitor the vital signs.

For this section, I'm going to switch my analogy for a bit and look at the human body instead of a garden. The medical world has the concept of "vital signs". These are body temperature, pulse rate, and rate of breathing. Apparently, blood pressure is not considered a vital sign, but it is often measured along with the vital signs. To me, this makes no sense. If blood pressure is 0 over 0, I think that's a pretty useful indicator that something is not too good with the patient.

So, what are the vital signs of a SharePoint installation? For me (and you may need to add to these), they are the capacity and performance of the system, confirmation that all operational tasks have been undertaken, and a summary of the health reports. With these three things addressed, the Governance Team can be sure that SharePoint will be running sweetly, that there'll be no surprises, and that no executives will need to start shouting at anybody.

The Governance Team needs to meet on a scheduled basis; something like once a month should be fine. When you have just deployed SharePoint or a new application, you might like to increase the frequency to, perhaps, once a week. The purpose of the meeting is to address all elements of the maintain/plan/remove governance dictates that have been defined. The "maintain" part of the agenda is implemented by monitoring the vital signs of your SharePoint installation.

Capacity and Performance

So how do we examine the capacity of the system (and what do I mean by that)? Well, I think it's quite simple, really. The three main elements of any computer system are processor, memory, and storage. These are among the main indicators of whether or not a system is healthy and has the capability and capacity to execute its role. For each of the computers running SharePoint, you need to determine the thresholds for each of these readings according to the server's role and usage (web front end server, application server, or database server). Then you need to use a utility, such as Performance Monitor (perfmon) or System Center, to take these readings on a scheduled basis. For a live system, something like once every 15 minutes should prove more than sufficient. As you get to know your system, you can lengthen the interval between readings so that you don't put too much load on the system simply by gathering metrics. There's no universal answer for what this period should be, I'm afraid.
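To make the threshold idea concrete, here is a minimal Python sketch of checking one round of readings against per-role limits. It assumes the readings have already been collected (by perfmon, System Center, or whatever utility you use) into simple records; the server names, role labels, and threshold percentages are hypothetical placeholders that you would replace with values derived for your own farm.

```python
from dataclasses import dataclass

# Hypothetical thresholds per server role, in percent. Placeholders, not recommendations.
THRESHOLDS = {
    "web front end": {"cpu": 75.0, "memory": 80.0, "disk": 80.0},
    "application":   {"cpu": 80.0, "memory": 80.0, "disk": 80.0},
    "database":      {"cpu": 70.0, "memory": 90.0, "disk": 75.0},
}


@dataclass
class Reading:
    server: str
    role: str
    cpu: float     # % processor time
    memory: float  # % memory in use
    disk: float    # % storage used


def breaches(readings):
    """Return (server, metric, value, limit) for every reading above its role's threshold."""
    found = []
    for r in readings:
        for metric, limit in THRESHOLDS[r.role].items():
            value = getattr(r, metric)
            if value > limit:
                found.append((r.server, metric, value, limit))
    return found


# Example readings for two hypothetical servers.
sample = [
    Reading("WFE01", "web front end", cpu=42.0, memory=85.5, disk=40.0),
    Reading("SQL01", "database", cpu=35.0, memory=70.0, disk=78.0),
]
for server, metric, value, limit in breaches(sample):
    print(f"{server}: {metric} at {value}% (threshold {limit}%)")
```

Keying the thresholds by role rather than by server reflects the point above: a database server and a web front end are expected to run at different levels, so each is judged against its own limits.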

For me, as the Governance Team does not consist solely of technicians but also includes business representatives, these three readings are sufficient to give an understanding of the high-level health of the SharePoint deployment. For the technicians, there's more that we can monitor. But for the business folks, this is a suitable level of abstraction that, I think, will make sense to them.

As an aside, the technicians can, if they want to, drop down a few levels and ask whether the underlying IIS and ASP.NET infrastructure is performing efficiently. And I think that it is necessary to do this. But we don't need to worry the whole Governance Team with this level of minutiae.

To make sure that the various subsystems are performing well, you may want to undertake some tuning exercises on the different parts of the SharePoint infrastructure. For example, a SQL Server tuning exercise can help ensure that all the necessary operational parameters are configured so that the database performs optimally. Similarly, you may perform a tuning exercise on the web front end servers. These tuning exercises, which may be regular, can be presented to the Governance Team, but you need to pitch them at the right level. It's usually sufficient to say, "Here's the throughput we used to have. After the tuning exercise, this is the throughput that we now have." And for the "throughput" readings, take something that makes sense to the business folks: time to upload a document, time to render an intranet page, and so on. I know there are more telling indicators, such as the number of users before average response time dropped to 80% of such and such, but the business users' eyes will glaze over with boredom if you try to convince them that you've done an exhaustive job. Simply doing the job well, and having the figures to drill down into if anyone wants to, should prove sufficient.

Now, what do we gain by having a monthly record of the processor, memory, and available storage readings for each server? Well, we've started to record the information necessary for us to perform Capacity Management. The Governance Team can look at the storage figures of the database server slowly increasing as more and more people start to use the collaborative environment, and we can start to understand that in, say, six months' time, we'll need to ensure that additional storage is made available. We can see the increasing processing load on the application servers that render the Excel spreadsheets as more and more people use the business intelligence capabilities, and we can see that we'll need to introduce another server into the farm in a few months' time.
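As an illustration of how the monthly readings feed Capacity Management, here is a minimal Python sketch that fits a straight line to monthly storage figures and projects when that trend reaches the available capacity. The straight-line (least-squares) trend is my assumption for the sake of the example; real growth is often lumpier, so treat the projection as a conversation starter for the Governance Team rather than a forecast. The figures are made up.

```python
def linear_fit(xs, ys):
    """Ordinary least-squares fit of y = a + b*x; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


def months_until_full(monthly_used_gb, capacity_gb):
    """Months after the latest reading at which the trend line reaches capacity.

    Returns None if usage is flat or shrinking.
    """
    if len(monthly_used_gb) < 2:
        raise ValueError("need at least two monthly readings to see a trend")
    xs = list(range(len(monthly_used_gb)))  # month index 0, 1, 2, ...
    a, b = linear_fit(xs, monthly_used_gb)
    if b <= 0:
        return None
    month_full = (capacity_gb - a) / b      # month index where the line hits capacity
    return month_full - xs[-1]


used_gb = [400, 420, 440, 460, 480, 500]    # made-up month-end readings
remaining = months_until_full(used_gb, capacity_gb=620)
print(f"At the current trend, storage runs out in about {remaining:.1f} months")
```

With these made-up figures (20 GB of growth a month and 120 GB of headroom), the projection comes out at six months, which is exactly the kind of early warning described above.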

I think this is a good position to be in.

Operational Tasks

This part of the Governance Meeting should be fairly quick. It's nothing more than a report from the operations team confirming that they have indeed performed the tasks and duties that they have been instructed to perform. Has the database been backed up? Have the log files been examined for errors? Were any errors identified? If so, has corrective action been defined and when will it be performed?

In fact, this part of the Governance Meeting should be run by exception. The first question asked is, "Has everything that needs to be done been done?" If it hasn't, the team can spend some time understanding the underlying problem, agree on a plan of action, and balance it against other competing priorities.
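Running this review by exception is easy to support if the operations team logs when each task was last completed. Here is a minimal Python sketch, assuming a log of (task, frequency, last completed) entries; the frequency labels, the allowed gaps, and the example entries and dates are hypothetical.

```python
from datetime import datetime, timedelta

# Hypothetical mapping from frequency label to the longest acceptable gap between runs.
MAX_GAP = {
    "daily": timedelta(days=1),
    "weekly": timedelta(weeks=1),
    "monthly": timedelta(days=31),
}


def overdue(tasks, now):
    """tasks: iterable of (name, frequency, last_done or None).

    Returns the names of tasks not completed within their allowed gap: the exceptions.
    """
    exceptions = []
    for name, frequency, last_done in tasks:
        if last_done is None or now - last_done > MAX_GAP[frequency]:
            exceptions.append(name)
    return exceptions


now = datetime(2011, 3, 31, 9, 0)
log = [
    ("Verify that the farm backup completed", "daily", datetime(2011, 3, 30, 23, 0)),
    ("Check database and log file sizes", "weekly", datetime(2011, 3, 20, 10, 0)),
    ("Review farm administrator membership", "monthly", None),
]
problems = overdue(log, now)
if problems:
    print("Not done:", ", ".join(problems))
else:
    print("Everything that needs to be done has been done.")
```

A task with no completion record is treated as overdue, on the basis that a task nobody can prove was done should be discussed.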

This section is needed because whereas we can see in an instant whether the grass has been cut, there is no obvious way from looking at a given team site to see whether it has been backed up in the last week. So, we need to get an explicit confirmation that tasks have been undertaken according to their dictated schedule. That is, we need proof that the governance is actually being implemented. A Governance Plan that sits on the shelf is as useful as a chocolate fireguard.

Health Analyzer Summaries

Allow me to quote from SharePoint 2010 Monitoring Overview:

"SharePoint Server 2010 includes a new, integrated health analysis tool that is named SharePoint Health Analyzer that enables you to check for potential configuration, performance, and usage problems. SharePoint Health Analyzer runs predefined health rules against servers in the farm. A health rule runs a test and returns a status that tells you the outcome of the test. When any rule fails, the status is written to the Health Reports list in SharePoint Server 2010 and to the Windows Event log. The SharePoint Health Analyzer also creates an alert in the Health Analyzer Reports list on the Review problems and solutions page in Central Administration. You can click an alert to view more information about the problem and see steps to resolve the problem. You can also open the rule that raised the alert and change its settings."

Isn't that neat?

If you've stuck with me this far, you have to agree that there really is no reason why there should ever be a silent explosion. Whereas the review of Operational Tasks asks the question, "Has everything been done that ought to be done?", examination of the Health Analyzer summaries enables us to answer the question, "Is there anything we need to know about?"

Again, this section of the meeting should be done by exception. "Is there anything we need to know about?" If so, what's the problem and how are we going to resolve it? This section of the meeting is designed to stop the silence before the explosion. It forces the Governance Team to look at underlying problems that might be starting to express themselves. It unstops their ears and opens their eyes such that they can hear the rhythm of the heartbeat and see when something is starting to go wrong.
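If you export the Health Analyzer Reports list for the meeting (for example, to a CSV file), the exception view can be produced with a few lines of Python. This is a minimal sketch: the column names, the severity labels, which severities count as exceptions, and the report titles are all assumptions for illustration, not the exact fields SharePoint uses.

```python
import csv
import io
from collections import Counter

# Hypothetical export: one row per Health Analyzer report entry.
# Column names, severity labels, and titles are illustrative assumptions.
REPORT_CSV = """Title,Severity,Category
Example: a drive is running low on free space,Warning,Availability
Example: outgoing e-mail is not configured,Warning,Configuration
Example: a database needs attention,Error,Configuration
Example: configuration files match across servers,Success,Configuration
"""

ESCALATE = {"Error", "Warning"}  # severities the Governance Team should hear about


def exceptions(csv_text):
    """Return (count of entries per severity, list of entries that need discussion)."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    counts = Counter(row["Severity"] for row in rows)
    flagged = [row for row in rows if row["Severity"] in ESCALATE]
    # Put errors ahead of warnings so the most serious items are discussed first.
    flagged.sort(key=lambda row: row["Severity"] != "Error")
    return counts, flagged


counts, flagged = exceptions(REPORT_CSV)
print(dict(counts))
for row in flagged:
    print(f"[{row['Severity']}] {row['Title']} ({row['Category']})")
```

The counts give the team the one-line answer to "Is there anything we need to know about?", and the flagged list is the agenda for the rest of this section.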

Conclusion

This article has been about controlling the day-to-day process of operating a SharePoint environment. The focus has been on stopping the silence before the explosion. I think we've found that there are tools, utilities, and guides already provided to help ensure that your SharePoint deployment operates efficiently. Apart from the hard work, I think you'll agree that there's no real reason why your SharePoint deployment can't be a well-run environment.

Next time, we'll turn our attention to the next topic in this series: planning for the introduction of new elements. We'll find that some of the information gathered from daily operations becomes crucial and takes center stage.