It’s 10pm. Do you know what your server is doing?

It’s surprising how many people don’t. They might think they do… but they really don’t, or they haven’t taken a full inventory of the possibilities. Yes, most of us are fairly good at pushing scheduled activities into the evening so they don’t impact the user experience… but can you tell me what your server should be doing right NOW? I’d bet not… at least not without a significant amount of research.

Let’s go through the possibilities. At any moment, your server could be…

…serving user requests during peak usage hours.
…running full crawls daily.
…running incremental crawls every 30 minutes.
…importing user profiles daily.
…doing usage analysis processing daily.
…running your STSADM backup scripts daily.

Or at least that’s what you think it could be doing… but have you made sure that it’s not trying to do all of these things at the same time? Further, have you considered that other people in your company are probably also doing their jobs on your server farm?

…your platforms team is deploying system updates once a week (resulting in unexpected server reboots)
…your DBA is doing SQL backups and running maintenance jobs daily (resulting in significant disk IO)
…your Antivirus team is running full scans of your server once a week (resulting in significant disk IO and CPU usage)
…your Backup team is running server backups daily (resulting in significant disk and network IO)
…your Google appliance is indexing every 6 hours, or 4 times a day (forcing every page, file, and link in your entire farm to be loaded, compiled, and rendered).

So, overall, your system utilization looks something like this:

[Figure: calendar of server activities]

Notice how many activities are happening at the same time? Backups, profile imports, indexing… all happening at once. We (the SharePoint administrators) have done a good job pushing things outside of peak usage (except our incremental crawls)… but we’ve unintentionally scheduled several things to start at midnight, and we haven’t coordinated our activities with what other groups may be doing. Also, because of the order of events we’ve chosen, information could be out of date, or the activities could cause significant performance impacts.

Part of the solution is understanding what we actually need to accomplish.

  • Do we really need search contents updated every 30 minutes? In most organizations, updating search once daily is perfectly reasonable.
  • Do we really need file system backups and SQL backups and SharePoint backups? Frequently, just backing up the content databases once daily is enough. You can certainly back up more, but you should understand what you’re trying to protect against by performing that backup. If your SLA calls for an acceptable loss of less than 1 day, adjust your backup schedule accordingly.
  • We probably don’t need incremental crawls to be scheduled at the same time as our full crawls.
  • We probably want our crawl to happen after our profile import so that the latest user information is available in search.
  • We don’t want any major activity happening during a patching window or when a reboot is likely (this is particularly important with crawling, or you risk corrupting your indexes!)
  • It’s possible we don’t need file system backups for SharePoint at all (though backing up select folders can be useful).
  • File system antivirus scans on SharePoint servers are fine, but they only protect the server itself from infection. They will not scan SharePoint content, because that content is not stored on the server’s file system; it is stored in the SharePoint content databases.
  • Do we need both Google and SharePoint indexing content? (I’m not familiar with the Google appliance, but I believe SharePoint is likely more efficient at indexing itself than Google is, particularly when doing incremental updates.)

For demonstration purposes, let’s assume we’ve discussed and coordinated our schedules with the various groups (platforms, SQL, AV) and have agreed on the following:

  • Patches are deployed on Saturdays at 1:00am.
  • Searches only need to be updated once daily.
  • Google will NOT index SharePoint, but we will use search federation so that searches in SharePoint will return results from the Google appliance.
  • File system backups will happen daily at 10pm (weekly full, daily incremental) and will only include the “12 hive”, InetPub, and IIS/SharePoint log folders.
  • SQL backups will include “all user databases” even though several databases are not restorable. This will ensure new content DBs are picked up automatically.
  • Our SLA states that search index recovery will be a full recrawl, so search index backup is not necessary.
  • SQL and File System backups can happen simultaneously because we’re dealing with separate IO paths and possible network consumption is acceptable.

Taking these things into consideration, we have the following schedule:

  1. 8pm-10pm – Antivirus Scanning (Sundays Only)
  2. 10pm-12am – File system Backups (Full on Saturday, Incremental Sun-Fri)
  3. 10pm-12am – SQL Database Backups (Full on Saturday, Logs only Sun-Fri)
  4. 12am-1am – System Patching (Saturdays only)
  5. 2am-3am – Profile Imports
  6. 3am-5am – Search Indexing (Full on Saturday, Incremental Sun-Fri)
  7. 5am-5:30am – Usage Analysis Processing
  8. 5:30am-6am – Audit Policy Analysis (optional)
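To make this coordination check repeatable, you can write the schedule down as data and test it for collisions. The sketch below is a hypothetical helper, not part of SharePoint or any scheduling product. The `Window` class, the 8pm "night start" anchor, and the `ALLOWED_OVERLAPS` and `MUST_PRECEDE` rules are assumptions modeled on the decisions above: backups may overlap because they use separate IO paths, and profile imports must finish before indexing starts. The sketch ignores day-of-week restrictions and checks as if every activity ran on the same night. That is the conservative case, because if nothing collides when everything runs together, nothing collides on any single night.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import time
from itertools import combinations

# Hypothetical anchor: treat the maintenance "night" as starting at 8pm so that
# windows crossing midnight (e.g. 10pm-12am) still compare correctly.
NIGHT_START_MINUTES = 20 * 60
MINUTES_PER_DAY = 24 * 60


def night_minutes(t: time) -> int:
    """Minutes elapsed since 8pm; 12am maps to 240 and 6am maps to 600."""
    return (t.hour * 60 + t.minute - NIGHT_START_MINUTES) % MINUTES_PER_DAY


@dataclass(frozen=True)
class Window:
    """A scheduled activity, treated as the half-open interval [start, end)."""
    name: str
    start: time
    end: time

    @property
    def span(self) -> tuple[int, int]:
        return night_minutes(self.start), night_minutes(self.end)


# The proposed schedule, ignoring which days each activity actually runs on.
SCHEDULE = [
    Window("Antivirus Scanning", time(20, 0), time(22, 0)),
    Window("File System Backups", time(22, 0), time(0, 0)),
    Window("SQL Database Backups", time(22, 0), time(0, 0)),
    Window("System Patching", time(0, 0), time(1, 0)),
    Window("Profile Imports", time(2, 0), time(3, 0)),
    Window("Search Indexing", time(3, 0), time(5, 0)),
    Window("Usage Analysis Processing", time(5, 0), time(5, 30)),
    Window("Audit Policy Analysis", time(5, 30), time(6, 0)),
]

# Assumption: these pairs may overlap (separate IO paths, per the agreement above).
ALLOWED_OVERLAPS = {frozenset({"File System Backups", "SQL Database Backups"})}

# Assumption: the first activity must finish before the second one starts.
MUST_PRECEDE = [("Profile Imports", "Search Indexing")]


def find_conflicts(schedule, allowed=ALLOWED_OVERLAPS):
    """Return every pair of windows that overlap and are not explicitly allowed to."""
    conflicts = []
    for a, b in combinations(schedule, 2):
        (a_start, a_end), (b_start, b_end) = a.span, b.span
        overlaps = a_start < b_end and b_start < a_end
        if overlaps and frozenset({a.name, b.name}) not in allowed:
            conflicts.append((a.name, b.name))
    return conflicts


def ordering_violations(schedule, rules=MUST_PRECEDE):
    """Return every (first, then) rule where 'first' is still running when 'then' starts."""
    by_name = {w.name: w for w in schedule}
    return [
        (first, then)
        for first, then in rules
        if by_name[first].span[1] > by_name[then].span[0]
    ]


if __name__ == "__main__":
    print("Conflicts:", find_conflicts(SCHEDULE))
    print("Ordering violations:", ordering_violations(SCHEDULE))
```

Run against the schedule above, both lists come back empty. If someone added a profile import at midnight (for example, `Window("Midnight Import", time(0, 0), time(1, 0))`), `find_conflicts` would report it as colliding with System Patching. Touching windows such as 10pm-12am and 12am-1am are deliberately not counted as overlaps.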

So, now our calendar looks like this:

[Figure: revised calendar of server activities]

This is better because we’re not doing any work unnecessarily, and we have some gaps in time that allow some processes to unexpectedly go beyond their planned times. This also ensures that we’re backing up before any significant activities happen that could impact the server or content, such as antivirus scans (which risk corrupting a file if the file must be cleaned).

It’s also better because now I know EXACTLY what my server is doing at 10:00pm. :)

If you’re using virtualization, be sure to also look into what the other virtual machines in your environment are doing. Your VM is affected by whatever the other VMs on the same physical host are doing. If they’re consuming all of the available disk IO, for example, your processes will take longer and your timelines may need to change. More virtual machines do NOT MEAN more computing resources!!!