Over the years, much has been written about upgrading SharePoint. TechNet has some very good resources (links below) that provide process overviews, checklists, and more, and I encourage you to review that content prior to beginning an upgrade of SharePoint. So what is the real point of this post? While the referenced material and many other resources provide the necessary information, this post covers some of the “gotchas” and details that are often overlooked and cause headaches during a migration. The focus is on migrations using out-of-the-box tools and techniques, primarily SharePoint 2010 to SharePoint 2013 upgrades, as there are some significant performance snags that can occur during that upgrade process. However, the process and guidance are also applicable to upgrades to SharePoint 2016. Note that if you are upgrading from SharePoint 2010 to 2016, you must go through SharePoint 2013 first.
- TechNet’s Upgrade Process “one-pager”: https://www.microsoft.com/en-us/download/details.aspx?id=30371
- TechNet’s SharePoint Upgrade resource center: https://technet.microsoft.com/en-us/library/cc303420.aspx
Content cleanup/preparation and thorough testing are ABSOLUTELY CRITICAL to a successful migration. These tasks can be difficult, especially content cleanup activities that require your end users to take actions such as redesigning solutions and reviewing content. However, the effort put into doing proper cleanup and testing will pay huge dividends on migration day. With that in mind, I want to call out key activities in these areas.
Many of the performance hindrances in a migration are the same issues that cause performance problems during “normal” use of a SharePoint farm, so pay attention to the documented software boundaries and limits. There are several “hidden” areas in SharePoint that can accumulate excessive amounts of data. The upgrade from SharePoint 2010 to 2013 includes some significant database schema updates that require data to be moved between tables: the upgrade process copies the data out of the affected tables, then adds it back after the schema changes are applied. This can cause extremely long periods in which the process, from the SharePoint admin’s perspective, looks like it is stuck (especially at 4.85%). Reducing “excessive” data is therefore a critical cleanup step. Listed below are key areas to target for cleanup. For customers that have Microsoft Premier Support contracts, an escalation engineer can provide a SharePoint Performance Diagnostics tool that will analyze and report on performance-inhibiting conditions within a farm. The escalation engineer will also be able to assist with analyzing the output and providing guidance on remediating the issues.
- Site collection audit data: When a user enables site collection auditing, the trimming option defaults to “no trimming”. This can cause massive growth in content database size and incur hours of processing during a migration. Consider implementing a mandatory “max retention” value, which can be done via a simple PowerShell script. SharePoint will trim audit data based on the trimming values configured in each site collection, but there is nothing within Central Administration to enforce a maximum retention policy; you can set up a scheduled task to enforce this requirement going forward. I have seen site collections with over 100 million rows of audit data that added no business value, and upgrading large amounts of audit data can incur hours of processing per content database. If you need to clean up audit data outside of the native timer job, contact Microsoft Premier Support and request guidance/approval for alternative cleanup methods.
- Workflow History / Task History: Workflow and task history data should be reviewed and purged, retaining only a minimal amount of each. The software boundaries document referenced above sets a supported limit of 5,000 items per history list, but there are cases in which performance impacts are seen well before that mark. Users often configure all workflows in a web to use the same history list. While this might work for a solution with a low number of workflow instances, it is not a recommended practice for high-volume workflow solutions.
- Remove “stale” content: Remove content that is no longer needed. Think of this upgrade as moving to a new house: your objective should be to move only those items that you want to keep. This can be a large undertaking, especially if the task cannot be dispersed across a large number of end users (each tasked with cleaning up a small portion). Delete webs/sites that are no longer in use. Remove documents, lists, etc. that are no longer required. Clear both stages of the recycle bin (end-user and site collection), if possible. If a large amount of content is removed, consider shrinking the content database.
- Refactor solutions with wide lists: It is extremely easy for end users to create lists that include too many columns. Each column type has a fixed number of storage slots per database row, so exceeding that number for a given type causes row wrapping within the content database, meaning a single list item spans multiple rows. This severely degrades site performance and will slow the upgrade.
- Reduce permission scopes: Unique permissions have a negative impact on performance. The worst offender is unique item-level permissions. The number of security principals per unique scope should be kept under 1,000. Large lists that have unique permissions can easily exceed this boundary. Permissions should be set at the highest level possible (web vs list, list vs item, etc). Wherever possible, avoid item-level permissions.
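The audit-retention enforcement described in the first bullet above can be scripted against the server object model. Here is a minimal sketch; the 90-day value is purely illustrative, and it assumes the script runs on a farm server under an account with access to all site collections:

```powershell
# Enforce a maximum audit-log retention across all site collections.
# The 90-day value is illustrative - choose what fits your policy.
Add-PSSnapin Microsoft.SharePoint.PowerShell -ErrorAction SilentlyContinue

$maxRetentionDays = 90

Get-SPSite -Limit All | ForEach-Object {
    if (-not $_.TrimAuditLog -or $_.AuditLogTrimmingRetention -gt $maxRetentionDays) {
        $_.TrimAuditLog = $true
        $_.AuditLogTrimmingRetention = $maxRetentionDays
        Write-Host "Updated audit retention on $($_.Url)"
    }
    $_.Dispose()
}
```

Note that this only configures the retention values; the actual trimming still happens when the nightly audit log trimming timer job runs.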
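To find the oversized workflow history lists called out above, you can walk the farm and flag any history list past the 5,000-item guidance. A sketch, assuming a farm server and sufficient permissions:

```powershell
# Report workflow history lists that exceed the 5,000-item guidance.
Add-PSSnapin Microsoft.SharePoint.PowerShell -ErrorAction SilentlyContinue

Get-SPSite -Limit All | ForEach-Object {
    foreach ($web in $_.AllWebs) {
        $web.Lists |
            Where-Object { $_.BaseTemplate -eq [Microsoft.SharePoint.SPListTemplateType]::WorkflowHistory -and
                           $_.ItemCount -gt 5000 } |
            ForEach-Object { "{0} items : {1} ({2})" -f $_.ItemCount, $_.Title, $web.Url }
        $web.Dispose()
    }
    $_.Dispose()
}
```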
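For the permission-scope cleanup, a simple inventory of where inheritance is broken gives you a target list to work from. A sketch (reporting at the list level only; enumerating item-level scopes is far more expensive and is better left to the diagnostics tooling mentioned earlier):

```powershell
# Report lists that break role inheritance - candidates for consolidation.
Add-PSSnapin Microsoft.SharePoint.PowerShell -ErrorAction SilentlyContinue

Get-SPSite -Limit All | ForEach-Object {
    foreach ($web in $_.AllWebs) {
        $web.Lists |
            Where-Object { $_.HasUniqueRoleAssignments } |
            ForEach-Object { "{0} : {1} ({2} role assignments)" -f $web.Url, $_.Title, $_.RoleAssignments.Count }
        $web.Dispose()
    }
    $_.Dispose()
}
```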
Automate Your Migration Process:
There are content databases that will take many hours to upgrade, especially if cleanup tasks are not thoroughly performed. In several recent migrations, my customers had numerous databases that took over 12 hours each to upgrade. If you are in an environment that enforces limits on server sessions, babysitting upgrade sessions can be extremely tedious. A few hours spent creating PowerShell scripts that can be run as scheduled tasks will pay exponential dividends in manpower savings, and will give you consistent, repeatable processing.
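A minimal sketch of such a scheduled-task script follows. The database names, web application URL, and log path are hypothetical placeholders, not values from a real environment:

```powershell
# Batch-mount content databases and log per-database timings, so long
# upgrades can run unattended under a scheduled task.
Add-PSSnapin Microsoft.SharePoint.PowerShell -ErrorAction SilentlyContinue

$webApp    = "https://sharepoint.contoso.com"            # hypothetical URL
$databases = @("WSS_Content_HR", "WSS_Content_Finance")  # hypothetical names

foreach ($db in $databases) {
    $start = Get-Date
    try {
        Mount-SPContentDatabase -Name $db -WebApplication $webApp -ErrorAction Stop
        $status = "Succeeded"
    }
    catch { $status = "Failed: $_" }
    "{0},{1},{2},{3}" -f $db, $start, (Get-Date), $status |
        Out-File -Append "C:\MigrationLogs\mount-results.csv"
}
```

The timing log doubles as your source for the average run times discussed in the testing section below.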
Test, Test, Test… Then Test Some More:
At a minimum, perform one end-to-end migration dry run. This MUST include transferring your data, running Test-SPContentDatabase against all content databases in the new farm, and mounting all content databases in the new farm. Without at least one dry run, you will have no insight into which databases are going to take significant amounts of time to upgrade. I always recommend multiple full dry runs, as they establish average run times and ensure that there are no surprises on the production migration run. As part of testing, identify the databases that take excessive amounts of time to upgrade; I have heard of a single content database taking upwards of 48 hours to upgrade, and have seen multiple instances of 12 hours or more per database. Part of the testing needs to include running Test-SPContentDatabase on the content after it reaches the new farm. This will enable you to identify sites that reference components not present in the new farm (whether intentionally missing or simply an oversight), as well as migration blockers such as wide lists.
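The Test-SPContentDatabase pass can be captured to files for review, which makes comparing dry runs much easier. A sketch, with hypothetical database names and paths:

```powershell
# Run Test-SPContentDatabase against each transferred database and export
# the findings (missing features, upgrade blockers, etc.) to CSV for review.
Add-PSSnapin Microsoft.SharePoint.PowerShell -ErrorAction SilentlyContinue

$webApp    = "https://sharepoint.contoso.com"            # hypothetical URL
$databases = @("WSS_Content_HR", "WSS_Content_Finance")  # hypothetical names

foreach ($db in $databases) {
    Test-SPContentDatabase -Name $db -WebApplication $webApp |
        Select-Object Category, Error, UpgradeBlocking, Message, Remedy |
        Export-Csv -NoTypeInformation -Path "C:\MigrationLogs\Test-$db.csv"
}
```

Pay particular attention to rows where UpgradeBlocking is true; those must be resolved before the production run.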
Have Patience… Plenty of it:
Unless you are absolutely positive that your migration process is stuck, have patience and let it run. The tendency when an admin feels a process is stuck is to terminate the process and restart it. However, doing that during a content database upgrade can cause corruption in the content database. There are several points in the upgrade process between 2010 and 2013 at which the upgrade might appear stuck (such as at 4.85% and 27%). Before terminating any upgrade process, have a database administrator monitor the process to determine whether any activity is occurring. In scenarios involving extreme amounts of data, such as the audit data scenario described above, activity will continue to occur on the database side, but it might be extremely slow (a few KB at a time). If you are seeing any movement on the database side, have patience and let it run. When in doubt, have patience and let it run.
Other Things to Consider:
SharePoint 2013 provides the ability to create an evaluation site, as well as the ability for sites to remain in a SharePoint 2010 user interface mode. While these features have merit, there are some drawbacks that need to be considered.
- Evaluation sites: By default (simply attaching a 2010 content database to a 2013 farm), sites will remain in 2010 UI mode and users will have the option to request an evaluation site via the UI. Requesting an evaluation site creates a queue entry that gets picked up by a timer job that runs daily. Creation of evaluation sites can cause blocking within the database, especially for very large site collections. Additionally, the use of evaluation sites could double (or more) your storage footprint. You can add a measure of control by removing the ability for users to self-request an evaluation site, instead requiring that requests be submitted to your farm administrators, who can manage the creation of the evaluation sites.
- SharePoint 2010 UI mode: Allowing sites to remain in 2010 UI (legacy) mode allows you to upgrade the farm without requiring users to adapt to the 2013 UI immediately. However, sites that remain in 2010 mode will not see new features and could potentially have one-off issues with functionality (to which the resolution is “upgrade to the 2013 UI”). Additionally, sites that remain in 2010 mode will block upgrades to 2016. My personal opinion is that it is better to allow adequate time prior to the migration for users to prepare for the upgrade (including site branding updates, etc.), rather than rushing the upgrade and allowing sites to remain in the legacy UI mode. If you must allow the legacy UI mode in the new farm, I recommend that you limit it to a short window (30 days or so), after which farm admins force the upgrade to the 2013 UI.
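The control over self-requested evaluation sites described above can be applied per site collection. A sketch, assuming it runs on a SharePoint 2013 farm server:

```powershell
# Remove end users' ability to self-request upgrade evaluation site
# collections; requests then go through farm administrators instead.
Add-PSSnapin Microsoft.SharePoint.PowerShell -ErrorAction SilentlyContinue

Get-SPSite -Limit All | ForEach-Object {
    $_.AllowSelfServiceUpgradeEvaluation = $false
    $_.Dispose()
}
```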
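When the legacy-UI grace period ends, farm admins can locate the remaining 2010-mode (compatibility level 14) site collections and force the version upgrade. A sketch; test the throughput impact before running this broadly, as the upgrades run immediately when unthrottled:

```powershell
# Find site collections still in 2010 (v14) mode and upgrade them to the
# 2013 UI, bypassing the self-service upgrade throttle.
Add-PSSnapin Microsoft.SharePoint.PowerShell -ErrorAction SilentlyContinue

Get-SPSite -Limit All -CompatibilityLevel 14 |
    ForEach-Object { Upgrade-SPSite $_ -VersionUpgrade -Unthrottled }
```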