Upgrading AlwaysOn and Mirrored Environments

 

Starting with Team Foundation Server 2012 Update 2, TFS users will be able to upgrade deployments that use SQL AlwaysOn High-Availability Groups or Database Mirroring.  In the past, and for a number of good reasons, the upgrade process would always try to place the databases into the SIMPLE recovery model, which is not compatible with database farms such as AlwaysOn and mirroring groups.  Therefore, users had to remove all databases from these features before attempting to upgrade, or else the upgrade would fail.  TFS will now detect that a deployment’s configuration database is involved in SQL AlwaysOn or database mirroring and will present the user with the option to upgrade their deployment in the FULL recovery model, thereby bypassing the requirement to disassemble their farm before upgrading.  In addition to being able to upgrade this way from the upgrade wizard, users can use this feature when performing an attach of a Team Project Collection or an unattended upgrade, as these commands essentially perform the same database upgrade operations as a standard upgrade.  However, upgrading in the FULL recovery model, while convenient, requires more log and disk space than upgrading in the SIMPLE recovery model. Not having sufficient space to upgrade in the FULL recovery model will cause the upgrade to fail.   Your TFS database administrator should review the following documentation on the added risks of upgrading in the FULL recovery model carefully before you proceed to use this feature.

 

What is the difference between the SIMPLE and the FULL recovery model?

The SQL Server recovery model controls the way SQL transactions are logged and whether these transactions can (or must) be backed up.  Databases are typically put into either the SIMPLE or the FULL recovery model.  The SIMPLE recovery model automatically reclaims log space after transactions are completed to minimize log space requirements; on the other hand, the FULL recovery model keeps all transactions in the log until they can be backed up to another location.  This means that databases using the FULL recovery model can be recovered to any point in time, while databases using the SIMPLE recovery model can only be restored to the latest full backup.

 

How does this apply to Team Foundation Server upgrades?

By default, the Team Foundation Server upgrade process switches databases into the SIMPLE recovery model to service them.  This allows for a faster upgrade and minimizes the chance that the heavy transaction load of an upgrade will cause the system to run out of log space, which would result in the upgrade process failing.

However, there are a number of options that an advanced operations group may use in SQL Server, outside the scope of TFS, to provide business continuity, disaster recovery, or high availability to their deployment.  These options include Log Shipping[1], Database Mirroring, and AlwaysOn Availability Groups, and all are dependent on the databases in question being put into FULL recovery model and staying in FULL recovery model.  Up until now, attempting to upgrade an entire server farm at once would fail because the TFS servicing steps were unable to put the databases into the SIMPLE recovery model.  Thus, the only way for users to upgrade AlwaysOn or mirrored environments was to remove their databases from these groups and upgrade them individually.  Now that Team Foundation Server 2012 officially supports SQL AlwaysOn High-Availability Groups, the upgrade wizard will intelligently detect when a database is involved in SQL AlwaysOn or Database Mirroring and allow the deployment to remain in the FULL recovery model during upgrade.

If you are using AlwaysOn or Database Mirroring with TFS, you should only have one database group and your configuration database should always be part that group.  When you run the upgrade wizard and point it to your configuration database, the wizard will detect its involvement in AlwaysOn or mirroring and display the additional text outlined in red on the page.  You will not be able to continue with upgrade until you confirm that you would like to stay in the FULL recovery model during the process.  If you don’t want to proceed with upgrade in the FULL recovery model, you still have the option of removing your databases from their farm and upgrading them individually in the SIMPLE recovery model.  You will need to close the upgrade wizard and restart it once you have made that change to all affected databases. Don't forget -- if you choose to revert to the SIMPLE recovery model during upgrade, you will need to re-enable AlwaysOn or mirroring for each database after your upgrade.  If you are using SQL AlwaysOn, you will also need to remember to add your databases back into their AlwaysOn availability group.


  

 

So tell me, what are the risks of not putting the databases into the SIMPLE recovery model prior to upgrading?

Log and disk space

The primary risk will be amount of log space required by the database to complete the update.  Upgrade will make changes to the SQL databases in a number of serial transactions.  These transactions may be quite large.  When the database is using the SIMPLE recovery model these transactions can be purged from the logs by SQL as soon as they are committed.  Under FULL recovery these committed transactions will need to be backed off to disk before they can be purged.  Additionally, the transactions will need to be written to the Failover Partner/Secondary database(s)’ log files – a process known as hardening – before they can be removed from disk.  Where the log space requirements with SIMPLE recovery model may have been the single largest transaction during the update, the new log space requirement with FULL will be the single largest transaction plus some additional quantity of space for committed transactions that have not yet been written to disk and hardened at the secondary.   There is also a completely new requirement that the server’s disk have sufficient space available to store these transactions until they are written over to the secondary log files.  Running out of space is the primary risk as the update won’t be able to complete without adequate log space.  The only way to know what “adequate” is for certain will be to monitor log and disk space usage during a test upgrade. We have provided some guidelines in the “Metrics” section around the recommended amount of log space based on our own internal testing of the feature, but every update and every TFS deployment will be different.

Performance

Less of a risk, but an equally important factor will be an increase in the length of time required to do the update.  Upgrading in the FULL recovery model requires the transaction log to be backed off to the “Backup Log,” which will impact both disk and transaction log performance.  Both Database Mirroring and AlwaysOn Availability Groups will require the log records to be written to the mirroring partner/secondary database(s) before the committed transactions can be removed from the log.  This will add additional network load as the transactions are copied over.  If your server is set to use the Synchronous transmission option the commit of the transaction on the primary will be delayed until the secondary can also commit, which may introduce additional latency.  For more details on Synchronous versus Asynchronous transmission, see “Advanced Options: Commit Latency Mitigation.”

As the ATs should be quiescent during the upgrade, there should be no load considerations for the primary databases.  If the AlwaysOn High-Availability Group’s secondary databases are set to being used for Read Only requests, that load could potentially slow down the transfer of the log records and their application if read-only users are running queries in the secondary databases.  Thus, it is advisable to disable user access to these databases during the upgrade.  One should be able to change them from read-only to non-readable in the AlwaysOn Availability Group dashboard and then back again after the upgrades.    

 

Metrics

As stated previously, the only way to truly know how much extra log and disk space will be adequate for performing upgrades in the FULL recovery model is to monitor log and disk space usage during a test upgrade.  This is because the amount of database changes contained in each TFS upgrade will be different depending on which release you are upgrading to and which release you upgrade from.  The topology of one’s TFS databases and the amount and density of the data in these databases will also be a factor. That being said, here is data from a set of sample upgrades that we performed using customer data.  We will continue to update this data periodically as we perform more testing on different-sized databases.  You may want to use this data to help you decide whether you want to risk upgrading in the FULL recovery model without running a test upgrade first.

TFS Instances (sizes in MB), upgrading from TFS 2010 SP1 to TFS 2012 Update 2:

Database Database size before upgrade Log space used in SIMPLE Log space used in FULL
Configuration  8157.13  31.36  342.23
Collection 1  124.56  62.81  358.22
Collection 2  74.44  7.13  121.29
Collection 3   4707.50  695.50  2234.19
Collection 4  85.06  46.41  372.64
Collection 5  56.75  17.18  347.29
Collection 6  93.63  27.46  389.36
Collection 7  124.69  27.84  403.97
Collection 8  79.13  37.51  350.14
Collection 9  140.44  54.68  399.38
Collection 10  83.81  32.85  153.02
Collection 11  180.00  39.42  451.45
Collection 12  95.81  33.31  366.82
Collection 13  859.13  50.14  578.27
SSRS Warehouse  1439.69  34.79  36.08

 

Database Database size before upgrade Log space used in SIMPLE Log space used in FULL
Configuration 2123.75 722.02 2814.43
Collection 1 323.88 308.71 309.09
Collection 2 141221.63 5856.43 36525.21
Collection 3 4865.35 228.74 1202.33
Collection 4 45879.94 4337.93 38279.04
Collection 5 12642.38 3059.35 13616.81
Collection 6 91.19 36.11 348.6
Collection 7 74.25 34.07 45.57
Collection 8 138.19 40.44 340.21
Collection 9 3021.13 1536.38 5336.12
SSRS Warehouse 17738.5 65.13 64.79

 

Database Database size before upgrade Log space used in SIMPLE Log space used in FULL
Configuration 1131.38  377.56  1281.98
Collection 1 1635.94  410.25  1458.83
Collection 2 75.31  62.64  343.59
Collection 3 394577  13165.34  106366.87
Collection 4 62.5  94.28  219.13
Collection 5 382,88  269.35  570.19
SSRS Warehouse 33685.38  111.11  114.62

 

Database Database size before upgrade Log space used in SIMPLE Log space used in FULL
Configuration 172.75  20.29  303.03
Collection 1 245791.13  8671.32  72119.40
 SSRS Warehouse  38510.88  85.02 85.27

 

Tips for minimizing risk

While upgrading in the FULL recovery model rather than the SIMPLE recovery model does add risk to your ability to successfully upgrade your Team Foundation Server, there are a number of ways to minimize this risk.  For example, it is always a good idea to manually grow the amount of space allocated to your SQL transactions logs as much as possible prior to running the upgrade.  As stated above, if you use SQL AlwaysOn you should also consider disabling users' read access to your secondary databases during the upgrade to maximize the speed of log transfers from primary to secondary.  And finally, it is absolutely crucial to have current backups of all your TFS data prior to performing any type of upgrade.

In addition to these general tips, there are some advanced options that you may want to consider taking advantage of to minimize your risk further.

Advanced Options: Log Backup Mitigation

For Database Mirroring/AlwaysOn Availability Groups we can throw the transaction log backups away by backing them off to a NUL device.  This will keep them from hitting the backup disk which will save us backup disk space, IOPS, and backup disk write latency.  While not quite as nice as the SIMPLE recovery model, it is roughly equivalent in function.  The transaction log files would only have value if we wanted to restore the databases to a point in time during the upgrade, which is a highly unlikely scenario.  Additionally, in order to restore the databases one would have to remove them from Mirroring/AlwaysOn.  A catastrophic upgrade failure that cannot be restarted would warrant a restore from the full backups as one will want to get back into a pre-update state to proceed.  After the upgrade a full backup should be taken again to restore the integrity of subsequent transaction log backups.  

Most systems that do transaction log backups do them on a 15 minute or longer duration.  To minimize the delay in backing off committed transactions we should increase the frequency to every minute.  To do this, it is probably easiest to disable the normal backup job and setup a new one that runs every minute, and goes to a NUL device. 

Advanced Options: Commit Latency Mitigation

There isn’t a lot that can be adjusted with regard to delivering transactions to the failover partner.  We can choose whether the Mirror/AlwaysOn Group will be Asynchronous or Synchronous.  If log or disk space is a concern, it is generally advisable to use Asynchronous delivery.  Under Asynchronous delivery the failover partner/secondary database(s) are allowed to get behind on the delivery of the transactions, and the primary will not delay the commit of the transaction until it has been written at the destination.  This exposes the system to data loss at the destination if there is a catastrophic failure.  This scenario won't make you any worse off that if you had just torn down your database farm from the start, with one caveat: if the failover partner/secondary database(s) goes offline and transactions aren't delivered then it will need to be removed promptly from Database Mirroring/AlwaysOn to avoid unchecked transaction log growth.  Before allowing access to the read-only databases, they should be checked to see if the application of the transactions is behind.  There may be a period where the primary is brought back online to resume operations, but the read-only databases are given more time to finish applying the transactions.  Because the TFS update frequently makes DDL and DML changes to the databases the secondary databases should be fully caught up with the update changes before being brought back into operation.  All the usual methods of improving log performance are still applicable such as growing out the logs for the expected usage prior to running the update, and having the logs on fast disk.

Again, though it's not required, the best way to minimize risk is to run a practice upgrade on a cloned test environment prior to upgrading your production server.  This is especially true if you have never serviced your databases in the FULL recovery model and don't know what to expect. 

Provided that the appropriate cautionary measures are taken, the ability to upgrade in the FULL recovery model should ultimately make server upgrades more efficient and convenient for users with SQL AlwaysOn or database mirroring.

 

What if I want to upgrade in the FULL recovery model, but the upgrade wizard doesn’t present me with the option to do so?

As this is a brand new feature for Update 2, there may be edge cases where the upgrade wizard does not correctly detect that your configuration database is involved in AlwaysOn or database mirroring and therefore does not give you the option to upgrade in the FULL recovery model.  There are also a number of other scenarios that are not formally supported by TFS where a user may want to upgrade in the FULL recovery model.  These include:

  • If your databases are involved in Log Shipping.  Log Shipping is not officially supported for use with TFS, but many users have found that they are able to set up Log Shipping for their TFS databases without issue.
  • You have a simple database but want the option of upgrading in the FULL recovery model to keep your SQL transaction chain unbroken.  However, keep in mind that you should not have any reason to want to restore your databases back to a point in time where they were only partially upgraded.
  • Your configuration database is not involved in AlwaysOn or Database Mirroring, but some or all of your collection databases are.  Since the upgrade process only checks the configuration database and not the collections for these features, it will attempt to convert all databases to the SIMPLE recovery model in this case.  This will cause upgrade to fail as soon as it reaches a collection that cannot be placed into the SIMPLE recovery model.
    • If your TFS instance is set up this way, you should really consider adding the configdb to your database farm.  The data stored in the configdb is essential to TFS.  If data from the configdb is lost or corrupted your TFS instance will probably be irrecoverable.

Thus, we have created a manual override that will allow any user to upgrade in the FULL recovery model, even when the upgrade wizard does not detect the need to.  To use this override, open SQL Management Studio and set the extended property,

“TFS_UPGRADE_IN_FULL_RECOVERY_MODEL = Yes”

for each database you wish to upgrade in the FULL recovery model.  Note that if this extended property is set to anything other than “Yes,” it will be ignored during the upgrade and TFS will default to upgrading all your databases in whichever recovery model the wizard detected it should use.  After the upgrade, you should remove this extended property from your databases so that you do not unintentionally leave it set for any future upgrades.

 

Caveats, Limitations, and Known Issues

  • When upgrading a TFS instance that uses SQL AlwaysOn, please be sure to provide the wizard with your AlwaysOn Group Listener (AGL) rather than to the primary or secondary replica of your configuration database, as the AGL will always point to whichever database is labelled as the primary.  If you are upgrading a database mirror, please be sure to point the wizard to the primary replica of your configuration database.
  • Team Foundation Server does not support having a combination of AlwaysOn databases and mirrored databases on the same instance.  If SQL AlwaysOn is enabled for a particular SQL Server instance, upgrade will not detect databases involved in any kind of mirroring; it will only detect standalone databases and the databases that are involved in SQL AlwaysOn. 
  • As explained above, there is an override you can set to upgrade individual databases in the FULL recovery model if you so choose.  This should unblock upgrades for any AlwaysOn or mirroring cases we do not correctly detect in the wizard as well as for any unsupported scenarios that may need to leverage this feature.
  • This feature also applies to unattended upgrades and Team Project Collection attaches and detaches, as these operations all perform the same type of database servicing as an upgrade using the wizard.  If you don't want your databases left in the FULL recovery model, don't forget to remove them from their replication feature prior to running any of these commands.
  • Remember that there’s no way to guarantee whether you will be able to successfully upgrade using the FULL recovery model without attempting to do so on an identical test instance first.  Prior to performing an upgrade, always ensure you have a current full backup of your TFS data.  If your upgrade fails, you will need to use this backup to restore your data before attempting to upgrade a second time.

1Log Shipping is not a feature officially supported by TFS.  However, it is still possible to upgrade databases involved in Log Shipping by using a manual override.  See “What if I want to upgrade in the FULL recovery model, but the upgrade wizard doesn’t present me with the option to do so?”