RBS Garbage Collection Settings and Rationale

I recently got a question on RBS Garbage Collection settings and their usage. So I decided to write this blog post describing the different parts of GC and the associated settings.


RBS Maintainer does garbage collection (GC) in 3 phases:

1. Reference Scan (RS) - Look through the application tables and find blobs that are no longer referenced by the application. The list of registered RBS columns is used for this purpose. BlobIds must not be stored in any place other than the registered columns. The blobs that are no longer referenced by the application are marked to be deleted.

2. Delete Propagation (DP) - Blobs marked for deletion are actually deleted from the blob store. There is a gap between when a blob is marked for deletion and when it is actually deleted. This gap is configured with the "garbage_collection_time_window" config item and defaults to 30 days. The reason for the GC time window is to allow old backups of the RBS database to be restored. A backup as old as the time window (e.g. 30 days) can be restored, and all blobs referenced by that database are guaranteed to still be present in the blob store, including blobs that the application deleted after the backup was taken. If blobs were deleted immediately, restoring an old backup would lead to dangling pointers (blobs referenced by the application but missing from the blob store). For this reason, this config item must be set to the SLA time period for backup/restore.

3. Orphan Cleanup (OC) - All blobs in the blob store are enumerated and we compute the list of blobs that are present in the blob store but are not known to RBS. These blobs are "orphans" and can be caused by aborted transactions, application misbehavior or other failures. Orphan blobs that are older than the GC time window are deleted from the blob store.
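
To make the time window rule concrete, here is a minimal sketch of the check in Python. It is only an illustration of the reasoning above, not the Maintainer's actual code: the function names are made up for this post, and the exact boundary behavior (>= versus >) is an assumption.

```python
from datetime import datetime, timedelta

# Default value of garbage_collection_time_window described above.
GC_TIME_WINDOW = timedelta(days=30)


def can_delete(marked_at, now, window=GC_TIME_WINDOW):
    """True once a blob has been marked for deletion for at least the window.

    Hypothetical helper; the inclusive boundary is an assumption.
    """
    return now - marked_at >= window


def backup_is_safe(backup_taken_at, now, window=GC_TIME_WINDOW):
    """True if every blob referenced by the backup must still be in the store.

    A blob referenced at backup time can only be marked for deletion after
    the backup was taken, so it cannot be deleted before the window elapses.
    """
    return now - backup_taken_at < window


marked = datetime(2024, 1, 1)
print(can_delete(marked, datetime(2024, 1, 20)))      # 19 days old, still kept
print(can_delete(marked, datetime(2024, 1, 31)))      # 30 days old, eligible
print(backup_is_safe(marked, datetime(2024, 1, 20)))  # backup within the window
```

The same age check applies to orphans in the OC phase, with the blob's creation time in place of the time it was marked for deletion.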

In addition to the GC time window setting, there are 2 more config items related to GC. "delete_scan_period" is the time period for running one scan of the RS and DP phases of GC. After one pass of RS and DP is completed, attempting to run them again within this time period will just skip RS and DP and do nothing. Similarly, "orphan_scan_period" is the period for the OC phase of GC. Both of these also default to 30 days.

These settings can be set by calling rbs_sp_set_config_value. The format for these config items is 'days n', where n is a positive number. For testing purposes, they can also be set to sub-day durations using the format 'time hh:mm:ss'. A smalldatetime field is used internally, so the precision of this setting is 1 minute. They can also be set to 0 using 'time 00:00:00'.
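
If you generate these values from a script, it can help to validate them before sending them to the database. The sketch below parses the two formats described above into a Python timedelta. It is a hypothetical helper, not part of RBS, and it assumes that n is a whole number and that 'days 0' is not accepted (the post only describes 'time 00:00:00' for zero).

```python
import re
from datetime import timedelta


def parse_rbs_duration(value):
    """Parse 'days n' or 'time hh:mm:ss' into a timedelta.

    Hypothetical validator; the real stored procedure may accept more
    (or fewer) variations than this.
    """
    text = value.strip()

    match = re.fullmatch(r"days\s+(\d+)", text)
    if match:
        days = int(match.group(1))
        if days <= 0:
            raise ValueError("'days n' needs a positive n: %r" % value)
        return timedelta(days=days)

    match = re.fullmatch(r"time\s+(\d{2}):(\d{2}):(\d{2})", text)
    if match:
        hours, minutes, seconds = (int(part) for part in match.groups())
        if hours > 23 or minutes > 59 or seconds > 59:
            raise ValueError("time out of range: %r" % value)
        return timedelta(hours=hours, minutes=minutes, seconds=seconds)

    raise ValueError("expected 'days n' or 'time hh:mm:ss': %r" % value)


print(parse_rbs_duration("days 7"))
print(parse_rbs_duration("time 00:00:00"))
```

Keep in mind that RBS stores the value with 1 minute precision, so seconds in the 'time' format are only useful as part of the format, not as a finer setting.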

The actual work of GC is done by the RBS Maintainer application. The maintainer is a console application that takes command line parameters such as the connection string to the database and the phases of GC to execute. This can be run from any machine that has access to the DB and the blob store(s). It can also be run from multiple machines simultaneously. You can schedule it using your favorite scheduler e.g. Windows Task Scheduler.

Maintainer also takes an optional parameter to limit the amount of time it runs. Here is a scenario showing how this time limit can be used together with the scan_period settings above to build a GC schedule:

· Maintenance window in production environment is 2 hours every day from 2 AM to 4 AM.

· A complete GC pass takes 6-10 hours to run.

· Customer wants to run one GC pass every week (7 days).

Solution:

· Schedule a Maintainer.exe task to run every day at 2 AM, and include the command line option "-TimeLimit 120". This will stop Maintainer after it has run for 2 hours, even if the GC pass is not complete. The time limit is specified in minutes.

· On the database, set the RBS config values of delete_scan_period and orphan_scan_period to 7 days.

This way, Maintainer.exe will do GC work on Mon-Thu (assuming a pass needs 7 hours in total, spread over 2-hour daily runs) and then do nothing on Fri-Sun instead of starting a new GC scan. The following Monday, a new GC scan is started.
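
The schedule above can be checked with a small simulation. This is a sketch under stated assumptions, not a model of the Maintainer's internals: the function name is made up, a pass is assumed to resume where it stopped when the time limit is hit, and the scan period is assumed to be counted from the day the pass started (which is what makes a new scan start exactly one week later).

```python
def simulate_gc_schedule(pass_hours, daily_limit_hours, scan_period_days, total_days):
    """Return the day numbers (0 = first Monday) on which GC work is done."""
    working_days = []
    pass_start = None  # day on which the most recent pass started
    remaining = 0      # hours of work left in the pass in progress

    for day in range(total_days):
        if remaining <= 0:
            # No pass in progress: skip if the scan period has not elapsed.
            if pass_start is not None and day - pass_start < scan_period_days:
                continue
            pass_start = day
            remaining = pass_hours
        # Work until the pass finishes or the time limit stops the run.
        remaining -= min(daily_limit_hours, remaining)
        working_days.append(day)

    return working_days


names = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]
days = simulate_gc_schedule(pass_hours=7, daily_limit_hours=2,
                            scan_period_days=7, total_days=14)
print(days)  # → [0, 1, 2, 3, 7, 8, 9, 10]
print([names[d % 7] for d in days])
```

With a 10-hour pass the same schedule does work on Mon-Fri, and with a 6-hour pass on Mon-Wed, so the full 6-10 hour range from the scenario fits within one week.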

For testing purposes, you can do the following to verify that blobs are getting deleted from the blob store:

1. Set garbage_collection_time_window and delete_scan_period to 'time 00:00:00'

2. Delete BlobIds from the application table

3. Run Maintainer.exe, specifying RS and DP phases

Blobs should get deleted from the blob store at this point.


- Pradeep