Blob Size Never Reduces When Using RBS with SharePoint 2010

Last week a customer let me know that after configuring RBS (Remote BLOB Storage) with their SharePoint Server 2010 environment, they noticed that the blob store was not shrinking after they deleted a document from a SharePoint library. So we began investigating what it takes to make this happen.

Initially I thought I had discovered and repro’ed a bug, er, undocumented feature, as I was seeing the same behavior that my customer was. The setup was very simple:

  • SQL Server 2008 R2 as the database backend
  • SharePoint Server 2010 (in my case a single server serving all roles)

There are several blogs and TechNet articles on how to configure RBS with SharePoint Foundation/Server 2010, so I won’t rehash all of that here. If you need a starting point, here is the TechNet article:

Install and configure RBS (SharePoint Foundation 2010)

After configuring this properly (a list of pitfalls is at the end of this entry), I could upload a file larger than 100 KB and see a corresponding file in the blobstore folder on the SQL Server. All is good. However, after I deleted the document from SharePoint, the file was not removed from the blobstore. I did a lot of digging, and the first thing we came upon was the RBS Maintainer, but that by itself was not enough; the files were still not being removed from the blobstore.

At its most basic level, the Microsoft RBS solution uses SQL Server’s FILESTREAM for data storage, so we needed to look up how FILESTREAM gets cleaned up, or garbage collected. I researched several blogs on this (hopefully I have them all listed at the end of this entry) and feel I have a much better understanding at this point. Basically, SQL Server has to know that the file (which represents a blob of data) is not going to be needed again before it will clean up blob files. That requires transaction log backups (which don’t happen automatically) and checkpoints (which do happen automatically).
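To make the log backup plus checkpoint requirement concrete, here is a minimal Python sketch that only builds the T-SQL text you might run against the content database; it does not connect to anything. The database name WSS_Content, the backup path, and the helper name are placeholders I made up for illustration. BACKUP LOG and CHECKPOINT are the standard T-SQL statements, and the sketch assumes the database is not in the simple recovery model, since log backups are not possible there.

```python
def build_filestream_gc_batch(database: str, backup_path: str) -> str:
    """Return T-SQL that takes a log backup and then issues a checkpoint.

    Hypothetical helper: it only formats text, it does not run anything.
    """
    # Escape ']' inside a bracketed identifier and "'" inside a string literal.
    db = database.replace("]", "]]")
    path = backup_path.replace("'", "''")
    return "\n".join([
        f"USE [{db}];",
        f"BACKUP LOG [{db}] TO DISK = N'{path}';",
        "CHECKPOINT;",
    ])


if __name__ == "__main__":
    # Placeholder names, not taken from any real environment.
    print(build_filestream_gc_batch("WSS_Content", r"D:\Backups\WSS_Content_log.trn"))
```

You would paste the printed batch into a query window (or a SQL Agent job step) rather than rely on this script; the point is only to show which two operations FILESTREAM cleanup is waiting on.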

So taking this information in hand and adding RBS on top of it, we get a list of things required for blob files to be removed from disk after a list item is deleted from SharePoint:

  1. The item must be removed from both recycle bins in SharePoint (or not placed there to begin with).
  2. The RBS Maintainer must be run twice (once to mark for deletion and the second time to actually delete the references – make note of Maintainer configuration options such as garbage_collection_time_window).
  3. The content database must have more than one transaction log backup and one or more checkpoints after the Maintainer has removed all references.
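As a way to reason about the list above, here is a small Python sketch that encodes the three conditions as a checklist. It is a mental model only: the BlobState record and its field names are hypothetical, and nothing here queries SharePoint, RBS, or SQL Server.

```python
from dataclasses import dataclass


@dataclass
class BlobState:
    """Hypothetical record of what has happened since the item was deleted."""
    in_recycle_bin: bool = True            # still in either SharePoint recycle bin
    maintainer_runs: int = 0               # Maintainer passes since it left the bins
    log_backups_after_maintainer: int = 0  # log backups since references were removed
    checkpoints_after_maintainer: int = 0  # checkpoints since references were removed


def blob_file_removable(state: BlobState) -> bool:
    """Apply the three conditions from the list, in order."""
    if state.in_recycle_bin:
        return False
    if state.maintainer_runs < 2:          # one pass marks, the next deletes references
        return False
    return (state.log_backups_after_maintainer > 1
            and state.checkpoints_after_maintainer >= 1)


if __name__ == "__main__":
    # Only one log backup so far: the file is expected to still be on disk.
    print(blob_file_removable(BlobState(False, 2, 1, 1)))
    # A second log backup satisfies the last condition.
    print(blob_file_removable(BlobState(False, 2, 2, 1)))
```

When a file is still sitting in the blobstore, walking through these fields in order is a quick way to find which step has not happened yet.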

Does this mean you can force the above and get immediate removal of files on disk? Almost… but why? If you use Task Scheduler to schedule the Maintainer to run periodically and you have regularly scheduled transaction log backups (which you should anyway), then the rest of this should happen auto-magically. If you are finding that files aren’t getting cleaned up as quickly as you’d like, then you may want to look at your recycle bin settings and the Maintainer configuration options.

 

Pitfalls:
  1. Initially, during our setup of RBS, we ran into MSI failures when attempting to install RBS on the web front end (WFE). The link in some of the TechNet documentation is incorrect. The correct link (as of this writing) is https://www.microsoft.com/download/en/details.aspx?displaylang=en&id=16978. Search the page for “remote blob store” and click the appropriate link. After installation, the version will be 10.50.1600.1 as shown in Control Panel/Programs and Features.
  2. PowerShell command windows cache state, which can cause issues if you are not aware of it. During troubleshooting we had left a PowerShell window open and active over several days of working on this issue, and as soon as we opened a new window things fell into place.

References used:

https://www.sqlskills.com has a great series of articles on SQL FILESTREAM and a big thanks to Paul Randal for his write-ups that allowed me – a non-SQL resource – to get a better handle and understanding of FILESTREAM so that I could proceed with figuring out RBS with SharePoint 2010:

https://www.sqlskills.com/BLOGS/PAUL/post/FILESTREAM-garbage-collection.aspx

https://www.sqlskills.com/BLOGS/PAUL/post/FILESTREAM-directory-structure.aspx

There were several professional blogs that I used for RBS setup:

https://blogs.msdn.com/b/priyo/archive/2010/04/27/configure-remote-blob-storage-rbs-with-filestream-for-sharepoint-foundation-2010.aspx

https://alipka.wordpress.com/

https://www.toddklindt.com/blog/Lists/Posts/Post.aspx?ID=174

TechNet & MSDN:

https://blogs.msdn.com/b/psssql/archive/2008/01/15/how-it-works-file-stream-the-before-and-after-image-of-a-file.aspx

https://blogs.msdn.com/b/psssql/archive/2011/06/23/how-it-works-filestream-rsfx-garbage-collection.aspx

https://msdn.microsoft.com/en-us/library/cc949109.aspx

https://technet.microsoft.com/en-us/magazine/2009.02.logging.aspx

https://msdn.microsoft.com/en-us/library/ms189573.aspx