SharePoint 2013 Shredded Storage vs. RBS

**THIS IS A REPOST**

It’s important to note that sometimes information needs to be re-posted. This post was originally created by a peer of mine at Microsoft – Chris Mullendore – on his Microsoft blog. When he moved on, the post disappeared and the rest of us mere mortals FREAKED… but we were able to find the content, and I’ve reposted it here so that we can all find it again… Thank you, Chris!

SharePoint 2013 has introduced significant enhancements in how file content is stored and retrieved in the database. Specifically, files are now split into many parts ("Shredded") and stored in individual rows in the content database. This is related to the Cell Storage API introduced in SharePoint 2010, which supports features like co-authoring in SharePoint 2010 and the Office Web Applications. It is gaining prominence in SharePoint 2013 because this file-splitting capability has been pushed beyond an over-the-wire transfer mechanism: it now exists in the SharePoint databases themselves as a feature called Shredded Storage.

There are already more than enough blog entries (https://blogs.technet.com/b/wbaer/archive/2012/11/12/introduction-to-shredded-storage-in-sharepoint-2013.aspx) describing shredded storage itself in detail. Suffice it to say, for the purposes of this blog entry, all files will be shredded into either 64K or 1MB (1024K) chunks, depending on the type of file we're talking about. File types that SharePoint understands and interacts with directly (Office files) will receive the 64K treatment; other file types will be sliced into 1MB chunks. While I don't know the exact reasons for these sizes, I do believe that they make sense given the purposes of the Cell Storage API, what is likely to be versioned and what is not, and the difference in use cases between Office documents and other files like binaries (ZIP) or large media files.
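As a rough illustration of those two chunk sizes, the Python sketch below estimates how many shreds a file produces. It assumes fixed-size chunks (real shredding is content-aware, so these counts are approximate), and the set of "integrated" extensions is a hypothetical placeholder, not an official SharePoint list.

```python
import math

# Chunk sizes as described above: 64K for file types SharePoint integrates
# with directly (Office files), 1MB for everything else.
INTEGRATED_CHUNK = 64 * 1024
NON_INTEGRATED_CHUNK = 1024 * 1024

# Hypothetical, illustrative set of "integrated" extensions (not an official list).
INTEGRATED_EXTENSIONS = {".docx", ".xlsx", ".pptx"}


def chunk_size_for(filename: str) -> int:
    """Return the approximate shred size for a file, based on its extension."""
    dot = filename.rfind(".")
    ext = filename[dot:].lower() if dot != -1 else ""
    return INTEGRATED_CHUNK if ext in INTEGRATED_EXTENSIONS else NON_INTEGRATED_CHUNK


def estimated_shreds(filename: str, size_bytes: int) -> int:
    """Approximate number of shreds (rows) a file of this size would produce."""
    return math.ceil(size_bytes / chunk_size_for(filename))


if __name__ == "__main__":
    MB = 1024 * 1024
    print(estimated_shreds("report.docx", 5 * MB))
    print(estimated_shreds("archive.zip", 10 * MB))
```

A 5MB Word document lands at roughly 80 shreds, while a 10MB ZIP file lands at about 10.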

Where things get really interesting (and where we reach the purpose of this blog) is in how Shredded Storage interacts with Remote Blob Storage (RBS), which was introduced in SharePoint 2010 and continues to exist in 2013. If you recall, RBS is best used to move relatively large files out of the SharePoint content database(s) and into an actual file system. Although files of any size can be configured to be pushed (and in fact the default setting is "0", or "push all files to RBS"), smarter people than me have done testing that indicates the benefits of RBS are greatest with somewhat larger files, while using RBS for smaller files can actually reduce performance.

So… let's think about these two features for a moment…

  • RBS works best with larger blobs.
  • Shredded Storage slices larger blobs into a lot of very small blobs.

Time for a few examples…

  • Scenario 1: Word document, 10K, all text. RBS Threshold = 0K
    This file would be placed in a single shred and pushed to the RBS store.
  • Scenario 2: Word document, 10K, all text. RBS Threshold = 1MB
    The file would be placed in a single shred and would be stored in the database.
  • Scenario 3: Word document, 5MB, all text. RBS Threshold = 0K
    The file would be placed into numerous ~64K shreds, pushed to RBS, resulting in ~80 RBS chunks.
  • Scenario 4: Word document, 5MB, all text. RBS Threshold = 1MB
    The file would be placed into numerous ~64K shreds and would be stored in the database.

Your 5MB file would never make it to RBS because RBS cares about the size of the shred, not the total size of the file!
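To make the per-shred rule concrete, here is a small Python model of the four Word document scenarios. It is a sketch, not SharePoint code: it assumes fixed 64K shreds, and it assumes a shred goes to RBS when its size is at or above the threshold (the exact comparison SharePoint performs is an assumption here).

```python
KB = 1024
MB = 1024 * KB
SHRED_SIZE = 64 * KB  # approximate shred size for integrated (Office) content


def place_integrated_file(file_size: int, rbs_threshold: int) -> dict:
    """Count where the shreds of an integrated file end up.

    The threshold is compared against each shred, never against the whole
    file. Assumption: a shred goes to RBS when its size is >= the threshold,
    so a threshold of 0 pushes every shred.
    """
    full, remainder = divmod(file_size, SHRED_SIZE)
    shreds = [SHRED_SIZE] * full + ([remainder] if remainder else [])
    placement = {"rbs": 0, "database": 0}
    for shred in shreds:
        placement["rbs" if shred >= rbs_threshold else "database"] += 1
    return placement


for size, threshold in [(10 * KB, 0), (10 * KB, MB), (5 * MB, 0), (5 * MB, MB)]:
    print(f"file={size // KB}K threshold={threshold // KB}K -> {place_integrated_file(size, threshold)}")
```

Note that the 5MB file with a 1MB threshold keeps all 80 shreds in the database, because no individual 64K shred ever reaches the threshold.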

Uh oh… we have an apparent conflict. Our Word document will never make it to RBS. However, RBS isn't completely negated… it still plays a role for files that SharePoint doesn't provide direct interoperability with. For example, with a Word document, SharePoint plays a role in versioning, co-authoring, metadata integration, etc. For other, non-integrated file types, however, SharePoint will still use shreds and RBS… just not in the way you might think.

Let's run the same scenarios using a ZIP file instead, this time with a 2MB RBS threshold:

  • Scenario 1: ZIP File, 10K. RBS Threshold = 0K
    The file would be placed in a single shred and pushed to the RBS store.
  • Scenario 2: ZIP File, 10K. RBS Threshold = 2MB
    The file would be placed in a single shred and would be stored in the database.
  • Scenario 3: ZIP File, 10MB. RBS Threshold = 0K
    The file would be shredded to 1MB chunks and pushed to RBS.
  • Scenario 4: ZIP File, 10MB. RBS Threshold = 2MB
    The file would be shredded to 1MB chunks and pushed to RBS.

Now things should be very confusing. Neither SharePoint nor RBS cooperated with our RBS settings! The curiosities continue in the following example…

Same scenario, but using a Word document with 1MB of text and a single 5MB embedded image for a total of 6MB…

  • Scenario 1: RBS Threshold = 0K
    The file will be split into ~16 ~64K shreds and five 1MB shreds, all pushed to RBS.
  • Scenario 2: RBS Threshold = 2MB
    The file will be split into ~16 ~64K shreds, which will be stored in the database, and five 1MB shreds, which will be pushed to RBS.
  • Scenario 3: RBS Threshold = 10MB
    The file will be split into ~16 ~64K shreds, which will be stored in the database, and five 1MB shreds, which will be pushed to RBS.

You should now be either thoroughly confused or noticing a pattern. Here is the basic summary for the behavior:

  • Content that SharePoint understands will be shredded normally (~64K), and those shreds will be pushed to RBS or not depending on your RBS threshold.
  • Content that SharePoint does not directly understand or interoperate with will always be shredded into 1MB chunks and will always be pushed to RBS if RBS is enabled. Your RBS threshold will be ignored.
  • Content that SharePoint understands but that contains embedded non-integrated content (such as an image inside a Word document) will be broken down during the shredding process. Each piece will be recognized as one of the two content types above and handled according to the rules for that type.
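The summary rules can be encoded as a short Python sketch. The `Part` type and `place_parts` function are hypothetical names for illustration; the model treats every shred as a full-size chunk and applies the rules exactly as written above, so it does not try to reproduce the single 10K ZIP shred that stayed in the database under a 2MB threshold.

```python
import math
from dataclasses import dataclass
from typing import Dict, List

KB = 1024
MB = 1024 * KB
INTEGRATED_SHRED = 64 * KB  # editable Office content
BINARY_SHRED = MB           # embedded binaries and non-integrated files


@dataclass
class Part:
    """One piece of a file as recognized during shredding (illustrative model)."""
    size: int
    integrated: bool  # True for editable Office content, False for binary content


def place_parts(parts: List[Part], rbs_threshold: int, rbs_enabled: bool = True) -> Dict[str, int]:
    """Count shreds landing in the database vs. RBS under the summary rules.

    Simplification: every shred is treated as a full-size chunk.
    """
    placement = {"database": 0, "rbs": 0}
    for part in parts:
        shred_size = INTEGRATED_SHRED if part.integrated else BINARY_SHRED
        count = math.ceil(part.size / shred_size)
        if not rbs_enabled:
            to_rbs = False
        elif part.integrated:
            # Rule 1: integrated shreds honor the RBS threshold.
            to_rbs = shred_size >= rbs_threshold
        else:
            # Rule 2: non-integrated shreds go to RBS; the threshold is ignored.
            to_rbs = True
        placement["rbs" if to_rbs else "database"] += count
    return placement


# Rule 3: a compound document is modeled as a list of parts, e.g. the 6MB
# Word document above (1MB of text plus a 5MB embedded image).
word_with_image = [Part(1 * MB, integrated=True), Part(5 * MB, integrated=False)]
for threshold in (0, 2 * MB, 10 * MB):
    print(threshold // KB, "K:", place_parts(word_with_image, threshold))
```

Running it reproduces the three 6MB scenarios: 21 shreds in RBS at 0K, and 16 in the database plus 5 in RBS at both 2MB and 10MB.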

Believe it or not, this does actually make sense… but to see why, we have to understand the critical differences between the two file types.

  • Office documents are…
    • …likely to be heavily versioned.
    • …likely to have only a small portion of the content change at each version.
    • …reliant on SharePoint/Office ability to do things like co-authoring that are much faster if only the changes/deltas are saved or transferred over the wire.
  • Non-integrated files (images, media, ZIP, etc.) are…
    • …likely to be replaced entirely if versioned. Incremental or cell-based updates are impossible.
    • …not deeply integrated into SharePoint itself, and SharePoint does not alter the contents or attributes of those files.
    • …typically managed through applications that do not understand or directly interact with SharePoint.
      (For the purposes of this conversation, any file type that SharePoint does not provide direct interoperability with should be considered "non-integrated". For example, a large TXT file uploaded into SharePoint will be managed according to the non-integrated rules despite the fact that it is still a text based format.)

Essentially (and I am taking an educated guess here), the SharePoint product group (PG) looked at the significant differences between these two types of files and made its own determination about what was best for each of them. Shredding for Office documents would focus on many small, incremental changes between versions that could easily be transferred independently to and from the SharePoint server. Shredding for binary files would focus on chunk sizes that strike a reasonable balance between the benefits and the cost of breaking apart and reassembling binary BLOBs through RBS. Further, the PG went so far as to look inside Office documents, separating the editable (and therefore highly likely to change) text from any embedded binary elements (which are likely either not to change at all or to change completely), and manages both types of content appropriately even when they appear in the same file.

What should you do about this?

First, we need to be clear about the goals each solution is intended to address and the situations it handles well:

  • Shredded storage defaults to 64K for integrated files and 1MB for non-integrated files and reduces SQL database growth caused by file versioning.
  • RBS works better with fewer, larger files, and moves files out of the database.

Clearly, keeping the SQL database file size down is a good thing (and is probably why you deployed RBS to begin with). Beyond that, our primary goal is to find a balance between how SharePoint operates and how RBS is configured.

The answer is simple: Set RBS to a 1MB threshold.

This setting is consistent with how SharePoint operates when RBS is enabled and with previous guidance on configuring RBS so that only large files are pushed to the RBS store. Configuring the RBS threshold to 1MB aligns the environment with SharePoint's default behavior, but more importantly it ensures that the small 64K shreds are not pushed to the RBS store. Given SharePoint's behavior, this is the best balance one can strike in the RBS vs. Shredded Storage design. It lets the small, integrated file shreds remain in the database, where differential versioning can be effective, while staying consistent with the large files that shredded storage is likely to push to RBS anyway.

The side effect of this configuration is that your databases will grow faster than if all files were pushed to RBS, which may concern some people. However, this configuration can dramatically increase the actual storage density inside the database, particularly for heavily versioned files, while avoiding the performance overhead of retrieving thousands of 64K shreds through RBS for a single document. It really is the perfect balance… the best of both worlds… a good thing.

Oh… about that FileChunkWriteSize thing…

Yes, you can force shredded storage to behave as if it were disabled (even though it's not) by modifying the FileChunkWriteSize property to some incredibly large value such as 2GB. This would have the intended effect of forcing SharePoint into storing the entire file in a single shred which would then exceed your RBS threshold and SharePoint would dutifully push the file to RBS as desired. However, modifying this setting comes with some significant side effects. For example, by setting the FileChunkWriteSize to a large value you are defeating the opportunity for SharePoint to increase your storage density for versioned files. Think of it this way: The larger the chunk size, the larger the block we store is going to be, and the fewer blocks there will be. The fewer blocks there are, the fewer identical blocks there can be and the lower your storage density.
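A simplified model helps show why chunk size drives storage density. The sketch below is not how SharePoint implements shredded storage: it assumes fixed-size chunks identified by a SHA-256 hash, where any chunk already stored for the previous version is reused, and it measures how many new bytes a one-byte edit costs at different chunk sizes.

```python
import hashlib
import random


def chunk_hashes(data: bytes, chunk_size: int) -> dict:
    """Map chunk hash -> chunk length for fixed-size chunks of `data`."""
    return {
        hashlib.sha256(data[i:i + chunk_size]).hexdigest(): len(data[i:i + chunk_size])
        for i in range(0, len(data), chunk_size)
    }


def bytes_added_by_new_version(old: bytes, new: bytes, chunk_size: int) -> int:
    """Bytes stored for `new` when chunks already stored for `old` are reused."""
    stored = chunk_hashes(old, chunk_size)
    return sum(length for digest, length in chunk_hashes(new, chunk_size).items()
               if digest not in stored)


# A 4MB file of pseudo-random bytes with a single byte changed (requires Python 3.9+).
rng = random.Random(42)
original = rng.randbytes(4 * 1024 * 1024)
edited = bytearray(original)
edited[3_000_000] ^= 0xFF
edited = bytes(edited)

for chunk_size in (64 * 1024, 1024 * 1024, 4 * 1024 * 1024):
    print(chunk_size, bytes_added_by_new_version(original, edited, chunk_size))
```

In this model the new version costs one 64K chunk, one 1MB chunk, or the entire 4MB file, depending only on the chunk size.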

Let's go through one of those scenarios again for demonstration. Imagine we have a 100MB file that has a single value updated daily (Excel Services can easily create this kind of scenario) in a library with versioning enabled and no maximum version count (a frighteningly common configuration). That means 365 updates over the course of a year.

  • With FileChunkWriteSize = 2GB, total storage would be 100MB x 365 days = 36,500MB (or 35.6GB!)
  • With FileChunkWriteSize = 20MB, total storage would be 100MB + (20MB x 364) = 7,380MB (or 7.2GB)
  • With FileChunkWriteSize = 64K (the default value), total storage for the file would be 100MB + (64KB x 364) = 122.75MB
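The arithmetic above can be reproduced with a few lines of Python. The `yearly_storage_mb` helper is hypothetical and follows the same simplifying assumption as the bullets: the first version is stored in full, and each later save adds exactly one chunk (or the whole file, if the chunk is larger than the file).

```python
def yearly_storage_mb(file_mb: float, chunk_mb: float, versions: int = 365) -> float:
    """Total storage (MB) for `versions` saves when one value changes per save.

    First version stored in full; each later version adds one chunk, capped
    at the full file size when the chunk is larger than the file.
    """
    per_version = min(chunk_mb, file_mb)
    return file_mb + per_version * (versions - 1)


scenarios = {"2GB": 2048.0, "20MB": 20.0, "64KB": 64 / 1024}
results = {name: yearly_storage_mb(100.0, chunk) for name, chunk in scenarios.items()}
for name, total in results.items():
    print(f"FileChunkWriteSize={name}: {total:,.2f} MB ({total / 1024:.1f} GB)")

reduction = 1 - results["64KB"] / results["2GB"]
print(f"Reduction vs. 2GB chunks: {reduction:.2%}")
```

The reduction works out to roughly 99.66%, which matches the "over 99%" figure below.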

The net effect is that leaving that file in the database and allowing shredded storage to do its job will actually reduce storage growth over a single year by over 99%, without incurring any of the overhead of pushing that file to RBS! Remember that the purpose of RBS is not to reduce database size… it is to move the non-transactional data to less expensive storage as a means of reducing storage cost. Shredded storage can be better for highly versioned content than RBS because it reduces actual storage used. Depending on your scenario and content, not consuming storage at all can be much more cost effective than consuming a lot of somewhat less expensive storage.

Ultimately, increasing FileChunkWriteSize is the complete opposite of value. Just say no.

So in the end, my recommendation is "Let SharePoint Be SharePoint" and let shredded storage do its job in collaboration with RBS. They both have a place in the world, and they both work together well. SharePoint will own the highly integrated, highly versioned content, and SharePoint will let RBS own the less integrated, monolithic content. End users stay happy, file save requests stay fast, and databases grow much slower than in any previous version of SharePoint.

Edited September 18, 2013 to reflect updates from Bill Baer's recently published whitepaper on shredded storage, available here.