Best Practices on FILESTREAM implementations

Hello all,

Recently I was asked to advise a customer on a FILESTREAM implementation he was devising, and part of that ended up becoming this blog post that I hope can be of some help for you.

When to use FILESTREAM?

In SQL Server, BLOBs can be standard varbinary(max) data that stores the data in tables, or FILESTREAM varbinary(max) objects that store the data in the file system. The size and use of the data determines whether you should use database storage or file system storage. So, in a nutshell, these are the typical scenarios where this feature is best suited:

  • If you are storing BLOBs with an average size of 1MB or more.
  • When fast read access is adamant.
  • When you must access BLOBs from your application's middle tier code.

Overall, the main advantages of using FILESTREAM over other options are:

  • Storage and retrieval of BLOBs is made together with relational data in a single data store.  
  • Yes, the BLOBs are included in database backups and restores.
  • Inserting, updating and deleting BLOBs and your relational data happens in the same database transaction.
  • The 2 GB max size for a varbinary(max) column does not apply; you are only limited by the available space on the NTFS file system.
  • The SQL Server buffer pool memory is not used to operate on BLOBs; in previous versions large BLOBs could consume a lot of this memory.
  • All BLOB access can be performed from .NET code, allowing your middle tier code to easily work with these objects.
  • The NTFS file system can save and retrieve large BLOBs more efficiently than SQL Server.

Still, a few restrictions apply to FILESTREAM implementations:

  • FILESTREAM feature is not supported with Database Mirroring.
  • FILESTREAM data containers cannot share the same directory or be nested.
  • FILESTREAM data is not encrypted even when transparent data encryption (TDE) is enabled.

For smaller objects (less than 1MB), storing varbinary(max) BLOBs in the database can provide better streaming performance, as seen in the chart below, from Paul S. Randal White Paper on FILESTREAM storage. I highly recommend starting with this if you are assessing the implementation of FILESTREAM in your infra-structure. Also, another good place to start is the White Paper on Managing Unstructured Data with SQL Server 2008.

image_thumb[2]

Anyhow, the following table (from the same source) provides a nice overview on the benefits and drawbacks of the several methods of handling BLOBs from SQL Server 2008 onwards:

File server / file system SQL Server (using varbinary(max)) FILESTREAM
Maximum BLOB size NTFS volume size 2 GB – 1 bytes NTFS volume size
Streaming performance of large BLOBs Excellent Poor Excellent
Security Manual ACLs Integrated Integrated + automatic ACLs
Cost per GB Low High Low
Manageability Difficult Integrated Integrated
Integration with structured data Difficult Data-level consistency Data-level consistency
Application development and deployment More complex More simple More simple
Recovery from data fragmentation Excellent Poor Excellent
Performance of frequent small updates Excellent Moderate Poor

 

Best Practices for FILESTREAM performance

To squeeze every ounce of performance from BLOB implementations using FILESTREAM requires a few extra steps. Here are some of the best practices which can be leveraged for that purpose:

  1. When a table contains a FILESTREAM column, each row must have a non-null unique row ID. If planning to use a GUID, use the NEWSEQUENTIALID() instead of NEWID() because it renders better performance.
  2. Always create the FILESTREAM data container which will be used by the FILESTREAM enabled database in a separate volume.
  3. Periodically perform disk defragmentation of the volumes which store FILESTREAM data.
  4. Check if FILESTREAM directory containers do not have more than 300,000 files individually, as NTFS performance degradation may be an issue especially when generation of 8.3 filenames is enabled.
  5. Use the same NTFS block size recommendation as the Data and Log files volumes: format disk volumes which will be used to store FILESTREAM data with a 64-KB allocation (the default NTFS allocation is 4-KB). This will effectively disable any possibility of NTFS compression , as it can only be enabled when the NTFS cluster size is 4-KB or less.
  6. Check that the FILESTREAM directory containers do not have file system Encryption or file system Compression enabled as these can introduce a level of overhead when accessing these files. Absolute need to use file system Compression requires advanced planning, namely because compression is only enabled in 4-KB formatted volumes.
  7. Check that search indexing is not enabled on FILESTREAM volumes, under the Volume Properties window, unchecking the “Allow files on this drive to have contents indexed in addition to file properties” box.
  8. Disable generation of 8.3 names on all NTFS volumes used for FILESTREAM data storage by running the below command at an elevated command prompt:
    image
  9. Disable last file access time tracking in NTFS volumes used for FILESTREAM data storage by running the below command at an elevated command prompt:
    image
  10. Configure AV exclusions and make sure the antivirus does not scan SQL Server related processes and *.mdf/*.ldf/*.ndf/*.trn/*.bak or their paths as well as the path where FILESTREAM is stored. If antivirus scanning is absolutely necessary on the FILESTREAM path, avoid setting policies that will automatically delete offending files. If the files are deleted it will corrupt your FILESTREAM enabled database.
  11. As for Write-Through from Remote Clients (if used), because remote file system access to FILESTREAM data is enabled over the Server Message Block (SMB) protocol, use an ~60-KB SMB buffer size (or multiples) when streaming FILESTREAM data back to the client, so that the buffers don't get overly fragmented as TCP/IP buffers are 64-KB.
  12. Similarly, because partial updates of FILESTREAM data creates new files, always aggregate small updates into larger batches to reduce churn.
  13. Avoid Transact SQL statements that will update or append data to the FILESTREAM BLOBs. This action causes the FILESTREAM data to be loaded into the TempDB database and then loads the data back into a new physical file.
  14. Altitude check (order within the minifilter stack) for the SQL Server FILESTREAM filter driver [e.g. rsfx0100.sys]. Evaluate all the filter drivers loaded for the storage stack associated with a volume where the FILESTREAM feature stores files and make sure that rsfx driver is located at the bottom of the stack. You can use the FLTMC.EXE control program to enumerate the filter drivers for a specific volume at an elevated command prompt:
    image
    If not, engage your storage driver vendor about this issue.
  15. TempDB database should be configured to use multiple data files and all the files should be configured to a suitable size as per your usage to improve performance.

I still have a few more notes on this subject, but those will wait for a new post in a few days.

Other material on the subject:
FILESTREAM Best Practices
The SQL Swiss Army Knife #4 - Making sense of FILESTREAM containers

Until next time!

Disclaimer: I hope that the information on these pages is valuable to you. Your use of the information contained in these pages, however, is at your sole risk. All information on these pages is provided "as -is", without any warranty, whether express or implied, of its accuracy, completeness, fitness for a particular purpose, title or non-infringement, and none of the third-party products or information mentioned in the work are authored, recommended, supported or guaranteed by Ezequiel. Further, Ezequiel shall not be liable for any damages you may sustain by using this information, whether direct, indirect, special, incidental or consequential, even if it has been advised of the possibility of such damages.