Zach and I just spent a couple of days figuring out how to make use of the new FILESTREAM support in SQL Server 2008 and we thought we’d share a little bit about the experience in hopes it might save somebody some time.
There’s a ton of information out there regarding FILESTREAM, but in case you need more detail you can check out some of the SQL Server 2008 CTP content on Connect. FILESTREAM support will be enabled in CTP5. From the perspective of user experience, FILESTREAM is going to enable some interesting scenarios.
SQL Server has always provided the capability to store binary data, and thus you could grab any type of file and stuff it into a SQL Server varbinary(max) column. But blobs have different usage patterns than relational data, and SQL Server’s storage engine is primarily concerned with doing I/O on relational data stored in pages and extents, not streaming blobs.
So the bottom line is that storing blob’s in SQL Server have always had some limitations from a performance perspective. Most developers resorted to storing files in the file system, then just storing a path to the file in the database. Perfectly legitimate approach, and for some apps may still be the right way to go even with the advent of FILESTREAM.
But when you do that you introduce a whole new set of challenges around manageability, backup and concurrency that should already be obvious to the reader. Enter FILESTREAM.
Now SQL Server 2008 can store blobs in its own private namespace on the local NTFS filesystem instead of in-line with relational data. That’s good cuz the NTFS file system was built to stream blobs. NTFS is even more interesting form a database perspective because it’s transactional and supports recovery. So you can imagine the SQL Server Storage Engine and NTFS having a little mutual self-respect love fest.
Again this has obvious advantages from a manageability, backup and concurrency perspective that I won’t reiterate here. But what I will do is talk about our first experience with FILESTREAM from a database perspective, and Zach’s gonna talk about it from an application development perspective.
Our first project was to wire up a rich WPF app written in Visual Studio 2008 with some video stored in FILESTREAM in a pre-release build of SQL Server 2008 CTP5. We quickly found out that you can’t connect to CTP5 from managed code written in Visual Studio 2008 Beta 2 due to a TDS compatibility issue. Once we tracked down a release candidate of Visual Studio 2008 we were good to go.
In order to use the FILESTREAM attribute on a varbinary(max) column you have to enable FILESTREAM support for the SQL Server instance using:
EXEC [sp_filestream_configure] @enable_level = 3;
At this point I got a message that I needed to reboot the server in order for the setting to take effect. The dev team is working on removing this requirement for RTM.
Next I created the database. In order to use FILESTREAM you have to have a special filegroup for storing the FILSTREAM data which I called FileStreamGroup1:
CREATE DATABASE AdventureWorksRacing ON PRIMARY
( NAME = AdventureWorksRacing_data,
FILENAME = N’C:\Program Files\Microsoft SQL Server\MSSQL.1\MSSQL\Data\AdventureWorksRacing_data.mdf’,
SIZE = 2MB,
MAXSIZE = 50MB,
FILEGROWTH = 15%),
FILEGROUP FileStreamGroup1 CONTAINS FILESTREAM
( NAME = AdventureWorksRacing_media,
FILENAME = N’C:\Program Files\Microsoft SQL Server\MSSQL.1\MSSQL\Data\AdventureWorksRacing_media’)
( NAME = AdventureWorksRacing_log,
FILENAME = N’C:\Program Files\Microsoft SQL Server\MSSQL.1\MSSQL\Data\AdventureWorksRacing_log.mdf’,
SIZE = 5MB,
MAXSIZE = 25MB,
FILEGROWTH = 5MB);
When I went out to the AdventureWorksRacing_media folder there was some initial NTFS folders with GUID’s for names, and some header files and log folders. This is all storage engine gobbledy-goop for creating the FILESTREAM namespace.
Next I created a table to store our video with a FILESTREAM column:
CREATE TABLE [dbo].[eventMedia] (
[mediaId] [uniqueidentifier] NOT NULL ROWGUIDCOL PRIMARY KEY,
[dateCreated] [datetime] NOT NULL DEFAULT(GETDATE()),
[createdBy] [nvarchar](256) NOT NULL,
[fileName] [nvarchar](256) NOT NULL,
[mediaType] [nvarchar](256) NOT NULL,
[location] [geometry] NULL,
[file] [varbinary](max) FILESTREAM);
Tables with FILESTREAM columns required a ROWGUIDCOL column. This is used by the storage engine to keep track of instances in the filesystem. Next I wrote some TSQL to insert some garbage to see the affect in the AdventureWorksRacing_media folder. I’m not showing that code here because every lame FILESTREAM sample shows it and it’s worthless and makes me crazy. The correct way to insert data is using a native or managed client that can actually put real binary data into the column as opposed to a hello world string cast as binary data.
Anyway I digress…
After inserting some garbage I saw some new folders and files:
Next I tried to delete some rows and see the affect on the file system. Nothing changed right off the bat, and when I talked to the devs about it they said that storage gets freed up on a filestream filegroup when a valid log truncation point occurs. Otherwise you wouldn’t get proper backup/restore behavior. This made perfect sense to me!
One quick word of warning, trying to open up a query results grid in SSMS on gobs of filestream data is not a good idea. Take it from me. As usual, our tools are powerful and flexible enough to enable a dummy like me to shoot myself in the foot.
Anywho we were up and running with a database to store our videos. Next up Zach’s gonna talk about how we wired our WPF client up to it to play videos in this post.