File Access and table-valued functions

In
this article, we will discuss a sample CLR function that illustrates
accessing an external resource such as a file from inside a SQL Server
stored procedure or function, . The sample highlights a number of
different features and concepts, such as file-access, table-valued
functions and impersonation.

Let
us review first how to implement a table-valued function. If you have
been playing with table-valued functions (TVFs) in SQL Server 2005 beta
2, you will notice that the contract is much simplified in the April
CTP of SQL Server 2005. The basic requirements are as follows:

The
TVF should be mapped to a static method in the CLR (let us call it the
root method) that returns either an IEnumerator or an IEnumerable
interface. The IEnumerable interface is supported mainly to support
scenarios where you want to crack collections (such as arrays, lists)
into tabular result sets. It is very straightforward to implement such
TVFs because the managed classes that implement these collections
already implement the IEnumerable interface. The IEnumerable interface
has a single method called GetEnumerator that returns an IEnumerator
interface. For cases where you’re not cracking such a collection, you
can implement the IEnumerator interface itself (instead of the indirect
route of implementing IEnumerable that returns an IEnumerator). The
IEnumerator interface requires one property and two methods that need
to be implemented. Following are the descriptions of these methods
along with how SQL Server calls them.

  • MoveNext: This
    method moves the current position of the “result-set” to the next row.
    The initial position of the “result-set” is before the beginning, so
    the consumer (in this case SQL Server) will always call MoveNext first
    which positions it on the first row of the result set.
  • Current property: This property returns the current “row” of the result set as an Object as per the current row position
  • Reset: This method resets the current position to before the first row.

The root method
should be annotated with a SqlFunctionAttribute that indicates what the
fill-row method is (typically another static method in the same class
as the root method). The fill-row method cracks the record represented
by the Object returned by IEnumerator.Current into multiple column
values using output parameters. Thus, the fill-row method has 1+n
arguments where n is the number of columns returned by the TVF. The
first argument is an input argument which is the Object that SQL Server
got from calling IEnumerator.Current. The next n arguments are output
arguments that the method should fill with the column values; the data
types of these arguments should be compatible with the column data
types as declared by the CREATE FUNCTION statement.

Having reviewed the
basic elements of building a TVF, let us look at our example scenario.
The sample implements a table-valued function called GetFile, that
given the path-name for a file, reads the contents of the file and
returns it as a varbinary(max) value as a single row with a single
column. You might wonder why this is not written as a scalar-valued
function since after all it just returns a single scalar. The reason is
performance: whenever dealing with large values, it is of course
important to try and stream the values to SQL Server instead of
buffering the entire contents along the way. This means using the
SqlBytes data type as the return type of the function (well, actually
the type of the single column returned by the function). SqlBytes
allows streamed movement of binary data to SQL Server. In our case, we
will construct a SqlBytes object using a managed FileStream object that
represents the contents of the specified file. However we need a way to
close this FileStream once SQL Server is done reading the contents of
the file. Since SqlBytes does not implement the IDisposable interface,
there is no way for SQL Server to close the file. So if we implemented
this function as a scalar function, there will be no way to close the
FileStream and we will have an open file handle even after the query
that reads the file contents completes execution. On the other hand, if
we model this function as a table-valued function, then the MoveNext
method (which is part of the IEnumerator interface that needs to be
implemented for the TVF) can close the FileStream once it reaches the
end of the result-set (which in this case is just one row).

Having established
the need for a table-valued function, let us review our implementation.
The implementation consists of two classes: FileAccess which implements
the root method (GetFile) and the fill-row method (called of course
FillRow). The GetFile method returns a SingleFileLoader object that
implements the IEnumerator interface.

The SingleFileLoader
is constructed using the specified file name which it opens stashing
away the FileStream in its private state. (It also impersonates the
current caller before it opens the file; more on this later). MoveNext
is coded such that it always returns only one row. When it reaches past
this one row it closes the FileStream (the main reason why we chose the
TVF approach). Current simply returns the FileStream that was created when the SingleFileLoader was constructed.

Here is the code (which will work only in the April CTP of SQL Server 2005 or later).

 using System;
using System.Collections;
using Microsoft.SqlServer.Server;
using System.IO;
using System.Data.SqlTypes;
using System.Security.Principal;
 // This class implements a table-valued function that returns one row with one 
// column that retrieves the binary contents of a specified input file name.
// Returns NULL on most exception conditions such as inability to impersonate, 
// file-doesn't exist etc. 
 public partial class SingleFileLoader : IEnumerator
{
    private FileStream fs;
    private bool IsBegin = true;
    private String fn;
     public SingleFileLoader(String FileName)
    {
        fn = FileName;
         SingleFileLoaderHelper();
    }
     private void SingleFileLoaderHelper()
    {
        WindowsImpersonationContext OriginalContext = null;
        try
        {
            //Impersonate the current SQL Security context
            WindowsIdentity CallerIdentity = SqlContext.WindowsIdentity;
             //WindowsIdentity might be NULL if calling context is a SQL login
            if (CallerIdentity != null)
            {
                OriginalContext = CallerIdentity.Impersonate();
                fs = new FileStream(fn, FileMode.Open);
            }
            else fs = null;
        }
        catch
        {
            //If file does not exist or for any problems with opening the file, 
            // set filestream to null
            fs = null;
        }
        finally
        {
            //Revert the impersonation context; note that impersonation is needed only
            //when opening the file. 
            //SQL Server will raise an exception if the impersonation is not undone 
            // before returning from the function.
            if (OriginalContext != null)
                OriginalContext.Undo();
        }
    }
     public Object Current
    {
        get
        {
            return fs;
        }
    }
     public bool MoveNext()
    {
        //Ensure returns only one row
        if (IsBegin == true)
        {
            IsBegin = false;
            return true;
        }
        else
        {
            //Close the file after SQL Server is done with it
            if (fs != null) fs.Close();
            return false;
        }
    }
     public void Reset()
    {
        IsBegin = true;
        SingleFileLoaderHelper();
    }
}
 
public partial class FileAccess
{
    [SqlFunction(FillRowMethodName = "FillRow", TableDefinition = "FileContents varbinary(max)")]
    public static IEnumerator GetFile(String FileName)
    {
        return new SingleFileLoader(FileName);
    }
     public static void FillRow(Object obj, out SqlBytes sc)
    {
        //If non-existent file, return SQL NULL
        if (obj != null) sc = new SqlBytes((Stream)obj);
        else sc = SqlBytes.Null;
    }
}

Assuming that this code is compiled into an assembly called TVF.dll,
the assembly and the table-valued function can be registered in SQL
Server by executing the following DDL:

 use MyDB
go
 if exists (select * from sys.objects where name = 'GetFile')
      drop function GetFile
if exists (select * from sys.assemblies where name = 'TVF') 
      drop assembly TVF
go
 create assembly TVF
from 'c:\Projects\TVF\TVF\TVF.dll'
with permission_set = external_access
go
 create function GetFile(@fn nvarchar(300))
returns table(filecontents varbinary(max))
as external name TVF.FileAccess.GetFile
go

Following are some simple test queries to test the TVF:

 select *
from GetFile('C:\MyDir\function1.cs')
go
 select *
from GetFile('NonExistentFile.cs') – should return NULL
go

Impersonation
As
you might have noticed, there is code to impersonate the current
security context in the constructor for SingleFileLoader - actually the
bulk of the work is done by SingleFileLoaderHelper. SQL Server does not
automatically impersonate the current caller when executing code in an
EXTERNAL_ACCESS or UNSAFE assembly (it is not possible to access an
external resource in the SAFE assembly, so impersonation is moot for
that case). Hence by default, code will run and access resources under
the SQL Server service account unless there is explicit code to
impersonate. In this example, it is sufficient to impersonate only when
the file is opened. Make sure to Undo the impersonation (done in the
finally clause) since SQL Server will raise an error if the
impersonation has not been reverted before execution returns to SQL
Server.

The way you retrieve
the current calling context is by calling SqlContext.WindowsIdentity.
This might return NULL if the calling context is not an NT
Authenticated login; in this example, we simply return a NULL value for
the function when impersonation fails in this manner.

So what about OpenRowset(BULK)?
You
might wonder why should we write this in the CLR, while SQL Server 2005
natively supports the OpenRowset(BULK ‘filename’, SINGLE_BLOB) syntax
that does exactly the same thing. While this specific scenario is
available natively, there are other such things that you can write in
CLR with more flexibility that you cannot do with OpenRowset(BULK); for
e.g. retrieve all files in a directory that have a specific
extension (I might post this sample in a future entry). You could
also choose not to impersonate or to impersonate a fixed security
context in the CLR based implementation while with OpenRowset(BULK) you
always get impersonation of the caller.

The next question I
can imagine someone asking is how this performs compared with the
native OpenRowset(BULK). My unscientific measurements on the April CTP
(IDW 14) reveal that the CLR TVF is slower than OpenRowset(BULK).
However, there has been some recent performance work on improving the
transition cost coming back from managed code to SQL Server which
should bring the performance of the CLR TVF to be close to that of
OpenRowset. This performance improvement will be available in the next
CTP of SQL Server 2005.

- Balaji Rathakrishnan
Microsoft SQL Server

This posting is provided "AS IS" with no warranties, and confers no rights.