Yet another TVF: returning files from a directory


In my previous BLOG entry dated May 5th, I promised another sample that retrieves multiple files from a directory based on a specified name pattern. Here it is, this time in VB.NET.


This is a more traditional Table-Valued-Function in that it actually returns multiple rows, each with two columns (file-name and contents). Please refer to the earlier entry on the basic requirements of a TVF. Here, I will devote to describing the implementation of this particular TVF.


As we saw in the previous entry already, the key is to implement the IEnumerator interface with three members. The IEnumerator interface is implemented by the MultiFileLoader class. There is some initial work done during the construction of the MultiFileLoader class itself. The constructor retrieves the list of file-names  of the specified pattern in the specified directory and loads it in the array FileNames. Note that the opening of the directory is done after impersonating the caller (and un-impersonating once we are done).


MoveNext – Advances the position in the array. Makes sure to close the currently opened filestream before returning.


Current – Opens the File as per the file-name in the current position in the FileNames array and returns the FileStream; using Filestream ensures streamed access without having to buffer the entire contents in memory; we also make sure to impersonate the caller before opening the file.


Reset – simply resets the array position to before the start.



Here is the VB.NET code:



Imports System
Imports System.Data
Imports System.Data.Sql
Imports System.Data.SqlTypes
Imports Microsoft.SqlServer.Server
Imports System.IO
Imports System.Collections
Imports System.Security.Principal


‘ The record that holds each row returned by the TVF
Partial Public Class FileRecord
    Public FName As String
    Public FContents As FileStream


    Sub New(ByVal fn As String, ByVal fc As FileStream)
        Me.FName = fn
        Me.FContents = fc
    End Sub
End Class


‘The core of the TVF’s implementation that implements IEnumerator
Partial Public Class MultiFileLoader
    Implements IEnumerator


    ‘ Array that holds the list of file names to be returned
    Private FileNames As String()


    Private CurrentFile As Integer = -1
    Private CurrentFileStream As FileStream
    Private CallerIdentity As WindowsIdentity


    ‘ The constructor retrieves the list of file-names into Me.FileNames; it does the
    ‘ appropriate impersonation and un-impersonation after the directory has been read.
    Sub New(ByVal PathName As String, ByVal Pattern As String)


        Me.CallerIdentity = SqlContext.WindowsIdentity
        Dim OriginalContext As WindowsImpersonationContext = CallerIdentity.Impersonate()


        Try
            Me.FileNames = Directory.GetFiles(PathName, Pattern)
        Catch
            Me.FileNames = Nothing
        Finally
            If Not (OriginalContext Is Nothing) Then
                OriginalContext.Undo()
            End If
        End Try
    End Sub


    ‘ Retrieves the FileStream for the current file as pointed to by the CurrentFile array index.
    ‘ Constructs a FileRecord with the current file’s name and the contents (as FileStream) and
    ‘ returns it.
    ‘ Impersonates caller before opening the file and unimpersonates after getting the filestream.
    Public ReadOnly Property Current() As Object _
        Implements IEnumerator.Current
        Get
            Dim OriginalContext As WindowsImpersonationContext = CallerIdentity.Impersonate()


            Try
                Me.CurrentFileStream = New FileStream(FileNames(CurrentFile), FileMode.Open)
            Catch
                Me.CurrentFileStream = Nothing
            Finally
                If Not (OriginalContext Is Nothing) Then
                    OriginalContext.Undo()
                End If
            End Try
            Return New FileRecord(FileNames(CurrentFile), CurrentFileStream)
        End Get
    End Property


    ‘Advances the CurrentFile position in the FileNames array.
    ‘Also makes sure to close the currently open FileStream before advancing to the next File.
    Public Function MoveNext() As Boolean _
        Implements IEnumerator.MoveNext


        ‘ If there are no files, indicate end of result set
        If FileNames Is Nothing Then
            Return False
        End If
        If (CurrentFile < FileNames.Length – 1) Then
            CurrentFile = CurrentFile + 1
            If Not (CurrentFileStream Is Nothing) Then
                CurrentFileStream.Close()
            End If
            Return True
        Else
            If Not (CurrentFileStream Is Nothing) Then
                CurrentFileStream.Close()
                Return False
            End If
        End If
    End Function


    Public Overridable Sub Reset() _
        Implements IEnumerator.Reset
        CurrentFile = -1
    End Sub
End Class


Partial Public Class MultiFileAccess
    ‘ Root method that returns MultiFileLoader which does the bulk of the work.
    <Microsoft.SqlServer.Server.SqlFunction(FillRowMethodName:=”FillRow”, TableDefinition:=”FName nvarchar(300), FContents varbinary(max)”)> _
    Public Shared Function GetFilesInDirectory(ByVal PathName As String, ByVal Pattern As String) As IEnumerator
        Return New MultiFileLoader(PathName, Pattern)
    End Function


    ‘ The fill row method that cracks the FileRecord and returns the individual columns.
    Public Shared Sub FillRow(ByVal Obj As Object, ByRef FName As SqlString, ByRef FContents As SqlBytes)
        If Not (Obj Is Nothing) Then
            Dim fr As FileRecord
            fr = Obj
            FName = fr.FName
            FContents = New SqlBytes(fr.FContents)
        Else
            FName = SqlString.Null
            FContents = SqlBytes.Null
        End If
    End Sub
End Class


Assuming above code is compiled into an assembly called VBTVF.dll then the following is a T-SQL script to upload the assembly into SQL Server and register the TVF GetFilesInDirectory.



 create assembly VBTVF
 from ‘c:\Projects\TVF\VBTVF\bin\VBTVF.dll’
 with permission_set = external_access
 go



 create function GetFilesInDirectory(@PathName nvarchar(300), @Pattern nvarchar(300))
 returns table(FileName nvarchar(300), FileContents varbinary(max))
 as external name VBTVF.MultiFileAccess.GetFilesInDirectory
 go


Now you can test the TVF with queries like:



 select FileName, datalength(FileContents), FileContents
 from GetFilesInDirectory(‘C:\Projects’, ‘*.*’)


 – Should Return empty result set.
 select FileName, datalength(FileContents), FileContents
 from GetFilesInDirectory(‘C:\NonExistentDirectory’, ‘*.*’)


Enough about TVFs. We will talk about something different next time.


– Balaji Rathakrishnan


Microsoft SQL Server


This posting is provided “AS IS” with no warranties, and confers no rights. 
 


Comments (5)

  1. William Stacey says:

    Thanks Balaji. Just a thought, a Solution with the sql project and maybe a console client project would give this more context and help learners a bit more. You have the Solution anyway. Posting the solution on Channel9 in the Sandbox may be a good place to post the Solution zip. Would give you a broader audience to boot. Thanks again.

  2. William Stacey says:

    I had to convert to c# to understand it better, so here it is:

    using System;

    using System.Data;

    using System.Data.Sql;

    using System.Data.SqlTypes;

    using Microsoft.SqlServer.Server;

    using System.IO;

    using System.Collections;

    using System.Security.Principal;

    namespace SqlServerProject1CSharp

    {

    // The record that holds each row returned by the TVF.

    public class FileRecord

    {

    public FileStream fContents;

    public string fName;

    public FileRecord(string fn, FileStream fc)

    {

    this.fName = fn;

    this.fContents = fc;

    }

    }

    public class MultiFileAccess

    {

    public MultiFileAccess()

    {

    }

    /// <summary>

    /// The fill row method that cracks the FileRecord and returns the individual columns.

    /// </summary>

    /// <param name="Obj"></param>

    /// <param name="FName"></param>

    /// <param name="FContents"></param>

    public static void FillRow(object obj, ref SqlString fName, ref SqlBytes fContents)

    {

    if ( obj != null )

    {

    FileRecord record1 = (FileRecord)obj;

    fName = (SqlString)record1.fName;

    fContents = new SqlBytes(record1.fContents);

    }

    else

    {

    fName = SqlString.Null;

    fContents = SqlBytes.Null;

    }

    }

    /// <summary>

    /// Root method that returns MultiFileLoader which does the bulk of the work.

    /// </summary>

    /// <param name="PathName"></param>

    /// <param name="Pattern"></param>

    /// <returns></returns>

    [SqlFunction(FillRowMethodName = "FillRow", TableDefinition = "FName nvarchar(300), FContents varbinary(max)")]

    public static IEnumerator GetFilesInDirectory(string pathName, string pattern)

    {

    return new MultiFileLoader(pathName, pattern);

    }

    }

    // The core of the TVF’s implementation that implements IEnumerator.

    public class MultiFileLoader : IEnumerator

    {

    private WindowsIdentity callerIdentity;

    private int currentFile;

    private FileStream currentFileStream;

    private string[] fileNames;

    public MultiFileLoader(string pathName, string pattern)

    {

    this.currentFile = -1;

    this.callerIdentity = SqlContext.WindowsIdentity;

    WindowsImpersonationContext context1 = this.callerIdentity.Impersonate();

    try

    {

    this.fileNames = Directory.GetFiles(pathName, pattern);

    }

    catch

    {

    this.fileNames = null;

    }

    finally

    {

    if ( context1 != null )

    {

    context1.Undo();

    }

    }

    }

    /// <summary>

    /// Retrieves the FileStream for the current file as pointed to by the CurrentFile array index.

    /// Constructs a FileRecord with the current file’s name and the contents (as FileStream) and

    /// returns it.

    /// Impersonates caller before opening the file and unimpersonates after getting the filestream.

    /// </summary>

    public object Current

    {

    get

    {

    WindowsImpersonationContext context1 = this.callerIdentity.Impersonate();

    try

    {

    this.currentFileStream = new FileStream(this.fileNames[this.currentFile], FileMode.Open);

    }

    catch

    {

    this.currentFileStream = null;

    }

    finally

    {

    if ( context1 != null )

    {

    context1.Undo();

    }

    }

    return new FileRecord(this.fileNames[this.currentFile], this.currentFileStream);

    }

    }

    /// <summary>

    /// Advances the CurrentFile position in the FileNames array.

    /// Also makes sure to close the currently open FileStream before advancing to the next File.

    /// </summary>

    /// <returns></returns>

    public bool MoveNext()

    {

    if ( this.fileNames == null )

    {

    return false;

    }

    if ( this.currentFile < (this.fileNames.Length – 1) )

    {

    this.currentFile++;

    if ( this.currentFileStream != null )

    {

    this.currentFileStream.Close();

    }

    return true;

    }

    if ( this.currentFileStream != null )

    {

    this.currentFileStream.Close();

    return false;

    }

    return false;

    }

    public virtual void Reset()

    {

    this.currentFile = -1;

    }

    }

    }

  3. Sahil Malik says:

    Why not just use the yield keyword?

  4. san says:

    Hi,

    I wanted to find out how one can get the query (the actual SQL statement) in an TVF that is calling the said TVF?

    Example:

    select * from SOME_TVF()

    SOME_TVF()

    {

    // Something like……

    string query = GetCallingQuery()

    //now query = "select * from SOME_TVF()"

    }

  5. Vivek says:

    Hi,

    Thanks for your code.  But when I try to Deploy this assembly onto SQL Server I get an error

    Error 1 Cannot find the type ‘IEnumerator’, because it does not exist or you do not have permission.