Fast File Enumeration with Partially Initialized StorageFiles

Summary:

  • Enumerating files in a directory is now almost 9x faster for certain scenarios in the Fall Creators Update (RS3)
  • Requires only a small change to existing code
  • Backwards compatible with your existing codebase

Intro

When UWP developers talk to me about quickly accessing files, it frequently comes down to a common pattern: getting the filename and properties for a large number of files, and then interacting with only a few of them. This pattern spans from media apps listing all the songs available to play and only playing one, to LOB apps showing a list of data files or templates before the user picks one to use.

Since you’ve made it clear this is a common pattern for so many apps, we’ve made a few changes in the Fall Creators Update to help you write faster, more memory-efficient file enumeration code. You can leverage these changes with only a minor modification to your existing app.

How much faster? For large data sets it is almost 9x faster. The data below shows how stark the difference is - the top row is in the Creators Update, the bottom row is the Fall Creators Update.

API used | 1,017 .jpgs, including ImageProperties | 10,170 .jpgs, including ImageProperties
Existing (full StorageFiles) | 3,000 ms | 32,000 ms
New (partial StorageFiles) | 450 ms | 3,700 ms

That’s an almost 9x improvement in performance on the larger data set (and more than 6x on the smaller one) from just a simple change to the code. I was able to make the change to my app in under 10 minutes, and the improvement was instantly noticeable. We’ll cover the code in detail below, but it enumerates all the pictures on a machine. Before we can make the change, a little bit of background on the API surface is needed.

If you’re looking for a quick code sample, check out the MSDN quick start. This post is going to go into a lot of gritty detail to help you understand what is happening under the covers. The quick start covers very similar code but with a focus on getting it in your app right now.

Technical Background:

StorageFiles are the core part of the existing UWP surface and are the common currency for access to a file. Once your app has a StorageFile, it has many different options including:

  • Sharing permissions to the file with another app
  • Reading/Modifying/Deleting/Copying/Moving the file
  • Getting a unique identifier for the file
  • Accessing properties of the file
  • Finding out if the file is from a cloud storage provider

These are rich objects used throughout the platform, but this richness comes at a cost. Fetching properties, cloud provider details, and permissions all take CPU cycles and memory. This richness is great for apps providing complex interactions with the file but slows down apps which only need some of these features.

In the Fall Creators Update there is a new flag for creating StorageFile queries that tells the system not to fully initialize the StorageFile objects it returns to your app. Instead, it will follow a lazy initialization pattern and only create a partial StorageFile object. These partial StorageFiles are much faster to create and only contain basic file properties. However, they must be initialized into full StorageFiles before any of the rich features are used.

What are Partially Initialized StorageFiles

Partially Initialized StorageFiles (or partial StorageFiles for short) are for when apps need the filename and properties from a lot of files but will only interact with a few of them. Apps first notify the system that they want partial StorageFiles when creating a file query. The system will create partial StorageFiles containing only the data requested in the query. These partial StorageFiles can then have their properties read or be initialized to full StorageFiles as needed. The result is that the initial enumeration and property retrieval is extremely fast, since partial StorageFiles are much faster to create.

Partial StorageFile objects only contain some properties and are not fully initialized with all the backing data that you expect to have in a fully initialized StorageFile, such as a connection to the system thumbnail DB or the ability to share the file using an app contract. These connections are created later when requested by the app. This means partial StorageFiles are not going to be useful in all scenarios.

When should I use Partial StorageFiles

Partial StorageFiles are designed for cases where you need the file properties from a lot of different files but don’t need to interact deeply with all the files. Here are a few common cases where this might happen:

  • A music app wants to show the titles and artist for all the tracks in an album, but only plays 1 song at a time
  • CNC mills which have many preset templates, but only use one at a time
  • Movie or image gallery apps which show a bunch of media, but only play one at a time

These scenarios, along with many others, can be improved by using partial StorageFiles.

However, there are some cases where using this new option doesn’t make sense. Any app that interacts in depth with most or all of the files returned from a query should not use partial StorageFiles. I’ll dive into the details of why later, but here are a few examples of apps where this technique doesn’t provide a performance boost:

  • Apps that bulk update the properties on collections of files
  • Apps that open many files to display a base experience. For example, opening all the files in a directory to display a 3D model

Make the choice that is right for your app and your scenario. Partial StorageFiles are not a replacement for all your existing code using StorageFiles but are there to help in a few common scenarios.

Interacting with Partial StorageFiles

Interacting with partial StorageFiles in your code is very similar to using full StorageFiles. The same methods and properties are all there and you can call them normally. There are just a couple of differences that you should be aware of.

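Accessing prefetched file properties from a partial StorageFile completes synchronously, whereas a regular StorageFile has to fetch them asynchronously. You still call the same async methods; with a partial StorageFile they return the values already held in memory, which provides a performance boost over full StorageFiles.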

When an app invokes any operation using a partial StorageFile that requires a full StorageFile, the system will convert the partial StorageFile into a full StorageFile on the app’s behalf. The process of fully initializing a partial StorageFile requires some computation which results in a slight time delay (~1ms).

Fast access to file properties requires the Windows indexer, which places a restriction on partial StorageFiles - they are only available in indexed locations. Fortunately, all user libraries are indexed by default and your app can prompt the user to add new locations to the library at any time.
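As a rough illustration of prompting the user to add a location, here is a minimal sketch using StorageLibrary. The class and method names (LibraryHelper, AddFolderToPicturesLibraryAsync) are hypothetical, and the sketch assumes your app declares the Pictures library capability and calls this from the UI thread.

```csharp
using System;
using System.Threading.Tasks;
using Windows.Storage;

// Hypothetical helper, not part of the original sample.
public static class LibraryHelper
{
    // Asks the user to pick a folder to add to the Pictures library.
    // Returns null if the user dismisses the picker without choosing.
    public static async Task<StorageFolder> AddFolderToPicturesLibraryAsync()
    {
        StorageLibrary pictures =
            await StorageLibrary.GetLibraryAsync(KnownLibraryId.Pictures);

        // Shows the system folder picker; the chosen folder is added
        // to the library on the user's behalf.
        StorageFolder added = await pictures.RequestAddFolderAsync();
        return added;
    }
}
```

A newly added folder may take a while to be picked up by the indexer, so it is probably safest to check its indexed state before relying on partial StorageFiles there.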

To help clarify, here is a chart of what you can and can’t do with partial and full StorageFiles.

Full StorageFile (from all existing APIs, and most future ones) | Partial StorageFile (from fast indexed properties)
Can access basic properties of a file (file path) | Same as full StorageFile
Can access extended properties of the file (album artist) | Can access only properties requested at creation
Can copy/move/delete the underlying file | Initializes a full StorageFile automatically
Can read or modify the underlying file | Initializes a full StorageFile automatically
Can be used to pass permissions to another app | Initializes a full StorageFile automatically
Contains a unique ID to use for the file (different from the path or filename in cases where the user has files with the same file name) | Initializes a full StorageFile automatically
Can work outside of indexed locations | Cannot be created outside indexed locations

Code Sample and Walkthrough

Let’s build up a simple app to demonstrate how to create partial StorageFiles. We will print out the GPS coordinates of all the pictures in the Pictures library, so the user can see where they’ve been. You can adapt this code into your app or just use it as a guideline for updating your existing code.

I will skip over the details on getting permission and the basics of StorageFiles. There are tutorials on permissions and libraries, basic enumeration, and other StorageFile topics if you are interested in learning more about some of the choices I’m making in the sample.

Setting Up

 StorageFolder folderToEnumerate = KnownFolders.PicturesLibrary;
//check if the folder is indexed before doing anything
IndexedState folderIndexedState = await folderToEnumerate.GetIndexedStateAsync();
if (folderIndexedState == IndexedState.NotIndexed || folderIndexedState == IndexedState.Unknown)
{
   //Only possible in indexed directories. 
   return;
}

The first step is to grab a StorageFolder for the target location and verify partial StorageFiles are available in that location. Partial StorageFiles only work in indexed locations, so it is important to check if the folder is indexed. In this sample, we will accept partially indexed locations, because speed is more important than completeness. Your app might have to make a different choice based on your users’ expectations.

Sidebar: IndexedState.PartiallyIndexed only refers to the indexer’s scopes, not to the status of the indexing process. Partially indexed means there could be child folders which are not indexed under a root folder that is indexed. This commonly happens if the user has a reparse point under an indexed folder which points to an unindexed network share. When using partial StorageFiles, partially indexed folders will return results from the indexed locations and ignore the unindexed subfolders.

There may be files in fully indexed locations which aren’t in the index yet because the indexer is behind the file system. This is not reflected in the IndexedState for the folder. The indexing status cannot be provided because the indexer is unable to predict how many files it doesn’t know about.
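Instead of returning early when a folder is not indexed, an app could fall back to a regular query. The following is only a sketch of that idea: the IndexerOptionChooser class is hypothetical, and falling back to IndexerOption.UseIndexerWhenAvailable is my assumption about a sensible default, not something the sample above does.

```csharp
using System;
using System.Threading.Tasks;
using Windows.Storage;
using Windows.Storage.Search;

// Hypothetical helper, not part of the original sample.
public static class IndexerOptionChooser
{
    // Picks the partial-StorageFile option only where the indexer covers
    // the folder; otherwise returns an option that yields full StorageFiles.
    public static async Task<IndexerOption> ChooseAsync(StorageFolder folder)
    {
        IndexedState state = await folder.GetIndexedStateAsync();
        switch (state)
        {
            case IndexedState.FullyIndexed:
            case IndexedState.PartiallyIndexed:
                // Partial StorageFiles are available here.
                return IndexerOption.OnlyUseIndexerAndOptimizeForIndexedProperties;
            default:
                // NotIndexed or Unknown: assumed fallback to the slower path.
                return IndexerOption.UseIndexerWhenAvailable;
        }
    }
}
```

The result can be assigned to QueryOptions.IndexerOption, so the rest of the query code stays the same whichever path is taken.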

Query Basics

 QueryOptions picturesQuery = new QueryOptions()
{
   FolderDepth = FolderDepth.Deep,
   //Filter out all files that have WIP enabled on them
   ApplicationSearchFilter = "System.Security.EncryptionOwners:[] ",
   IndexerOption = IndexerOption.OnlyUseIndexerAndOptimizeForIndexedProperties
};
picturesQuery.FileTypeFilter.Add(".jpg");

Next, we will set up a query against the pictures library. The most important part of a query for partial StorageFiles is setting IndexerOption.OnlyUseIndexerAndOptimizeForIndexedProperties. This tells the system to return partial StorageFiles to your app. Any other IndexerOption value will return full StorageFiles, and you will not see any performance gains.

Note that we’re filtering out any files with System.Security.EncryptionOwners set. This is a quick way to make sure that your app doesn’t get any WIP-protected files back in the results. It isn’t required for using partial StorageFiles, but not handling encrypted files makes the sample code simpler.

Property Prefetch

 string[] otherProperties = new string[] { SystemProperties.GPS.LatitudeDecimal,
                                          SystemProperties.GPS.LongitudeDecimal };
picturesQuery.SetPropertyPrefetch(PropertyPrefetchOptions.BasicProperties | 
                                  PropertyPrefetchOptions.ImageProperties,
                                  otherProperties);

In this step, we are declaring the set of properties the partial StorageFiles will contain. The app will need the image properties and two GPS properties. Accessing any of these properties from the partial StorageFiles returned in this query will be fast, but other properties will require initializing a full StorageFile.

In general, you’ll always include PropertyPrefetchOptions.BasicProperties, which has properties such as the file path and file name. Add as many properties as you need at query time. Fetching more properties during the initial query is much faster than fetching extra properties later by initializing a full StorageFile. There is still a slight cost for each additional property at query time, so be sure not to include surplus properties.

Most common file properties are represented in the SystemProperties class, but there are other properties you might need. In advanced cases, the property system reference includes all the details that you’ll need. Partial StorageFiles only work with properties where isColumn = true. For more information about how to read the property system doc please see my video and post on it.
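To read the two extra GPS properties by name, something like the following sketch could be used. GpsReader and DescribeLocationAsync are hypothetical names. I am assuming, based on the description above, that properties listed in the prefetch are served from the partial StorageFile; I have not verified that RetrievePropertiesAsync avoids initializing a full StorageFile in every case.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;
using Windows.Storage;

// Hypothetical helper, not part of the original sample.
public static class GpsReader
{
    // Same property names that were passed to SetPropertyPrefetch.
    private static readonly string[] GpsProperties =
    {
        SystemProperties.GPS.LatitudeDecimal,
        SystemProperties.GPS.LongitudeDecimal
    };

    public static async Task<string> DescribeLocationAsync(StorageFile file)
    {
        IDictionary<string, object> values =
            await file.Properties.RetrievePropertiesAsync(GpsProperties);

        // Files without GPS metadata may have missing or null entries.
        object latitude;
        object longitude;
        values.TryGetValue(SystemProperties.GPS.LatitudeDecimal, out latitude);
        values.TryGetValue(SystemProperties.GPS.LongitudeDecimal, out longitude);

        if (latitude == null || longitude == null)
        {
            return file.Name + ": no GPS data";
        }
        return file.Name + ": " + latitude + ", " + longitude;
    }
}
```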

Sort Order

 SortEntry sortOrder = new SortEntry()
{
   AscendingOrder = true,
   PropertyName = "System.FileName"
};
picturesQuery.SortOrder.Add(sortOrder);
if (!folderToEnumerate.AreQueryOptionsSupported(picturesQuery))
{
   log("Querying for a sort order is not supported in this location");
   picturesQuery.SortOrder.Clear();
}

 

The app will also sort the pictures by the file name, so we create a SortEntry for the query. Sorting is a tricky beast because it requires there be an ordering available for the file system language and for the location to be fully indexed. Since it isn’t always possible to sort at query time, it is good practice to check if the sort order is supported before issuing a query. If it isn’t supported, then clear out the SortOrder from the query. Your app can always sort the results later using a LINQ query.
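If the sort order had to be cleared, the results can be sorted in memory instead. This is a minimal sketch; FileSorting and SortByName are hypothetical names, and the culture-aware, case-insensitive comparer is my choice rather than a guaranteed match for the ordering the indexer would have produced.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using Windows.Storage;

// Hypothetical helper, not part of the original sample.
public static class FileSorting
{
    // Sorts files by name. Name is available on a partial StorageFile,
    // so this should not force any file to be fully initialized.
    public static List<StorageFile> SortByName(IEnumerable<StorageFile> files)
    {
        return files
            .OrderBy(f => f.Name, StringComparer.CurrentCultureIgnoreCase)
            .ToList();
    }
}
```

Note that sorting one page at a time only orders the files within that page. For a globally sorted list, collect all pages first and sort once at the end.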

Getting the Results

 //Create the query and get the results
uint index = 0;
const uint stepSize = 100;
StorageFileQueryResult queryResult = folderToEnumerate.CreateFileQueryWithOptions(picturesQuery);
IReadOnlyList<StorageFile> images = await queryResult.GetFilesAsync(index, stepSize);
while (images.Count != 0)
{
   foreach (StorageFile file in images)
   {
      //With the OnlyUseIndexerAndOptimizeForIndexedProperties set, this won't
      //be async. It will be run synchronously
      var imageProps = await file.Properties.GetImagePropertiesAsync();

      //Build the UI
      log(String.Format("New: {0} at {1}, {2}",
      file.Path,
      imageProps.Latitude,
      imageProps.Longitude));
   }
   index += stepSize;
   images = await queryResult.GetFilesAsync(index, stepSize);
}

Finally, the app gets the results from the query and outputs them to the user. If you’ve read any of my other articles on fast file enumeration, this code will be familiar. The best practice is to page in manageable sets of files to limit your commit usage in cases where the user has millions of files. You can process each set of files while waiting for the next one to be returned.
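One way to process a page while the next one is being fetched is to start the next request before handling the current results. The sketch below is an assumption about how that could look, not the approach the sample uses; PagedEnumeration, ForEachFileAsync, and the processFile callback are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;
using Windows.Storage;
using Windows.Storage.Search;

// Hypothetical helper, not part of the original sample.
public static class PagedEnumeration
{
    public static async Task ForEachFileAsync(
        StorageFileQueryResult queryResult,
        Func<StorageFile, Task> processFile,
        uint stepSize = 100)
    {
        uint index = 0;

        // Kick off the first page request.
        Task<IReadOnlyList<StorageFile>> pending =
            queryResult.GetFilesAsync(index, stepSize).AsTask();

        while (true)
        {
            IReadOnlyList<StorageFile> page = await pending;
            if (page.Count == 0)
            {
                break; // No more results.
            }

            // Request the next page before processing the current one,
            // so the fetch overlaps with the per-file work below.
            index += stepSize;
            pending = queryResult.GetFilesAsync(index, stepSize).AsTask();

            foreach (StorageFile file in page)
            {
                await processFile(file);
            }
        }
    }
}
```

Only two pages are held at any time, so the memory benefit of paging is kept.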

The significant difference for partial StorageFiles is GetImagePropertiesAsync. With full StorageFiles this method causes an out-of-process call as the system fetches the properties from either the indexer or the file. Partial StorageFiles, however, will short-circuit this call and return the results from your process’s memory, assuming the properties were specified in the prefetch. This makes the retrieval much faster and reduces the commit used by the system on behalf of your app.

Finally, the app will log the image properties to an output pane. There are no limits to the experiences you can build using users’ files, and I look forward to seeing what you do with partial StorageFiles.

Details on Initializing Full StorageFiles

In the above examples we never initialize full StorageFiles, and this is a common case. Many apps will never need a full StorageFile, because everything they use is in partial StorageFiles. There are cases, however, where initializing a full StorageFile is required, such as if you are going to access the thumbnail with GetThumbnailAsync.

In these cases, you are going to see the following behaviour:

 StorageFile myFile = … //Get a partial StorageFile from a query

//Partial StorageFile
var name = myFile.Name;

//Partial StorageFile
var imageProperties = await myFile.Properties.GetImagePropertiesAsync();

//Initializes a full StorageFile (~1ms penalty)
var relativeId = myFile.FolderRelativeId;

//Uses existing full StorageFile (no additional penalty)
var thumbnail = await myFile.GetThumbnailAsync(ThumbnailMode.PicturesView);

There is no difference in the code that you need to write; the system takes care of initializing the full StorageFile for you when it is needed. A full StorageFile will not be initialized until it is absolutely needed, and the time delay to create it will only happen once per instance. It is important to know which properties are going to cause initialization:

Property/Method | Is it available from a partial StorageFile?
Attributes | Yes
Content Type | Yes
Date Created | Yes
Display Name | Yes
Display Type | Yes
File Type | Yes
FolderRelativeId | No, cannot be requested with a query and only available from full StorageFiles
IsAvailable | No. Only available from full StorageFiles
Name | Yes
Path | Yes
Properties | If requested. Properties requested in the query will be included in the partial StorageFile, otherwise they will require a full StorageFile
All methods | No. Any operation on a StorageFile requires a full StorageFile

Note that all operations where you pass a StorageFile into another system function will require a full StorageFile.
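To see what the initialization cost looks like on your own hardware, a simple Stopwatch around the first access to FolderRelativeId gives a rough number. This is only a measurement sketch with hypothetical names (InitializationTiming, MeasureFirstFolderRelativeIdAccess); a single Stopwatch reading is noisy, so treat the result as an estimate.

```csharp
using System;
using System.Diagnostics;
using Windows.Storage;

// Hypothetical helper, not part of the original sample.
public static class InitializationTiming
{
    // Times one read of FolderRelativeId. On a partial StorageFile that
    // has not been initialized yet, this read is what triggers creation
    // of the full StorageFile; later reads should not pay the cost again.
    public static TimeSpan MeasureFirstFolderRelativeIdAccess(StorageFile file)
    {
        Stopwatch stopwatch = Stopwatch.StartNew();
        string relativeId = file.FolderRelativeId;
        stopwatch.Stop();

        Debug.WriteLine(relativeId + " took " +
                        stopwatch.Elapsed.TotalMilliseconds + " ms");
        return stopwatch.Elapsed;
    }
}
```

Calling it twice on the same instance and comparing the two readings should show whether the delay really is paid only once.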

Wrap-Up

There are a few critical points to remember when using partial StorageFiles to ensure everything goes smoothly:

  • Partial StorageFiles are designed for reading properties from many files and only interacting with a few files
  • Use IndexerOption.OnlyUseIndexerAndOptimizeForIndexedProperties to fetch partial StorageFiles
  • Only use partial StorageFiles in fully or partially indexed locations
  • Request all the properties your app will need in the prefetch options
  • Initialize as few full StorageFiles as possible to save memory and CPU

If you follow these guidelines, you can create responsive and engaging experiences for your users using their files.

Sample Function

In case you’re looking for a complete function to try out in your app, here is the full snippet from the article.

 private async void EnumeratePictureLibrary()
{
   StorageFolder folderToEnumerate = KnownFolders.PicturesLibrary;
   //check if the folder is indexed before doing anything
   IndexedState folderIndexedState = await folderToEnumerate.GetIndexedStateAsync();
   if (folderIndexedState == IndexedState.NotIndexed || folderIndexedState == IndexedState.Unknown)
   {
      //Only possible in indexed directories.
      return;
   }

   QueryOptions picturesQuery = new QueryOptions()
   {
      FolderDepth = FolderDepth.Deep,
      //Filter out all files that have WIP enabled on them
      ApplicationSearchFilter = "System.Security.EncryptionOwners:[] ",
      IndexerOption = IndexerOption.OnlyUseIndexerAndOptimizeForIndexedProperties
   };
   picturesQuery.FileTypeFilter.Add(".jpg");
   string[] otherProperties = new string[] { SystemProperties.GPS.LatitudeDecimal,
                                             SystemProperties.GPS.LongitudeDecimal };

   picturesQuery.SetPropertyPrefetch(PropertyPrefetchOptions.BasicProperties | 
                                    PropertyPrefetchOptions.ImageProperties,
                                    otherProperties);
   SortEntry sortOrder = new SortEntry()
   {
      AscendingOrder = true,
      PropertyName = "System.FileName"
   };
   picturesQuery.SortOrder.Add(sortOrder);

   if (!folderToEnumerate.AreQueryOptionsSupported(picturesQuery))
   {
      log("Querying for a sort order is not supported in this location");
      picturesQuery.SortOrder.Clear();
   }
   //Create the query and get the results
   uint index = 0;
   const uint stepSize = 100;
   StorageFileQueryResult queryResult = folderToEnumerate.CreateFileQueryWithOptions(picturesQuery);
   IReadOnlyList<StorageFile> images = await queryResult.GetFilesAsync(index, stepSize);
   while (images.Count != 0)
   {
      foreach (StorageFile file in images)
      {
         //With the OnlyUseIndexerAndOptimizeForIndexedProperties set, this won't
         //be async. It will be run synchronously
         var imageProps = await file.Properties.GetImagePropertiesAsync();

         //Build the UI
         log(String.Format("New: {0} at {1}, {2}",
         file.Path,
         imageProps.Latitude,
         imageProps.Longitude));
      }
      index += stepSize;
      images = await queryResult.GetFilesAsync(index, stepSize);
   }
}