Blobfuse is an open source project that provides a virtual filesystem backed by Azure Blob storage.

02/20/2019

Blobfuse uses the libfuse open source library to communicate with the Linux FUSE kernel module, and implements the filesystem operations using the Azure Storage Blob REST APIs.

Features

Mount a Blob storage container on Linux
Basic file system operations such as mkdir, opendir, readdir, rmdir, open, read, create, write, close, unlink, truncate, stat, rename
Local cache to improve subsequent access times
Parallel download and upload features for fast access to large blobs
Multiple nodes can mount the same container in read-only scenarios

Installation

You can install blobfuse from the Linux Software Repository for Microsoft Products. The process is explained on the blobfuse installation page. Alternatively, you can clone the azure-storage-fuse GitHub repository, install the dependencies (fuse, libcurl, gcrypt, and GnuTLS), and build from source. See the wiki and the GitHub repository for details.

Blobfuse and Data Science Virtual Machine

Blobfuse is already installed on the Ubuntu DSVM. To use it, create a configuration file /opt/blobfuse.cfg as described in https://docs.microsoft.com/en-us/azure/storage/blobs/storage-how-to-mount-container-linux
or in https://github.com/Azure/azure-storage-fuse/tree/43e82df5d85a4c082dc67af8131bcf05f4d9270a
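For reference, a minimal /opt/blobfuse.cfg might look like the sketch below. It assumes the key-based format used by blobfuse v1 (one setting per line: accountName, accountKey, containerName); all values are placeholders, accountName is the storage account name rather than a full URL, and the container must already exist. Because the file contains the account key, it is sensible to make it readable only by its owner (for example, chmod 600 /opt/blobfuse.cfg).

```
accountName myaccountname
accountKey myaccountkey
containerName mycontainer
```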

Usage

Mounting

Once blobfuse is installed, configure your account credentials either in the template file provided in the blobfuse folder (connection.cfg) or in environment variables. For brevity, the environment-variable form is shown here (the mount command later in this article reads its settings from /opt/blobfuse.cfg instead):

 export AZURE_STORAGE_ACCOUNT=myaccountname
 export AZURE_STORAGE_ACCESS_KEY=myaccountkey
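When the credentials come from the environment, a script can fail early if one of them is missing instead of surfacing a less obvious mount error later. The following is a minimal POSIX sh sketch; the function name check_blobfuse_env is hypothetical, and only the two variable names come from the text above.

```shell
#!/bin/sh
# Hypothetical helper (not part of blobfuse): verify that the environment
# variables used for account-key authentication are set and non-empty.
check_blobfuse_env() {
    if [ -z "${AZURE_STORAGE_ACCOUNT:-}" ]; then
        echo "error: AZURE_STORAGE_ACCOUNT is not set" >&2
        return 1
    fi
    if [ -z "${AZURE_STORAGE_ACCESS_KEY:-}" ]; then
        echo "error: AZURE_STORAGE_ACCESS_KEY is not set" >&2
        return 1
    fi
    # Report only the account name; never print the key itself.
    printf 'credentials found for storage account %s\n' "$AZURE_STORAGE_ACCOUNT"
}
```

In a mount script, call check_blobfuse_env || exit 1 before running blobfuse.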

Next, prepare the VM for mounting your Blob storage.

Using a high-performance disk or a RAM disk for the local cache is recommended. On Azure VMs, a good choice is the ephemeral disk, which is mounted on /mnt on Ubuntu and on /mnt/resource on RHEL. Make sure your user has write access to the cache location; if it does not, create the directory and chown it to your user:

 sudo mkdir /images
 sudo mkdir /mnt/blobfusecache

 sudo chown -R <your-user-account> /images
 sudo chown -R <your-user-account> /mnt/blobfusecache/
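Before mounting, you can confirm that the cache directory exists and is writable by the user who will run blobfuse. A minimal POSIX sh sketch follows; check_cache_dir is a hypothetical name, and it relies on the shell's -w test, which reflects the current user's permissions (and is typically true for root).

```shell
#!/bin/sh
# Hypothetical helper (not part of blobfuse): succeed only if the given
# directory exists and the current user can write to it, since blobfuse
# stores its locally cached files there.
check_cache_dir() {
    cache_dir=$1
    if [ ! -d "$cache_dir" ]; then
        echo "error: $cache_dir does not exist" >&2
        return 1
    fi
    if [ ! -w "$cache_dir" ]; then
        echo "error: $cache_dir is not writable by $(id -un)" >&2
        return 1
    fi
    printf 'ok: %s\n' "$cache_dir"
}
```

For the directory used in this article, run check_cache_dir /mnt/blobfusecache || exit 1 as the user who will mount the container.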

Create your mount point if it does not exist yet (mkdir /path/to/mount; this example uses /images, created above), then mount a Blob container with blobfuse. The container must already exist:

 blobfuse /images --tmp-path=/mnt/blobfusecache -o big_writes -o max_read=131072 -o max_write=131072 -o attr_timeout=240 -o fsname=blobfuse -o entry_timeout=240 -o negative_timeout=120 --config-file=/opt/blobfuse.cfg
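To confirm that the mount succeeded, you can check whether the mount point appears in the kernel's mount table. The sketch below reads /proc/mounts with awk; is_mounted is a hypothetical helper, and where util-linux is installed, mountpoint -q /images does the same job.

```shell
#!/bin/sh
# Hypothetical helper: return 0 if the given absolute path is listed as a
# mount point in /proc/mounts (the second field of each line). Paths that
# contain spaces appear escaped as \040 there, so this sketch expects
# plain paths such as /images.
is_mounted() {
    awk -v mp="$1" '$2 == mp { found = 1 } END { exit !found }' /proc/mounts
}
```

For example, is_mounted /images && echo "blobfuse is mounted on /images".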

NOTE: Use absolute paths for all directories in the command; relative paths and shortcuts such as ~/ do not work. Blobfuse does not support multiple writers to a single blob, so you need to guarantee that the file names generated by concurrent writers (for example, during a data extraction step) are unique.
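One way to meet the unique-file-name requirement is to build each output name from the node's host name plus a random UUID, so writers on different nodes or in parallel processes never target the same blob. This is an assumption rather than a blobfuse feature: unique_blob_name is hypothetical, and it reads a fresh random UUID from the Linux-specific file /proc/sys/kernel/random/uuid.

```shell
#!/bin/sh
# Hypothetical helper: print a file name of the form
#   <prefix>-<hostname>-<random UUID>.<extension>
# The host name helps trace which node wrote a file; the UUID, which the
# Linux kernel generates afresh on every read, provides the uniqueness.
unique_blob_name() {
    prefix=$1
    ext=$2
    uuid=$(cat /proc/sys/kernel/random/uuid)
    printf '%s-%s-%s.%s\n' "$prefix" "$(uname -n)" "$uuid" "$ext"
}
```

A writer could then use out="/images/$(unique_blob_name extract csv)" as its output path.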

For more information, see the wiki.

Interested in Data Engineering?

Check out the Data Engineering learning resources on Microsoft Learn.