Sparse Files: ESE can now give space back to the file system

A long-standing limitation of ESE is that database files never shrink, even after deleting substantial amounts of the data. While the space is freed and is re-usable within the database, the on-disk size does not change. The main limitation is that the first page of each table can never be moved, so even if the file is shrunk from the end, a single unmovable page can prevent shrinking the file. This is similar to an unmovable file preventing the shrinking of a disk partition in Volume Manager. In Windows 8.1, ESE started using Sparse Files to release space back to the file system, even in the middle of database files.

What is a Sparse File?

A sparse file has ‘holes’.

The File System has a special FSCTL called FSCTL_SET_ZERO_DATA. Normally it is used to write a bunch of zeroes to part of a file, but if the file is marked as sparse, and this space gets released to the file system.

But what does the File System do with sparse regions?

Any reads from the file don’t go to disk; instead the FS knows to just return zeroes.

Any writes will need to first allocate a cluster from the file system, and then write to this new block. The new block could be in a completely different part of the disk though -- and the file is now fragmented.

Downsides of Sparse Files

Sparse Files are only supported on NTFS and ReFS; not FAT (and all of the FAT variants).

I do not know of any copy utility (cmd.exe copy, xcopy, robocopy) that preserves the ‘sparseness’ of a file. They all expand the copied file to its full length.

Windows 8.1 and earlier did not do a great job of allocating contiguous space when the sparse regions were re-written. For example, if there was a 10 MB hole in the middle of a file, and the application writes a single 1 MB chunk, the file system often split up that 1 MB in to many 64 kb segments (depending on the fragmented free space of the volume).

Fragmented files hurt sequential access performance, which is very important for fast data access on spinning media (disks). SSDs can also be affected by block size, since it is difficult to know their underlying ‘native’ block size. SSDs may need to do a ‘read-modify-write’ if a program writes to less than the native block size.

Prior to Windows 8.1, memory-mapped files could not be made sparse.

We may get out-of-disk errors in places where we never got them before.

How does ESE use Sparse Files?

ESE manages space hierarchically. We will allocate a large multi-page block of space to be used by a table. We call these blocks of space ‘extents’. The default size of an extent is 16 pages. A table has a main btree, but will also have a btree for each of its secondary indices, and also one for its Long Values (LVs).

An extent starts out being ‘Owned’ by the table, and initially all of the extent is ‘Available’. As the btrees grow and require more space, the pages are allocated from the ‘Available’ part of the extent. Only when there are no more pages in the Available Extent will we ask for another Extent.

When enough records on a page are deleted, the page can be freed, and returned back to the Available Extent.

When an entire extent is available, it can then be freed back to the file system.

Manually Examining Sparse Files

Many utilities don’t know whether a file is Sparse or not. Just doing a ‘dir’ won’t help. But compact.exe can be used. Normally it’s used for examining NTFS-compressed files, but it turns out that the Win32 API GetCompressedFileSize works for sparse files as well.

Let's use an example. Indexed DB is an HTML 5 feature, and Internet Explorer uses ESE for its storage. Here we see that the AppQuota.edb file is 8,454,144 bytes large, and takes up 8,454,144 bytes on disk. It has no space savings whatsoever.

c:\Users\martinc\AppData\Local\Microsoft\Internet Explorer\Indexed DB [19:42:30.44] ++$ compact.exe *edb

 Listing c:\Users\martinc\AppData\Local\Microsoft\Internet Explorer\Indexed DB\
 New files added to this directory will not be compressed.

  8454144 :  8454144 = 1.0 to 1 AppQuota.edb
151060480 : 139788288 = 1.1 to 1 d Internet.edb

Of 2 files within 1 directories
1 are compressed and 1 are not compressed.
159,514,624 total bytes of data are stored in 148,242,432 bytes.
The compression ratio is 1.1 to 1.

Now we can run esentutl.exe to manually walk the space trees and free up all of the space that it can. Use the -z option (Zero, originally meant to write zeroes to the deleted parts of a database) along with -t (Trim).

c:\Users\martinc\AppData\Local\Microsoft\Internet Explorer\Indexed DB [19:43:10.26] $ esentutl -z AppQuota.edb -t

Extensible Storage Engine Utilities for Microsoft(R) Windows(R) Version 6.3
Copyright (C) Microsoft Corporation. All Rights Reserved.

Initiating SECURE mode...
        Database: AppQuota.edb
                              : AE[ 3]: 47- 256 (210); A 47- 256 (210) 
cpgSparseBefore: 0 cpgSparseAfter: 210 cpgSparseNew: 210

: 0 trimmed (before), 210 trimmed (after), 210 pages newly trimmed.

Warning:
  You MUST delete the logfiles for this database

Note:

  It is recommended that you immediately perform a full backup
  of this database. If you restore a backup made before the
  repair, the database will be rolled back to the state
  it was in at the time of that backup.

Operation completed successfully in 0.313 seconds.

(Ignore the bits about deleting the logfiles – that’s meant for securely zeroing out data, and it’s a reminder that the potentially-sensitive deleted data may still exist in the transaction logfiles.)

Hey look, it even mentions AE -- that’s Available Extents, mentioned above. We just trimmed (freed) 210 pages back to the file system. Compact.exe now says it only takes up 1,572,864 bytes:

c:\Users\martinc\AppData\Local\Microsoft\Internet Explorer\Indexed DB [19:43:30.18] $ compact.exe *edb

 Listing c:\Users\martinc\AppData\Local\Microsoft\Internet Explorer\Indexed DB\
 New files added to this directory will not be compressed.

  8454144 : 1572864 = 5.4 to 1 d AppQuota.edb
151060480 : 139788288 = 1.1 to 1 d Internet.edb
Of 2 files within 1 directories
2 are compressed and 0 are not compressed.
159,514,624 total bytes of data are stored in 141,361,152 bytes.
The compression ratio is 1.1 to 1.

 

Fsutil.exe is supposed to allow us to dump the allocated bits of a file, but for some reason it doesn’t dump everything. So we’ll try a tool that a friend of mine wrote to dump file fragmentation:

c:\Users\martinc\AppData\Local\Microsoft\Internet Explorer\Indexed DB [19:43:49.38] $ fsutil sparse queryrange AppQuota.edb
Allocated range[1]:Offset: 0x0 Length: 0x180000

c:\Users\martinc\AppData\Local\Microsoft\Internet Explorer\Indexed DB [19:47:35.09] $ diskbash -f AppQuota.edb
VCN: 0, LCN: 4270808, LEN: 16 ( 65536 bytes)
VCN: 16, LCN: 19183938, LEN: 368 ( 1507328 bytes)
VCN: 384, LCN: -1, LEN: 1680 ( 6881280 bytes)
============================================================
                  File: AppQuota.edb
             Fragments: 3, Bytes Allocated: 8454144
Fragment Sizes (bytes): Min= 65536, Max= 6881280, Avg= 2818048

The VCN is ‘Virtual Cluster Number’, and it is relative to the beginning of the file.

The LCN is ‘Logical Cluster Number’, and is relative to the beginning of the disk.

So the first block of the file is VCN #0, and there are 16 of them sequentially on the disk, starting at LCN #4270808. It’s a total of 65536 bytes, meaning that this disk has 65535/16=4096
clusters.

The next block of the file starts at VCN #16, for 368 clusters (about 1.5 MB), at LCN #19183938.

The last block of the file starts at VCN #384 (16+368), for 1680 clusters, and at LCN #-1. This means it isn’t actually on the disk!

Also note that this run of 1680 clusters matches the output above of freeing up 210 pages. Each 32k database page takes up eight 4k clusters, and 210*8=1680.

Programmatically Telling ESE to use Sparse Files

Using the regular C API, JET_paramEnableShrinkDatabase must be set to one of the following:

 

 #define JET_bitShrinkDatabaseOff 0x0
 
#define JET_bitShrinkDatabaseOn 0x1 // Uses the file system's Sparse Files feature to release space in the middle of a file.
 
#define JET_bitShrinkDatabaseRealtime 0x2 // Attempts to reclaim space back to the file system after freeing significant amounts of data (when space is marked as Available to the Root space tree).
 

 

Normally JetResizeDatabase (https://msdn.microsoft.com/en-us/library/jj835047(v=exchg.10).aspx) will act the same as JetGrowDatabase (https://msdn.microsoft.com/en-us/library/gg294109(v=EXCHG.10).aspx) -- that is, the database will either stay the same size, or grow. In order to get JetResizeDatabase() to use sparse files, use JET_bitShrinkDatabaseOn. Then pass in 0 for the cpgDesired.

JET_bitShrinkDatabaseRealtime affects the runtime behavior of what happens when enough data is deleted in tables to free an extent. ESE will make that extent sparse, but unfortunately there are some other background tasks which may then cause some data to be written in this sparse area. It's not ideal, but it's reality.

With our C# layer (https://codeplex.com/managedesent), the logic is the same, but the names are slightly different. The setup:

 InstanceParameters instanceParameters = new InstanceParameters(instance);
 instanceParameters.EnableShrinkDatabase = ShrinkDatabaseGrbit.On;
 

 

And the actual code:

  int actualPagesAfterTrim = 0;
 Windows8Api.JetResizeDatabase(sesid, dbid, 0, out actualPagesAfterTrim, ResizeDatabaseGrbit.None);