How to Programmatically Improve File System Throughput


Writing large amounts of data to persistent media can take a long time. While Windows CE does have hardware limitations in comparison to the desktop, there are coding practices that will increase your throughput when writing to disk or flash.  Higher throughput results in a better overall experience with Windows CE and applications for Windows CE.  I will use flash media as my primary example in this blog.


·        Know your block size.

The most important step to programmatically increasing your throughput is to know your flash block size, and to write in multiples of the block size.  The flash block size is the smallest unit that will be written to, or read from, flash.  Writing less than 512 bytes at a time will take the same amount of time as writing a full 512 bytes.  If a program is writing 4 bytes at a time at random places in a file, then its throughput could be 128 times worse than that of a program that is buffering and writing out 512 bytes.  There is a small amount of caching in most file systems, as well as in block drivers, which may save an application with 4-byte-write behavior from coming to a near-standstill, but this is not guaranteed.


Buffering to a multiple of the block size will further increase your throughput.  If you make one large write of ten times the block size, then your program only needs to verify buffers, traverse the file system stack, and thunk into the kernel one time to flush all the data.  The time taken for each step in the process of getting data written to disk is small; however, every millisecond starts to noticeably add to overall time when there are hundreds or thousands of writes. 


Another reason to write in multiples of the block size is to increase throughput when compression and/or encryption are in play.  Compression and encryption are usually CPU-heavy, block-based algorithms that require a set number of bytes (usually a block multiple) to be read for every single read or write.  Every time a byte is written, the entire block associated with that byte must be read, decrypted, and decompressed; the byte is changed; then the block is recompressed, re-encrypted, and finally written out. 


The 4-byte-write application will likely see its throughput slow to a crawl in this situation, as every one of the CPU-heavy algorithms will be called 128 times for writes amounting to 4kB.  This will also wear the flash media down considerably faster.  In contrast, a program that is writing 16kB buffers on media with a 4kB block size will read in data once, and decrypt/decompress/compress/encrypt will be called a total of 4 times for the entire operation. 


In order to discover the block size of your current flash media, you will need to call CeGetVolumeInfo:

BOOL CeGetVolumeInfo(
  LPCWSTR pszRootPath,
  CE_VOLUME_INFO_LEVEL InfoLevel,
  LPCE_VOLUME_INFO lpVolumeInfo
);


The first parameter, pszRootPath, can be any valid path name, and the returned information will reflect the file system that manages that path.


The last parameter (LPCE_VOLUME_INFO) is a pointer to the following structure:

typedef struct _CE_VOLUME_INFO {
  DWORD cbSize;
  DWORD dwAttributes;
  DWORD dwFlags;
  DWORD dwBlockSize;
  ...
} CE_VOLUME_INFO, *PCE_VOLUME_INFO, *LPCE_VOLUME_INFO;


dwBlockSize is the block size of your flash in bytes. 
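A minimal call sketch, assuming the declarations above (on Windows CE these live in storemgr.h; error handling is reduced to returning 0, and CeVolumeInfoLevelStandard is the standard info level):

```c
#include <windows.h>
#include <storemgr.h>   /* CeGetVolumeInfo, CE_VOLUME_INFO */

/* Sketch: query the block size of the volume that manages a given path.
 * Returns 0 if the query fails. */
DWORD GetVolumeBlockSize(LPCWSTR pszPath)
{
    CE_VOLUME_INFO info = {0};
    info.cbSize = sizeof(info);
    if (!CeGetVolumeInfo(pszPath, CeVolumeInfoLevelStandard, &info))
        return 0;
    return info.dwBlockSize;
}
```

You would then size your write buffers as a multiple of the returned value.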


·        Memory-mapped files for caching

If your application does not need to flush information immediately, or can survive a power loss with some data loss, then another option for increased throughput is memory-mapping the file that you’re writing to.  The memory-mapped file will act as a file cache until you write it out. When the file is written out, you get the benefit of the data being written in large chunks, in a contiguous manner; although the contiguousness is not as important for flash.  See the documentation for the memory mapping functions (CreateFileMapping and MapViewOfFile) for details.
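A sketch of the pattern, using desktop-style Win32 calls with error handling elided (note that older Windows CE versions require CreateFileForMapping in place of CreateFile when the handle will be mapped):

```c
#include <windows.h>
#include <string.h>

/* Sketch: use a memory-mapped file as a write cache. Small writes land in
 * the mapped view (RAM); the data reaches the medium in large chunks when
 * the view is flushed or unmapped, not on every small write. */
void MappedWriteExample(LPCWSTR pszFile, const BYTE *data, DWORD cb)
{
    HANDLE hFile = CreateFileW(pszFile, GENERIC_READ | GENERIC_WRITE,
                               0, NULL, CREATE_ALWAYS,
                               FILE_ATTRIBUTE_NORMAL, NULL);
    HANDLE hMap  = CreateFileMappingW(hFile, NULL, PAGE_READWRITE, 0, cb, NULL);
    BYTE  *pView = (BYTE *)MapViewOfFile(hMap, FILE_MAP_WRITE, 0, 0, cb);

    memcpy(pView, data, cb);    /* many small writes could go here instead */

    UnmapViewOfFile(pView);     /* dirty pages are written out for you */
    CloseHandle(hMap);
    CloseHandle(hFile);
}
```

The view plays the role of the buffer from the first tip, with the memory manager deciding when dirty pages go out.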


·        Use sub-directories (FATFS)

FATFS is the primary file system used on flash media, so keeping the number of files in any one directory low will help most devices.  The reason the number of files in a directory is important arises from the fact that directory entries are stored as a linear array, which must be walked on every path-based API call. When you have 15,000 files in one directory, every open/close/move/delete/findfirst file call requires the files in the directory to be enumerated, which means the entire array is walked until your file is found.  Needless to say, this is very slow.  Saving all your project files in one directory may seem like an easy-access solution for your program’s enumeration, but it is very difficult for the file system to keep up.  You can hash the filenames, or bucket them in alphabetical order; your ordering scheme doesn’t matter as long as each bucket is as small as the array you are willing to have walked on every path-based file operation. If your program requires that all the files be stored in one directory, perhaps you can write a file system filter that will hide the subdirectories for you.


·        Expensive APIs: FlushFileBuffers(), RegFlushKey(), FlushViewOfFile()

Windows CE is used in environments where power loss is much more likely than on the desktop.  A phone is dropped, a battery dies, etc.  The architects, developers, testers: everyone is aware of these scenarios, and the shell, file system manager, and OS are written with power loss in mind.  There is a lot of flushing already being done on your behalf, which makes these functions unnecessary time-wasters for most applications.  Sue’s post also adds perspective to whether application developers should be flushing. 


FlushFileBuffers(): This function will flush everything in the write-back cache, as it does not know what part of the cache belongs to your file.  This can take a lot of time, depending on the cache size and the speed of the media.  How necessary is it?  There is a thread which goes through and writes out dirty pages, so it is likely not very necessary. 


RegFlushKey():  As with FlushFileBuffers(), this API flushes a lot more than you would think: depending on the situation, RegFlushKey could end up flushing a lot of data, up to the entire registry.  Double and triple checking that your changed registry key has successfully been stored by using RegFlushKey() is usually not necessary.  Ask yourself: “Will the machine still be able to boot if this registry key isn’t immediately saved?”  If the answer is yes, then you probably do not need to use this API in the middle of your process.  As with files, there is a thread that will wake up and flush the registry on your behalf.  If you must flush your registry key, try to do it at the end of your program’s execution, preferably at close.  This way you get all the changes that you have made (and anyone else has made) written out. 


FlushViewOfFile():  This API is useful when you are writing to a memory-mapped file and using it as a buffering scheme.  What is not useful is flushing the view on every 4-byte write.  If you’re flushing after every write you make, then what is the purpose of using a memory-mapped file?  Use the map as a buffering tool to increase the amount of data changed, rather than as a safety net guaranteeing that everything will be written out precisely.  Call this only when you’re done writing, or at preset, block-multiple intervals.  Again, there is a thread that will intermittently write dirty pages to the file, so it may not be necessary at all.

Comments (9)

  1. I’ve received a few questions from users who have said that their WM5 device seems to slow down and speed…

  2. Ariane in our file system team wrote a great blog entry that describes what an application developer…

  3. Andreas Selle says:


    I’ve checked with many flash memory cards, and all of them had a 512 byte block size. Does anybody know of a flash memory card that uses a different block size?

    Also, the cluster size of the FAT system usually is bigger than 512 bytes. Wouldn’t the performance be better if all file access is aligned to the FAT cluster size instead of the flash memory block size?

  4. ce_base says:

    Hi Andreas,

    Sorry for the delayed response.  I, personally, do not know of any flash cards that have a block size larger than 512 bytes, but I did not want to assume a standard size.  

    You are right that the performance would be better if the file access is aligned to the FAT cluster size.  The FAT cluster size is a power of 2, which means that the cluster size is a multiple of 512 bytes.  It was mentioned in the blog that a multiple of the block size is better for performance.  Writing in multiples of the FAT cluster size is not a bad idea either.  

    There are lots of numbers that can be used as baselines for buffering: page size in memory, cluster size in FAT, block size in flash.  This blog will (I hope) help people understand where their performance bottlenecks are where the file system is concerned, and help them design new applications with file system performance in the back of their minds.

    The reason the FAT cluster size was not mentioned is that I did not want to pigeonhole the blog into being about file access in FAT on flash memory, as what was mentioned (outside of the directory tip) will apply to other file systems as well.


  5. Nino.Mobile says:


    Software / Hardware

    Sling Media has released SlingPlayer for Windows Mobile (via …

  6. Tweakradje says:

    After examining the fatfsd.dll from wm2003se I came to the following registry entries that provided my device with a major performance boost:



    "CacheSize"=dword:00002000 (at the cost of 2 Mb RAM)

    "DLL"="fatfsd.dll" (use strings.exe from sysinternals to see these params)




    "Flags"=dword:00001006 (writethrough bla bla)

    "MaxCachedFileSize"=dword:00020000 (don’t cache >128Kb files)

    "Paging"=dword:00000001 (ALWAYS 1!!)


    "UpdateAcces"=dword:00000000 (I don’t need access time stamps)

    [HKLMSystemStorageManagerProfilesFlashDrvFATFS] (could differ)

    "CacheSize"=dword:00000200 (at the cost of 256 Kb RAM)


    Yes, it is at the cost of some RAM. Why is/was this not implemented by default in wm2003se? How does WM2005 deal with caching of flash memory data devices?


  7. Everybody who has worked at Microsoft for long enough has their war stories. I’ll share one of my first,
