The Curious Case Of The Four-byte Writes


Everybody who has worked at Microsoft for long enough has their war stories. I'll share one of my first, from Windows Mobile 6 development.


 


It Sounded Simple Enough


My first full feature at Microsoft was the storage card encryption feature in WM6. My team (Device Management & Security) collaborated with the Windows CE filesystem team to deliver it. The filesys team would deliver the encryption filter itself, and my team would provide the control panel UI, the framework to allow Exchange to manage the encryption settings, and do some end-to-end performance testing.


Windows CE has always shipped an encryption filter as sample code, so we planned to take the sample, do a little bit of work to make it production ready, and be done. Easy, right? To reduce development time we chose not to implement cache coherency in the filter - every file handle got its own instance of the filter with its own internal buffer. Cache coherency is a complex problem in filesystem design, so we were able to shave a lot of time off the schedule with this choice.


Once the filter was delivered, I ran various user scenarios on encrypted storage cards to make sure that the user experience hadn't degraded too much. We expected some performance hit from the encryption (it's not free), but I needed to ensure that nothing degraded so badly that users would turn off encryption. We contacted several of the feature teams, and they went off and did their own testing on encrypted storage cards for the same reason.


It turned out that one particular app had a ridiculous performance degradation. Opening a file that previously took a few seconds now took several minutes. Obviously, this was unacceptable. I went to investigate.


Digging In


The encryption filter decrypts each block of the file as you read it. So if you read a block, the filter first needs to read the first block of the file (the key block) to get the decryption keys. Then it reads the block that you requested, decrypts it, and returns the part that you asked for. So we always read one more block than you requested. I thought this might be part of the performance problem, but I needed to do some measurements to be sure.
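The read path described above can be sketched as a toy model. This is only an illustration, not the actual filter: the class names, the 4096-byte page size, the layout (key page followed by data pages), and the XOR "cipher" are all assumptions made for the example.

```python
PAGE_SIZE = 4096  # assumed page size, not stated for the real filter


class CountingDisk:
    """Toy backing store that counts how many pages are read from it."""

    def __init__(self, data: bytes):
        self.data = data
        self.pages_read = 0

    def read_page(self, index: int) -> bytes:
        self.pages_read += 1
        start = index * PAGE_SIZE
        return self.data[start:start + PAGE_SIZE]


class UncachedFilter:
    """Hypothetical stand-in for a filter with no cache of its own."""

    def __init__(self, disk: CountingDisk):
        self.disk = disk

    def read(self, offset: int, length: int) -> bytes:
        # Every read starts by fetching the key page again (page 0).
        key = self.disk.read_page(0)[0]
        # Data pages follow the key page, so logical page n is disk page n + 1.
        first = 1 + offset // PAGE_SIZE
        last = 1 + (offset + length - 1) // PAGE_SIZE
        plain = bytearray()
        for index in range(first, last + 1):
            page = self.disk.read_page(index)
            plain += bytes(b ^ key for b in page)  # XOR stands in for decryption
        start = offset % PAGE_SIZE
        return bytes(plain[start:start + length])


def make_disk(plaintext: bytes, key: int = 0x5A) -> CountingDisk:
    """Build a toy 'encrypted file': one key page, then the XOR-ed data."""
    key_page = bytes([key]) + bytes(PAGE_SIZE - 1)
    cipher = bytes(b ^ key for b in plaintext)
    return CountingDisk(key_page + cipher)


disk = make_disk(b"hello, storage card" * 500)
flt = UncachedFilter(disk)
print(flt.read(7, 4), disk.pages_read)
```

In this model a 4-byte request costs two full page reads and a page-sized decryption, which is the behavior the measurements set out to confirm.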


I took the FSDSPY sample filter driver and modified it to log a little bit more information. With this I was able to track every read and write to the disk. I ran some sample files through the app and then wrote a little PowerShell script to parse the log output. The results were eye-opening:


 



Unique physical pages read (one read operation): 103
Minimum pages read (page-size reads): 202
Your pages read (optimistic calculation): 98792


Total Bytes Read Total Reads Average Read Size Largest Read
---------------- ----------- ----------------- ------------
411945           49396       8                 6428


The application's data spanned about 100 pages on the disk. Read one page at a time, which is pretty common, that would have cost about 200 page reads. But the app was causing almost 100,000 page reads. About 25,000 of its roughly 50,000 reads were for only 4 bytes of data. Because the filter reads a minimum of two pages per read (as described above), this app was reading 8192 bytes from the disk and doing a decryption operation for every 4 bytes of data that it wanted to consume. So that explained the thousand-fold performance degradation. Diagnosing the problem was pretty easy. Solving it was harder.
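The arithmetic behind those numbers can be reproduced in a few lines. This is a back-of-the-envelope sketch using only the figures from the log summary above; the 4096-byte page size is an assumption inferred from the 8192-bytes-per-read figure, and the function name is made up for the example.

```python
PAGE_SIZE = 4096  # assumed: 8192 bytes per read / 2 pages per read


def read_amplification(total_reads: int, total_bytes: int,
                       unique_pages: int, pages_per_read: int = 2) -> dict:
    """Estimate the disk traffic caused by an uncached two-pages-per-read filter."""
    pages_fetched = total_reads * pages_per_read    # key page + data page per read
    bytes_fetched = pages_fetched * PAGE_SIZE
    return {
        "average_read_size": total_bytes // total_reads,
        "pages_fetched": pages_fetched,
        "bytes_fetched": bytes_fetched,
        # How many times more pages were read than one pass over the unique pages.
        "amplification": pages_fetched / unique_pages,
    }


# Figures taken from the log summary in the post.
stats = read_amplification(total_reads=49396, total_bytes=411945, unique_pages=103)
for name, value in stats.items():
    print(f"{name}: {value}")
```

The "optimistic calculation" of 98792 pages is simply 49396 reads times two pages, and dividing that by the 103 unique pages gives a factor in the high hundreds, which is where the thousand-fold estimate comes from.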


Pointing Fingers


We blamed the application team. How could they think that reading in their data 4 bytes at a time was acceptable? It's ridiculous. They should fix their read routines.


The application team blamed us. We had changed the rules of the world. Their file load operation used to be fast, and now it was slow. It was fast because the disk driver itself has a little cache. The encryption filter doesn't have a cache (because we would have had to make it coherent), so by inserting the filter we changed the behavior of the world. Their file code was several years old and in maintenance mode.


We didn't have schedule time to go back and add a cache to the filter. Doing so would have amounted to a rewrite, because the cache logic is almost as complex as the encryption filter itself. The app team didn't have time reserved to rewrite all of their file I/O routines, and they certainly didn't want to cut any other work to do so because it wouldn't buy their users any perceived value. We couldn't ship it as is, because the files for this app are absolutely the kind of thing that we'd expect our customers to put on an encrypted storage card. It would cripple the value of our feature if you couldn't use it with this application.


A Happy Ending


Luckily, we didn't have to take either of the above choices. The filesystem team was working on a global cache filter for Windows CE6. This cache filter sits at the top of the system (above our filter) so it would buffer all of the app's filesystem accesses. The filesystem team hadn't yet finished this feature for CE6, so it was a bit of a risk to depend on it, but that was the path we chose. Once they got it finished and tested for CE6, they ported it back to Windows Mobile 6. With the cache filter installed, the performance tests went back to green - the performance hit from encryption with the cache filter in place was negligible. So in the end, everyone got what they wanted and we were all able to ship our features.
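To see why a cache sitting above an expensive layer helps so much with tiny reads, here is a minimal sketch. It is not the CE6 cache filter: the class names are hypothetical, the cache is read-only and unbounded, and it ignores writes and coherency, which are the hard parts of a real one.

```python
PAGE_SIZE = 4096  # assumed page size


class SlowLayer:
    """Stand-in for an uncached lower layer where every call is expensive."""

    def __init__(self, data: bytes):
        self.data = data
        self.calls = 0

    def read(self, offset: int, length: int) -> bytes:
        self.calls += 1
        return self.data[offset:offset + length]


class CacheLayer:
    """Hypothetical read-only page cache layered above the slow layer."""

    def __init__(self, lower: SlowLayer):
        self.lower = lower
        self.pages = {}  # page index -> page bytes

    def _page(self, index: int) -> bytes:
        # Only go to the lower layer the first time a page is touched.
        if index not in self.pages:
            self.pages[index] = self.lower.read(index * PAGE_SIZE, PAGE_SIZE)
        return self.pages[index]

    def read(self, offset: int, length: int) -> bytes:
        out = bytearray()
        end = offset + length
        while offset < end:
            index, start = divmod(offset, PAGE_SIZE)
            take = min(PAGE_SIZE - start, end - offset)
            out += self._page(index)[start:start + take]
            offset += take
        return bytes(out)


data = bytes(range(256)) * 400           # 102400 bytes, exactly 25 pages
lower = SlowLayer(data)
cache = CacheLayer(lower)
for pos in range(0, len(data), 4):       # 25600 four-byte reads
    cache.read(pos, 4)
print("lower-layer calls:", lower.calls)
```

With the cache on top, the lower layer sees one page-sized request per page instead of one request per 4-byte read, so the app's access pattern stops mattering to the layers below.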


 


Incidentally, this incident inspired a blog entry from my colleagues on the filesystem team.


 


Scott

Comments (6)
  1. Mark Moeller says:

    Application writers doing 4 byte reads need to be handed over to the proper authorities and given a lesson on how much time the CPU is wasting just doing an API call vs. actually doing the work of reading data they are requesting.  😉

  2. Chris Ashton says:

    I’m with the app team on this one.  They were meeting their performance requirements before their change.  And if the OS or driver is caching pages for them, it’s redundant to write their own buffering.  It’s just extra code and another place for bugs to hide out.

    Sure, we’ve all grown up on old bondage-and-discipline OSes that didn’t do buffering under the hood.  Somehow that’s taught us that this is the way things ought to be.  The OS should never try to be helpful; it’s the app writer’s job to constantly reinvent this wheel (or scrounge up a wheel from somewhere else), and above all else, thou shalt never write four bytes at a time.  I respectfully disagree.  As it turns out, this wasn’t an unsurmountable problem; it could be solved once by the OS, rather than a zillion times for every ISV.

  3. Mikado says:

    I’m glad you got this happy rescue at the end, but I don’t like your approach to development at all! You were just lucky. If you always sacrifice features, bugfixes and performance for "development time", your products will obviously be far from ideal. Schedule, ha! You’re just making excuses :-

  4. scyost says:

    It’s just a matter of people, time, and money. I could have spent another month working on this and we could have kept the filesys devs another month working on it. The next thing I worked on was the certificate improvements. There were a ton of people impacted by those problems and fixes. I wouldn’t want to sacrifice the work I did there.

    We also could have implemented a cache in the encryption filter but it would just be duplication of other caching code in the system. This leads to bugs, wasted time, and "bloat" in both features and ROM size.

    It would have been ideal if we had noticed the behavior of this application during the design phase of the encryption filter. Then we could have had a discussion at that time about what to do. I think getting the global cache filter in place was the right solution all along, but it would have been nice to know we needed it earlier in the cycle.

  5. Chris Bevan says:

    Hi,

    We are developing a fingerprint recognition system for a customer for a new Windows Mobile 6.0 device.

    They are interested in using the storage card encryption feature in WM6. They have asked if it is possible to hook into the storage card encryption feature so that you could not allow encryption or decryption to start until a valid fingerprint has been obtained.

    We will be involved with the platform build at the OEM level. Is there any way of hooking into the storage card encryption feature so that you can hold it off while the user scans a finger?

    Alternatively, I guess we could write our own file filter below yours that would prevent access to the file system until the finger was validated. Would this work?

    Many thanks

    Chris Bevan

    Intrinsyc.

  6. Pariksheet says:

    Hello Sir,

    I also want to develop a filter to keep track of all write operations on disk.

    I used the FSDSPY sample and built the DLL.

    But I was unable to load my filter DLL on the device.

    How can I load my DLL?

    Is it possible to load it on the emulator?

    Or which device will I need, Windows Mobile 5 or 6?

    If you have any idea about it then please help me.

    Awaiting your reply.

    Thanks & Regards,

    Pariksheet.

