I’ve received a few questions from users who have said that their WM5 device seems to slow down and speed up erratically. (To be clear, we’ve only heard reports of this happening on two upgrade devices, and not on any of the devices designed specifically for WM5.) Some have dug in and found that “filesys.exe” is using a lot of CPU power and that it appears to be running something called a “Compaction Thread.” There have also been questions about a registry key called CompactionPrio256. I can’t explain some of the bigger problems people are describing, but I can at least tell you what these things are and what they’re trying to do.
This entry assumes you know the difference between RAM and ROM and that you know what I mean when I say “NOR” and “NAND.” If not, please read this first. I’m also assuming you know what Persistent Storage is. If not, please read this.
It’s easier to move on than to forget
Flash ROM is divided up into “blocks.” Each block is then subdivided into “sectors.” Sectors tend to be 512 bytes each. Blocks vary in size, but are often around 128 KB. The most difficult thing to do with flash is to get it to forget what it knows. In most storage systems (hard drives, RAM, pencil and paper, etc.) you can erase as much or as little as you want. That’s not the case with flash ROM. If you want to erase one bit of information in flash, you need to erase the entire block that it’s in. On NOR flash, erasing one block can take up to two seconds (NAND is much faster at erasing).
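To put numbers on that, here is a small back-of-the-envelope sketch in Python. The sizes are the typical ones mentioned above, not values read from any particular device, and the helper `bytes_erased_to_forget` is a made-up name that assumes the data you want to forget starts at the beginning of a block.

```python
SECTOR_BYTES = 512           # typical sector size (assumption from the text)
BLOCK_BYTES = 128 * 1024     # "around 128 KB"; real parts vary

# How many sectors fit in one erase block.
sectors_per_block = BLOCK_BYTES // SECTOR_BYTES


def bytes_erased_to_forget(n_bytes: int) -> int:
    """Bytes of flash that must be erased to forget n_bytes of data.

    Hypothetical helper: assumes the data starts on a block boundary,
    so the erase is rounded up to a whole number of blocks.
    """
    if n_bytes <= 0:
        return 0
    blocks = -(-n_bytes // BLOCK_BYTES)  # ceiling division
    return blocks * BLOCK_BYTES


print(sectors_per_block)            # 256
print(bytes_erased_to_forget(1))    # 131072
```

So with these sizes, forgetting a single byte costs you all 256 sectors in its block.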
As if the long erase time weren’t motivation enough to avoid erasing very often, there’s an added problem: a flash block can only be erased so many times before it wears out and stops working.
Here’s how we deal with these problems. Whenever we need to change something that’s in a sector, we copy the sector into RAM, change whatever needs to be changed, and then write it back to a new spot in flash. Then we mark the old sector as invalid.
This does a few things. First, it makes most writes much faster (we don’t need to wait two seconds for the block to erase before we write the new data). Second, it spreads the data around, which results in spreading the erases around. Rather than erasing one block ten times, you erase ten blocks one time each. (This is called “wear leveling.”)
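The write-to-a-new-spot idea can be sketched as a toy model. This is not the code in filesys.exe. The `Flash` class, its tiny geometry, and its "take the first free sector" policy are all assumptions made for illustration, and a real wear-leveling policy would choose the new spot more carefully.

```python
from enum import Enum


class State(Enum):
    FREE = "free"
    VALID = "valid"
    INVALID = "invalid"


class Flash:
    """Toy flash model: sectors are never changed in place."""

    def __init__(self, num_blocks=4, sectors_per_block=4):
        total = num_blocks * sectors_per_block
        self.state = [State.FREE] * total
        self.data = [None] * total
        self.mapping = {}  # logical sector number -> physical sector index

    def write(self, logical, payload):
        # Pick a new spot (first free sector; raises ValueError if none).
        new = self.state.index(State.FREE)
        self.data[new] = payload
        self.state[new] = State.VALID
        # Mark the old copy invalid instead of erasing its block now.
        old = self.mapping.get(logical)
        if old is not None:
            self.state[old] = State.INVALID
        self.mapping[logical] = new

    def read(self, logical):
        return self.data[self.mapping[logical]]


flash = Flash()
flash.write(0, b"first version")
flash.write(0, b"second version")
print(flash.read(0))                  # b'second version'
print(flash.state[0], flash.state[1])  # State.INVALID State.VALID
```

Notice that neither write had to wait for an erase. The cost shows up later, as invalid sectors that someone has to clean up.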
So, life is peachy. We just mark sectors as invalid and write the data to unused sectors. Unfortunately, this largesse of spending eventually catches up with us. At some point we’ll run out of unused sectors. Now what? If we were a government, we’d just print more sectors and keep running. Alas, we’re not. “Deficit sectoring” isn’t allowed here.
What we do is have some code that keeps an eye on the state of the flash and the state of the system. If the flash starts getting full of invalid sectors, and if the system seems to be idle, this code will kick in and start “compacting” the sectors. If there are any blocks in which all of the sectors are invalid, it’ll erase them. If there are blocks that have a small number of valid sectors and a bunch of invalid sectors, it’ll move the valid sectors elsewhere, and then erase the block.
We call the code that does this the “Compaction Thread.” The compaction thread lives in filesys.exe, which is one of the primary components of Windows CE (it’s the file system).
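Here is a minimal sketch of one compaction pass, again as a toy in Python and not the real compaction thread. It tracks only sector states, not data. The function name `compact_once` and the `max_valid` threshold are my own inventions, and the idle check that decides when to run it is left out.

```python
FREE, VALID, INVALID = "free", "valid", "invalid"


def compact_once(blocks, max_valid=1):
    """Reclaim one block; return its index, or None if nothing qualifies.

    blocks is a list of lists of sector states. A block qualifies if it
    holds invalid sectors and at most max_valid valid ones (an assumed
    threshold for "a small number of valid sectors").
    """
    candidates = [i for i, b in enumerate(blocks)
                  if INVALID in b and b.count(VALID) <= max_valid]
    if not candidates:
        return None
    # Prefer the block that frees the most invalid sectors.
    victim = max(candidates, key=lambda i: blocks[i].count(INVALID))
    need = blocks[victim].count(VALID)
    # Free sectors in other blocks that can receive the valid data.
    slots = [(i, j) for i, b in enumerate(blocks) if i != victim
             for j, s in enumerate(b) if s == FREE]
    if len(slots) < need:
        return None
    for i, j in slots[:need]:
        blocks[i][j] = VALID              # move a valid sector elsewhere
    blocks[victim] = [FREE] * len(blocks[victim])  # erase the whole block
    return victim


blocks = [
    [INVALID, INVALID, INVALID, INVALID],  # all invalid: just erase it
    [VALID, INVALID, INVALID, INVALID],    # one survivor to relocate
    [VALID, VALID, FREE, FREE],            # nothing to reclaim here
]
print(compact_once(blocks))  # 0
print(compact_once(blocks))  # 1
print(compact_once(blocks))  # None
```

After the two passes the same three valid sectors still exist, but they sit in two blocks and the third block is completely free.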
The goal of the compaction thread is to keep free blocks around at all times so that all writes work, but to do it in a way that you, the user, don’t notice. So, it runs only when the system is on but idle. Unless…
What happens if we do a ton of writing to flash and fill up all the flash blocks, but the system is never idle enough for the compaction thread to run? Eventually we’ll go to write something, discover that there are no free blocks, and declare a state of emergency. The file system will then impose martial law, take over the CPU, and run the compaction thread long enough to erase a block and free up a sector to be written to. We call this “Critical Compaction.”
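The emergency path can be sketched like this. It is a toy with assumed names (`Store`, `rewrite`, `critical_compactions`) that keeps rewriting a single logical sector and never gets an idle moment. To stay short, the critical path here only reclaims a block whose sectors are all invalid, which is a simplification of what a real file system would do.

```python
FREE, VALID, INVALID = "free", "valid", "invalid"


class Store:
    """Toy store where background compaction never gets a chance to run."""

    def __init__(self, num_blocks=2, sectors_per_block=2):
        self.blocks = [[FREE] * sectors_per_block for _ in range(num_blocks)]
        self.current = None            # (block, sector) of the live copy
        self.critical_compactions = 0

    def _find_free(self):
        for i, block in enumerate(self.blocks):
            for j, state in enumerate(block):
                if state == FREE:
                    return i, j
        return None

    def _erase_fully_invalid_block(self):
        for i, block in enumerate(self.blocks):
            if all(state == INVALID for state in block):
                self.blocks[i] = [FREE] * len(block)
                return True
        return False

    def rewrite(self):
        slot = self._find_free()
        if slot is None:
            # State of emergency: the writer waits while we erase a block.
            self.critical_compactions += 1
            if not self._erase_fully_invalid_block():
                raise RuntimeError("flash full: nothing to reclaim")
            slot = self._find_free()
        i, j = slot
        self.blocks[i][j] = VALID
        if self.current is not None:
            ci, cj = self.current
            self.blocks[ci][cj] = INVALID
        self.current = (i, j)


store = Store()
for _ in range(5):
    store.rewrite()
print(store.critical_compactions)  # 1
```

The first four rewrites find a free sector right away. The fifth finds none, so it has to sit through an erase before it can finish, and that wait is the stall the user feels.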
Due to the way the hardware works, flash memory cannot be read from or written to while a block is being erased. So, regardless of how we got into the compaction thread, the system isn’t going to be very responsive while we’re erasing a block. (And, remember, NOR flash can take up to 2 seconds to erase a block.)
What can I do about it?
Compaction is a result of data being written. Reading data doesn’t cause compaction. If you’ve got an application that is writing data frequently, especially if it’s writing it in small chunks, that app is going to make you compact frequently. (Here’s a great blog entry that explains why writing small amounts is bad.) Also, the less free space the flash has, the harder the compaction thread is going to need to work to find blocks to compact. So if you’ve got your storage space maxed out, you’re going to hit compaction more often than if you have spare space.
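To see why small chunks hurt, here is a rough count of how many sectors get rewritten for the same amount of data. It assumes 512-byte sectors, that every write goes to flash immediately with nothing in between to combine writes, and that each write rewrites every sector it touches. The function name `sectors_consumed` is made up for this sketch.

```python
SECTOR_BYTES = 512  # typical sector size (assumption from the text)


def sectors_consumed(total_bytes: int, chunk_bytes: int) -> int:
    """Sectors rewritten when total_bytes are written chunk_bytes at a time.

    Assumes each write is flushed on its own and rewrites every sector
    it touches, leaving the old copy of that sector invalid.
    """
    if chunk_bytes <= 0:
        raise ValueError("chunk_bytes must be positive")
    sectors = 0
    offset = 0
    while offset < total_bytes:
        n = min(chunk_bytes, total_bytes - offset)
        first = offset // SECTOR_BYTES
        last = (offset + n - 1) // SECTOR_BYTES
        sectors += last - first + 1
        offset += n
    return sectors


print(sectors_consumed(4096, 4096))  # 8
print(sectors_consumed(4096, 16))    # 256
```

Under those assumptions, writing 4 KB sixteen bytes at a time burns through 32 times as many sectors as writing it in one go, and every one of those extra sectors is something compaction has to clean up later.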
Why does this happen on WM5 but not WM2003? Compaction is only needed when data is stored in flash ROM. WM2003 PocketPCs stored their data in RAM, which is much faster and doesn’t need to be compacted, but has the disadvantage of being erased if you lose power. (Note that all Smartphones since the original 2002 release have had persistent storage and have done compaction.)
Why are Axim users seeing this so much more than everyone else? That I don’t know. The compaction thread is the same on the Axim as on every other WM5 device. It would seem that the Axim is doing something different from other devices. I don’t have visibility into their design and can only speculate.
Why would a NOR user see this more often than a NAND user? NOR takes much longer to erase a block than NAND does. It’s possible that both are compacting, but the NAND device got done quickly enough that you don’t notice.
Why would this possibly run for longer than 2 seconds? I’m at a loss there. Maybe something is continuing to write data and is forcing more compaction?
What about these registry keys?
Someone told me that changing the registry keys “CompactionPrio256” and “CompactionCritPrio256” fixes the problem. Unfortunately, that can’t be correct. There are two things you need to know. First, in Windows CE, smaller numbers are higher priority. Compaction defaults to 255, which is the lowest possible priority. Making the number lower would increase the priority. That’s the opposite of what you think you want to do.
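If the direction of the numbers is confusing, this tiny sketch spells it out. It only models the ordering described above (0 is the highest priority, 255 the lowest). The function `is_higher_priority` is a made-up helper and has nothing to do with how the scheduler is actually implemented.

```python
HIGHEST_PRIORITY = 0    # smaller number = higher priority
LOWEST_PRIORITY = 255   # the compaction default mentioned above


def is_higher_priority(a: int, b: int) -> bool:
    """True if priority value a outranks priority value b."""
    for p in (a, b):
        if not HIGHEST_PRIORITY <= p <= LOWEST_PRIORITY:
            raise ValueError("priority must be in 0..255")
    return a < b


# Lowering the number from 255 to 200 makes the thread MORE important.
print(is_higher_priority(200, 255))  # True
print(is_higher_priority(255, 200))  # False
```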
More importantly, though, changing those registry keys doesn’t do anything. It is the file system that loads and creates the registry. But the file system needs some information before the registry is loaded. So there is a very small bit of data (a “bootstrap” registry) that is loaded before the user-changeable registry is loaded. The real versions of these values are in that pre-registry. The ones you see are kind of like reflections of those real values. You can change them to your heart’s content, but you’re only changing the reflection. You’re not changing the actual values the file system uses. There’s no end-user way to change the real values.
<braces for impact>
Hopefully I answered some of your questions. I know that I didn’t answer the question you really care about though. “When are you going to fix this?” I’m expecting a lot of angry comments as a result. I’ll apologize up front that I don’t have an answer to that one.