A teaser on how OneNote storage and replication work

The other day someone internally asked how OneNote stores its files and how often a save actually happens. In other words, if you pulled the power cord on your computer, what would you lose and what wouldn't you? Irina Yatsenko from the OneNote Test team wrote up the following to answer the question, and she wanted me to post it for all to see:

Now, I'll describe in more detail what we do in OneNote 2007:

  1. Internally, all data, from a single paragraph on a page up to a whole notebook, is represented as a graph, which is split into areas we call "graph spaces". This lets us load and save incrementally, one graph space at a time, so when you open a notebook you see all the section tabs pop up almost immediately even though the pages inside those sections aren't loaded yet. When saving, we can also choose which piece to save rather than saving everything.
  2. We never save directly to the server hosting the files (even if it's the local machine). First we save into a local cache file. Because the cache is local and OneNote has exclusive access to it, we can guarantee that the save always succeeds (if it doesn't, OneNote will force an exit, because running without a cache means users might lose data, and we think it's better to exit than to lose data). The save into the cache happens every 30 sec or on exit ([descapa] I have found this to be faster at times, though I am not pulling my power cord out).
  3. To propagate the data from the cache back to the original location of the sections, we use a background process called replication (that is, sync). The sync schedule depends on the actual store: UNC servers and the local machine replicate every 30 sec, but for SharePoint the default is 10 min. If replication fails (e.g. because the machine has lost power), the cache will still have the data, and OneNote will try to replicate again after it is restarted.
  4. The actual mechanics of the incremental save are rather technical. The bottom line is that we have our own binary format and all changes are stored in the form of "revisions", a sort of diff between the current state and the previously saved state. As these revisions accumulate, OneNote runs an optimization to clean them up and update the main base state.
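
To make the four points above a bit more concrete, here is a minimal Python sketch of the general idea: each save appends a revision (a diff) to a local cache, and a separate replication step copies unsent revisions to the original location. This is only an illustration under loud assumptions. The names `Revision`, `GraphSpaceStore`, `replicate` and `remote_available` are hypothetical, the "state" is just a dictionary of strings, and nothing here reflects OneNote's real binary format or internal API.

```python
from dataclasses import dataclass, field


@dataclass
class Revision:
    """A diff against the previously saved state: key -> new value (None = deleted)."""
    changes: dict


@dataclass
class GraphSpaceStore:
    """Hypothetical store for one graph space: a base state plus a list of revisions."""
    base: dict = field(default_factory=dict)
    revisions: list = field(default_factory=list)

    def current_state(self) -> dict:
        # Replay every revision on top of the base state.
        state = dict(self.base)
        for rev in self.revisions:
            for key, value in rev.changes.items():
                if value is None:
                    state.pop(key, None)
                else:
                    state[key] = value
        return state

    def save(self, new_state: dict) -> None:
        # Incremental save: record only what differs from the last saved state.
        old = self.current_state()
        changes = {k: v for k, v in new_state.items() if old.get(k) != v}
        changes.update({k: None for k in old if k not in new_state})
        if changes:
            self.revisions.append(Revision(changes))

    def optimize(self) -> None:
        # Fold all revisions into the base state and drop them.
        # (A real system would have to coordinate this with replication;
        # this sketch ignores that.)
        self.base = self.current_state()
        self.revisions.clear()


def replicate(cache: GraphSpaceStore, remote: GraphSpaceStore,
              synced: int, remote_available: bool = True) -> int:
    """Copy revisions the remote has not seen yet; return the new synced count.

    If the remote is unreachable nothing is lost: the cache still holds the
    revisions, and the caller simply retries later with the same count.
    """
    if not remote_available:
        return synced
    for rev in cache.revisions[synced:]:
        remote.revisions.append(rev)
    return len(cache.revisions)


if __name__ == "__main__":
    cache, remote = GraphSpaceStore(), GraphSpaceStore()
    cache.save({"p1": "hello"})
    cache.save({"p1": "hello world", "p2": "new"})
    synced = replicate(cache, remote, 0, remote_available=False)  # failed attempt
    synced = replicate(cache, remote, synced)                     # retry succeeds
    print(remote.current_state())
```

In this sketch a failed replication leaves the synced count unchanged, so the next attempt picks up exactly where the last successful one stopped, which mirrors the "cache still has the data" behaviour described in point 3.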

Hope this clears things up a bit. Let me know if you have any questions.

Thanks Irina! I hope this explains things like why we have a cache (which allows OneNote to go offline, merge changes and more) and why our app works the way it does. The storage tech is actually quite complex and innovative; I didn't really appreciate it until I dealt with other sync technologies that make me choose which copy is the most up to date, etc. There is still a lot more going on under the covers, but this is a good overview. If you have more questions, please let us know.