Getting Data Into WinFS with WinFS Synchronization

One of the first topics developers ask about once they start learning about WinFS is “How do I get existing data in?” We tend to think of this in terms of a bigger problem: “How do I move a lot of data in and out of WinFS?” Our answer here is WinFS Synchronization.

My name is Neil Padgett and I’m a program manager working on the WinFS Synchronization APIs. The goal of the API set is to provide access to all of the services WinFS provides for developers building sync solutions. But, before we delve too deeply into that, let’s talk for a bit about what exactly synchronization is.

The simplest idea that pops into most people’s minds when they want to get data into WinFS is to write an importer. That is, they plan to write a simple application that pulls data from some application store and then uses the WinFS API to create WinFS entities to represent their data. This is a one-way importer.

So, this seems great, right? You’ve got your data and it has moved into WinFS. And this works well, assuming you aren’t going to use the non-WinFS application to update the data anymore. But what happens if you do want to keep updating the data in the non-WinFS application? Let’s make this a bit more specific: let’s assume we have a contact in the application store and we’ve imported it into WinFS. And then let’s assume that we’re going to go ahead and keep using the non-WinFS application to update the contact.

So this works fine, right? We rerun the importer periodically and update the WinFS contact from the application store.

And this will work. It works because we never update the data on the WinFS side, only in the application store. This means that we can just overwrite the data in WinFS every time. However, WinFS is a shared data store – that contact is available in a well-known schematized format. And the user may choose to make it available to their other applications. So, other applications may update it. And if they do, the next run of our simple importer will overwrite their changes, and we’re going to lose data. How can we solve this?
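To make the problem concrete, here is a minimal sketch in Python. It is a toy model only: the stores are plain dictionaries, and the `naive_import` function and the sample contact are made up for illustration. None of it is WinFS API.

```python
# Toy model: each store's copy of the contact is a plain dict keyed by
# field name. Hypothetical illustration only; this is not WinFS API.

def naive_import(app_contact: dict, winfs_contact: dict) -> dict:
    """One-way import: the application store's copy always wins."""
    # The existing WinFS copy is ignored and replaced wholesale.
    return dict(app_contact)

app_contact = {"name": "Pat", "home_address": "1 Main St"}
winfs_contact = naive_import(app_contact, {})

# Some other application edits the shared WinFS copy...
winfs_contact["cell"] = "555-0100"

# ...and the next importer run silently discards that edit.
winfs_contact = naive_import(app_contact, winfs_contact)
lost_edit = "cell" not in winfs_contact
print(lost_edit)  # True
```

The importer never looks at what is already in the destination, so it has no way to notice that something changed there.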

The answer lies in detecting (and later resolving) this conflict. But further to that, we want to try and merge together changes that happened on the different stores. This means we’ll need to be able to figure out what changed on each store so that we can try and apply those changes to the corresponding item in the other store.

So, let’s consider our contact again. And now, let’s be more specific about what we’re changing.

Let’s assume that we did some initial sync to ensure we had our contact in both stores. (We can talk about what this exactly means later, but for now we can think of it to be like running our importer.)

And let’s imagine that, after we sync we make some updates:

In the application store, we’ll update:
-Home address
-Telephone

And, in WinFS we’ll update:
-Home address
-Cell 

Now imagine we want to bring our two stores back into sync. How do we do that? We know we were in sync before, so we need to figure out what changed on each store since then, and then apply those changes to the other store. We can call these processes change enumeration and change application, respectively, and we want to do them in both directions. (In fact, WinFS does the hard work of figuring out what changed for us and of making sure that remote changes brought to WinFS are not echoed back to us later.)
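As a rough illustration of those two processes, here is a small Python sketch. It assumes we kept a snapshot of the contact from the last sync and finds changes by comparing against it. That snapshot approach, the function names, and the sample values are all assumptions made for this example; WinFS tracks changes for you, and this is not how its API looks.

```python
# Toy change enumeration and change application over plain dicts.
# Hypothetical illustration only; this is not WinFS API.

def enumerate_changes(baseline: dict, current: dict) -> dict:
    """Return the fields whose values differ from the last-sync baseline."""
    # Deleted fields are not handled in this sketch.
    return {k: v for k, v in current.items() if baseline.get(k) != v}

def apply_changes(target: dict, changes: dict) -> dict:
    """Return a copy of target with the given field changes applied."""
    merged = dict(target)
    merged.update(changes)
    return merged

# State of the contact at the last sync, identical in both stores.
baseline = {"home_address": "1 Main St", "telephone": "555-0001", "cell": "555-0002"}

# Edits made independently on each store after that sync.
app_store = dict(baseline, home_address="2 Oak Ave", telephone="555-1111")
winfs_store = dict(baseline, home_address="3 Elm Rd", cell="555-2222")

app_changes = enumerate_changes(baseline, app_store)
winfs_changes = enumerate_changes(baseline, winfs_store)

# Fields changed on only one store can be applied to the other directly.
telephone_only = {k: v for k, v in app_changes.items() if k not in winfs_changes}
winfs_after = apply_changes(winfs_store, telephone_only)
```

Here `app_changes` contains the home address and telephone, and `winfs_changes` contains the home address and cell. The home address shows up in both, which is the case that needs more care.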

Considering our example, we have some changes that are straightforward – the telephone and cell numbers were each changed on one store, but not on the other. We call these non-conflicting changes. For these non-conflicting changes, we can simply apply them each to the other store. The difficulty comes with the home address – we made changes on both stores – so-called conflicting changes. We’ve detected a conflict and we’ll need to resolve it, either by prompting the user or, more likely, according to some policy (for example, keeping the latest change). Then, we can bring our two stores back into sync.
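Here is one way the contact example could be sketched in Python, using a "keep the latest change" policy. Everything in it is an assumption made for illustration: the `Change` record, the integer `modified` timestamps, the `plan_sync` function, and the sample values. It is a toy model and not WinFS API.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Change:
    value: str
    modified: int  # hypothetical logical timestamp; larger means later

def plan_sync(app_changes: dict, winfs_changes: dict):
    """Decide which changes go to which store, resolving conflicts by latest-wins."""
    to_winfs, to_app, conflicts = {}, {}, []
    for field in sorted(app_changes.keys() | winfs_changes.keys()):
        a = app_changes.get(field)
        w = winfs_changes.get(field)
        if a is not None and w is not None:
            # Changed on both stores since the last sync: a conflict.
            conflicts.append(field)
            # Policy: keep the latest change (ties go to the app store,
            # an arbitrary choice made for this sketch).
            if a.modified >= w.modified:
                to_winfs[field] = a
            else:
                to_app[field] = w
        elif a is not None:
            to_winfs[field] = a  # non-conflicting, changed only in the app store
        else:
            to_app[field] = w    # non-conflicting, changed only in WinFS
    return to_winfs, to_app, conflicts

app_changes = {
    "home_address": Change("2 Oak Ave", 10),
    "telephone": Change("555-1111", 11),
}
winfs_changes = {
    "home_address": Change("3 Elm Rd", 12),
    "cell": Change("555-2222", 9),
}

to_winfs, to_app, conflicts = plan_sync(app_changes, winfs_changes)
print(conflicts)  # ['home_address']
```

In this run the telephone change flows to WinFS, the cell change flows to the application store, and the home address conflict is resolved in favor of the WinFS edit because it has the later timestamp. A different policy, such as deferring to the user, would only change the branch that handles the conflict.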

WinFS Synchronization is fundamentally about providing services in the store that do just these things: Change Enumeration, Change Application, Conflict Detection, and Conflict Resolution (either by deferring for manual resolution or via automatic resolution), among others. We also generalize these services for the case of many stores with arbitrary topologies, and we provide specialized solutions for common cases like synchronizing files or synchronizing several WinFS stores. In upcoming posts I’ll talk about the services WinFS Synchronization provides and how they can be used to solve interesting data moving problems. We’ll also talk about some more interesting scenarios involving multiple synchronized stores with interesting topologies (such as peer-to-peer scenarios).

In the comments for this post, I’m interested to hear about how you think you might use WinFS Synchronization and what you’d like me to focus on first in the upcoming posts.

Author: Neil Padgett