Getting Data Into WinFS with WinFS Synchronization


One of the first topics developers ask about once they start learning about WinFS is “How do I get existing data in?” We tend to think of this in terms of a bigger problem: “How do I move a lot of data in and out of WinFS?” Our answer here is WinFS Synchronization.


My name is Neil Padgett and I’m a program manager working on the WinFS Synchronization APIs. The goal of the API set is to provide access to all of the services WinFS provides for developers building sync solutions. But, before we delve too deeply into that, let’s talk for a bit about what exactly synchronization is.


The simplest idea that pops into most people’s minds when they want to get data into WinFS is to write an importer. That is, they plan to write a simple application that pulls data from some application store and then uses the WinFS API to create WinFS entities to represent their data. This is a one-way importer.
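
To make the idea concrete, here is a minimal sketch of a one-way importer. It does not use the real WinFS API: plain Python dicts stand in for both the application store and the WinFS store, and the name import_contacts and the contact fields are made up for illustration.

```python
# Hypothetical stand-ins: plain dicts play the role of the application
# store and the WinFS store. None of this is the real WinFS API.

def import_contacts(app_store, winfs_store):
    """One-way import: copy every contact from app_store into winfs_store,
    overwriting any item that already exists under the same id."""
    for contact_id, contact in app_store.items():
        # Copy the record so the two stores stay independent of each other.
        winfs_store[contact_id] = dict(contact)
    return winfs_store


app_store = {
    "c1": {"name": "Bill", "home_address": "1 Main St", "telephone": "555-0100"},
}
winfs_store = {}
import_contacts(app_store, winfs_store)
```

Note that the importer overwrites unconditionally. That is what keeps it simple, and it is also what makes it unsafe once anything else starts writing to the destination store.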


So, this seems great, right? You’ve got your data and it has been moved into WinFS. And this works well, assuming you aren’t going to use the non-WinFS application to update the data anymore. But what happens if you want to update the data in the non-WinFS application? Let’s make this a bit more specific: let’s assume we have a contact in the application store and we’ve imported it into WinFS. And then let’s assume that we’re going to go ahead and keep using the non-WinFS application to update the contact.


So this works fine, right? We rerun the importer periodically and update the WinFS contact from the application store.


And, this will work. It works because we only ever update the data in one of the two places: the application store, never WinFS. This means that we can just overwrite the data in WinFS every time. However, WinFS is a shared data store – that contact is available in a well-known schematized format. And the user may choose to make it available to their other applications. So, those other applications may update it. But then, if we run our simple importer, we’re going to overwrite their updates and lose data. How can we solve this?


The answer lies in detecting (and later resolving) this conflict. But further to that, we want to try and merge together changes that happened on the different stores. This means we’ll need to be able to figure out what changed on each store so that we can try and apply those changes to the corresponding item in the other store.


So, let’s consider our contact again. And now, let’s be more specific about what we’re changing.


Let’s assume that we did some initial sync to ensure we had our contact in both stores. (We can talk about what exactly this means later, but for now we can think of it as being like running our importer.)


And let’s imagine that, after we sync we make some updates:


In the application store, we’ll update:
-Home address
-Telephone

And, in WinFS we’ll update:
-Home address
-Cell 


Now imagine we want to bring our two stores back into sync. How do we do that? We know we were in sync before, so we need to figure out what changed on each store, and then apply those changes to the other store. We can call these processes change enumeration and change application, respectively, and we want to do them in both directions. (In fact, WinFS does the hard work of figuring out what changed for us and of making sure that remote changes brought to WinFS are not echoed back to us later.)
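
As a rough illustration of these two processes, the sketch below enumerates changes by comparing each store’s copy of the contact against a snapshot taken at the last sync, and then applies a change to the other store. This is an assumption made purely for the example: WinFS tracks changes itself, and nothing here reflects how it does so. The function names, field names and values are all hypothetical, and the sketch ignores deleted fields.

```python
# Hypothetical sketch: dicts stand in for a contact on each store, and a
# snapshot from the last sync stands in for real change tracking.

def enumerate_changes(baseline, current):
    """Change enumeration: return {field: new_value} for every field whose
    value differs from the last-synced baseline."""
    return {f: v for f, v in current.items() if baseline.get(f) != v}


def apply_changes(target, changes):
    """Change application: return a copy of target with the changes applied."""
    updated = dict(target)
    updated.update(changes)
    return updated


baseline = {"home_address": "1 Main St", "telephone": "555-0100", "cell": "555-0200"}
app_contact = dict(baseline, home_address="2 Oak Ave", telephone="555-0111")
winfs_contact = dict(baseline, home_address="3 Elm Rd", cell="555-0222")

app_changes = enumerate_changes(baseline, app_contact)
winfs_changes = enumerate_changes(baseline, winfs_contact)

# Bring the telephone change from the application store over to WinFS.
winfs_after = apply_changes(winfs_contact, {"telephone": app_changes["telephone"]})
```

Here app_changes holds the home address and telephone, and winfs_changes holds the home address and cell, which matches the two lists of updates above.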


Considering our example, we have some changes that are straightforward – the telephone numbers were each changed on one store, but not on the other. We call these non-conflicting changes. For these non-conflicting changes, we can simply apply them each to the other store. The difficulty comes with the home address – we made changes on both stores – so-called conflicting changes. We’ve detected a conflict and we’ll need to resolve it, either by prompting the user or, more likely, according to some policy (for example, keeping the latest change). Then, we can bring our two stores back into sync.
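
Here is one way the contact example could be sketched in code: a field-by-field merge against the last-synced baseline that applies non-conflicting changes directly and resolves conflicts with a “keep the latest change” policy. Everything in it is an assumption for illustration, not WinFS behavior: the merge function, the dict representation, and the single per-store timestamp used to decide which change is latest. Deleted fields are not handled.

```python
# Hypothetical sketch of conflict detection and a "latest change wins" policy.

def merge(baseline, app, winfs, app_time, winfs_time):
    """Merge two copies of a contact against their last-synced baseline.

    Returns (merged, conflicts). A field changed on only one store is taken
    as-is. A field changed on both stores to different values is a conflict,
    resolved by keeping the value from the store with the later timestamp
    (ties go to the WinFS copy).
    """
    merged = dict(baseline)
    conflicts = []
    for field in sorted(set(baseline) | set(app) | set(winfs)):
        base, a, w = baseline.get(field), app.get(field), winfs.get(field)
        app_changed, winfs_changed = a != base, w != base
        if app_changed and winfs_changed and a != w:
            conflicts.append(field)
            merged[field] = a if app_time > winfs_time else w
        elif app_changed:
            merged[field] = a
        elif winfs_changed:
            merged[field] = w
    return merged, conflicts


baseline = {"home_address": "1 Main St", "telephone": "555-0100", "cell": "555-0200"}
app_contact = dict(baseline, home_address="2 Oak Ave", telephone="555-0111")
winfs_contact = dict(baseline, home_address="3 Elm Rd", cell="555-0222")

# Pretend the WinFS edit happened after the application-store edit.
merged, conflicts = merge(baseline, app_contact, winfs_contact, app_time=1, winfs_time=2)
```

With these inputs, only the home address is reported as a conflict, and the merged contact keeps the WinFS home address along with the new telephone and cell numbers. Swapping the policy for “prompt the user” would just mean returning the conflicting fields unresolved instead of choosing a winner.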


WinFS Synchronization is fundamentally about providing services in the store that do just these things: Change Enumeration, Change Application, Conflict Detection, and Conflict Resolution (either by deferring for manual resolution or via automatic resolution), among other things. We also generalize these services for the case of many stores with arbitrary topologies, and we provide specialized solutions for common cases like synchronizing files or synchronizing several WinFS stores. In upcoming posts I’ll talk about the services WinFS Synchronization provides and how they can be used to solve interesting data moving problems. We’ll also talk about some more interesting scenarios involving multiple synchronized stores with interesting topologies (such as peer-to-peer scenarios).


In the comments for this post, I’m interested to hear about how you think you might use WinFS Synchronization and what you’d like me to focus on first in the upcoming posts.


Author: Neil Padgett

Comments (8)

  1. TG2 says:

    Neil, great article. I’d like to know more about how WinFS works over a marginal connection (one that’s not always present or drops in the middle of the conversation). It’s these boundary conditions that make my job interesting and I’m hoping that WinFS will help me solve some of them 😉

    -tg2

  2. Neil,

    Great article. But there’s one little insidious scenario that comes to mind which you didn’t touch upon – though I hope your team has addressed it!

    Applications need a mechanism to mark data as up-to-date, without actually having to change it.

    For example, say I’m out of the office for a week, and my assistant back at home base makes a change to the "category" field for a contact "Bill".

    The next day I find myself with an hour to kill in the airport before my flight home. I decide to go through all the contacts on my smart phone to make sure the information is filed correctly. I decide the category for Bill should stay the same, and DON’T make any changes to his record.

    When I get back to the office, the data syncs. A few days later I realize Bill is in the wrong category, and I’m a little upset that even though I reviewed the contact AFTER my assistant made the change, her change still overrode my review.

    The idea is that sometimes the act of simply presenting the data to the user should count as an "update" transaction. Another example would be when an agent at a call center asks you to confirm your name/address details. Nothing is changed, but the old data should now be marked "newer" (and treated as updated – as of whatever time it was reviewed – during the WinFS synchronization process).

  3. Arian Kulp says:

    I’m curious about how WinFS will fit into the Windows Mobile universe. Storing data is no fun on any platform, but it’s worse with mobile apps since it forces you to either silo the data on the device, write custom network sync mechanisms, or write custom IntelliSync mechanisms.

    I’d love to see WinFS directly available for Mobile, with the ability to scope what data is propagated in both directions. Obvious types would be mail, contacts, tasks, and events. Even better though, if I create a killer app for keeping track of a movie collection, for instance, I should be confident that those entries will flow both ways. Is this already in the works? I hope it’s retrofitted for PPC 2003!

    Thanks!

  4. Neil Padgett says:

    Good questions everyone.

    tg2, regarding marginal connections, WinFS Sync supports cancellation and resumption — this support should make dealing with dropped connections relatively straightforward.

    Richard, your scenario is supported — I’ll talk a little bit about how WinFS tracks changes and how WinFS Synchronization determines changes in a future post.

    Thanks for reading.

  5. It sounds like a version control tool for all types of documents.

  6. Neil,

    Thanks for your response.  Looking forward to that new post.

  7. Richard Kagerer says:

    Neil,

    Another quick question; it’s about data ownership, from the perspective of both  applications and users.

    I’ve noticed that applications, in general, do a rather poor job of uninstalling all traces of themselves.  Leftover tidbits can be simple, like settings left in the registry.  But often they’re more complex, especially when they tap into existing data stores – e.g. an app that creates custom fields in an Exchange data store rarely has a mechanism to uninstall those fields, and unwanted data may pollute the store for years to come after the program using it is history.

    With WinFS, you guys are creating a big soup and mixing in ingredients (data) from all sorts of different vendors.  My question is what happens once I want to remove one of those apps?  I may want to keep some of the data it created (e.g. documents) but strip out any metadata that it’s proliferated through my store.

    One advantage of siloing is that an app’s data tends to be fairly encapsulated (e.g. in the days of MS-DOS all you had to do was delete a directory) and it was easy to keep things clean.

    1) Does WinFS include some mechanism to tie data/metadata to one or more applications, so that it will be removed when the application is uninstalled?  (Or even a way to make the data travel with the application?)

    2) Will there be tools provided to developers that make it incredibly easy to define which data should be treated which way? (So they don’t just leave it at the default "leave these bits here forever", like often happens today with registry settings and installer log files)

    Thanks,

    -Richard

    p.s. Arian: Great comments.