Synchronizing Data between WinFS Stores


Hi, my name is Mark Scurrell and I’m a Program Manager on the WinFS Sync team. I’d like to give you an overview of the functionality we provide to allow applications to synchronize data between WinFS stores.


If you haven’t already done so, I would recommend you first read a previous post where Neil Padgett provided an overview of synchronization and described the scenario where data is synchronized between a WinFS store and a non-WinFS application store.


As well as providing a set of services to synchronize with non-WinFS stores, as Neil described, we also want to allow developers to build peer-to-peer applications. So, as part of the WinFS Sync platform we provide services that will allow applications to synchronize their data between multiple WinFS stores without application developers having to design and code the sophisticated synchronization algorithms that are required for peer-to-peer applications.


A good example of a peer-to-peer application built using WinFS Sync is Microsoft Rave, a sample application included in the WinFS Beta 1 SDK. Rave allows WinFS data to be shared between users without the need for a server: each user can synchronize folders on their computer directly with other users, peer to peer. The developer of Rave discussed his experience producing this application in a previous post.


Let’s start with the simplest use of WinFS-to-WinFS synchronization, which is to synchronize two WinFS stores.




So, what happens when synchronization is initiated? WinFS Sync will do the following:



  • Determine the changes made to each store since the last sync (change enumeration)
  • Communicate those changes to the other store and update the other store (change application)
  • When applying the changes, determine whether any data conflicts, that is, whether the same data has been changed on both stores since the last synchronization (conflict detection)
  • Either log the conflicting data for later resolution or have it resolved immediately (conflict resolution)
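
To make the four steps concrete, here is a minimal sketch in Python. It is not the WinFS Sync API or its actual algorithm: the `Store` class, the `sync` function, the per-store change counter and the `marks` watermark dictionary are all hypothetical names invented for illustration, and the watermark scheme only works for two stores.

```python
from dataclasses import dataclass, field


@dataclass
class Store:
    """Toy stand-in for a store (hypothetical): key -> (value, change number)."""
    name: str
    clock: int = 0
    items: dict = field(default_factory=dict)

    def put(self, key, value):
        # Every write gets the next local change number.
        self.clock += 1
        self.items[key] = (value, self.clock)

    def changes_since(self, mark):
        # Change enumeration: values written after the last-sync watermark.
        return {k: v for k, (v, n) in self.items.items() if n > mark}


def sync(a, b, marks, resolve=None):
    """Run one sync pass between a and b; return the conflicts that were logged."""
    from_a = a.changes_since(marks.get(a.name, 0))
    from_b = b.changes_since(marks.get(b.name, 0))
    conflicts = []
    for key in sorted(from_a.keys() | from_b.keys()):
        in_a, in_b = key in from_a, key in from_b
        if in_a and in_b and from_a[key] != from_b[key]:
            # Conflict detection: both sides changed this key since the last sync.
            if resolve is None:
                # Conflict resolution, option 1: log it for later.
                conflicts.append((key, from_a[key], from_b[key]))
                continue
            # Conflict resolution, option 2: resolve immediately.
            winner = resolve(key, from_a[key], from_b[key])
            a.put(key, winner)
            b.put(key, winner)
        elif in_a and not in_b:
            b.put(key, from_a[key])  # change application: a -> b
        elif in_b and not in_a:
            a.put(key, from_b[key])  # change application: b -> a
    # Advance the watermarks so applied changes are not enumerated again.
    marks[a.name], marks[b.name] = a.clock, b.clock
    return conflicts


laptop, desktop = Store("laptop"), Store("desktop")
marks = {}
laptop.put("photo", "v1")
desktop.put("note", "hello")
sync(laptop, desktop, marks)            # no overlap, so nothing conflicts

laptop.put("note", "hello from laptop")
desktop.put("note", "hello from desktop")
logged = sync(laptop, desktop, marks)   # same key changed on both sides
```

After the second pass, `logged` holds the one conflicting `note` entry and neither store has been overwritten. Passing a `resolve` callback instead would pick a winner on the spot and write it to both stores.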

WinFS stores will contain many different types of data, such as files, custom item types, links between items, and so on. WinFS Sync will synchronize all data stored in WinFS; no custom code is required, even when, for example, a new item type is defined.


Synchronization between two stores is useful; however, the more interesting scenarios involve multiple stores. With multiple stores there are different ways the stores can be configured to communicate: this is the sync topology.




The Microsoft Rave sample application, for example, allows multiple users to synchronize, with each user able to synchronize with every other user in a full mesh topology. Another application may instead require a topology in which all changes are communicated through a central “hub”. The point I want to emphasize is that WinFS Sync is very flexible and has been designed to cater for any topology; there does not need to be a “master” node, and true peer-to-peer applications can be built.
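
One way to picture the difference between the two topologies is as the set of store pairs that synchronize directly with each other. The short Python sketch below is purely illustrative; the function names and the user names are made up and are not part of WinFS Sync.

```python
from itertools import combinations


def full_mesh(nodes):
    """Every store syncs directly with every other store."""
    return list(combinations(nodes, 2))


def hub_and_spoke(hub, nodes):
    """Every store syncs only with the central hub."""
    return [(hub, n) for n in nodes if n != hub]


nodes = ["alice", "bob", "carol", "dave"]
mesh_links = full_mesh(nodes)               # 6 direct sync relationships
hub_links = hub_and_spoke("alice", nodes)   # 3, all passing through alice
```

With four stores the mesh has six sync relationships and the hub layout has three. In the hub case a change made by bob reaches carol only after both have synchronized with alice.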


I’ve described how WinFS Sync was built to handle peer-to-peer scenarios and different topologies, but how do we handle the fact that sync applications will need to operate in many different network configurations? Here are some possible scenarios for peer-to-peer sync applications:



  • There are multiple computers on a home workgroup network; a user could synchronize their data between computers so it can be accessed and updated on any computer in the home.
  • Users are members of a domain on a corporate network and can configure folders for sharing and collaborating with other invited users.
  • Family members are located in different parts of the country or in different countries; they can share photos with other members of their family by having them synchronize their computers over the Internet.

WinFS Sync has no knowledge of the network configuration or the transport that will be available for communication of the changes. The application developer must provide a transport over which the sync protocol will operate. We define an interface for the transport and the sync application must supply an implementation of that interface to WinFS Sync. The developers of the transport do not have to concern themselves with the complex logic of synchronization; they implement simple methods such as ReadMessage and WriteMessage and WinFS Sync does the rest.
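
To give a feel for how small the transport's job is, here is a rough sketch in Python of what such a contract might look like. It is an assumption-laden illustration, not the real WinFS interface: only the method names ReadMessage and WriteMessage come from the description above, while the Python spellings, the signatures, the `Transport` and `InMemoryTransport` classes and the `connected_pair` helper are all hypothetical.

```python
from abc import ABC, abstractmethod
from collections import deque
from typing import Optional


class Transport(ABC):
    """Hypothetical transport contract: move opaque messages, nothing more."""

    @abstractmethod
    def write_message(self, payload: bytes) -> None:
        """Send one message to the remote side."""

    @abstractmethod
    def read_message(self) -> Optional[bytes]:
        """Return the next message from the remote side, or None if none is waiting."""


class InMemoryTransport(Transport):
    """Loopback implementation backed by two queues, useful only for testing."""

    def __init__(self, inbox: deque, outbox: deque) -> None:
        self._inbox = inbox
        self._outbox = outbox

    def write_message(self, payload: bytes) -> None:
        self._outbox.append(payload)

    def read_message(self) -> Optional[bytes]:
        return self._inbox.popleft() if self._inbox else None


def connected_pair():
    """Build two endpoints wired to each other."""
    a_to_b, b_to_a = deque(), deque()
    return InMemoryTransport(b_to_a, a_to_b), InMemoryTransport(a_to_b, b_to_a)


left, right = connected_pair()
left.write_message(b"changes: item 42 updated")
received = right.read_message()   # the bytes written by the other endpoint
nothing = right.read_message()    # None, because the queue is now empty
```

The transport never inspects the payload. A real implementation would swap the queues for sockets, a web service or anything else the application's network configuration allows, and the sync logic on top would stay the same.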


Those of you who have had a close look at our Beta 1 release may be confused, since it included a facility called Store Synchronizer that synchronized data between WinFS stores and shipped with a transport suitable for a local network. For Beta 2 we have decided to focus on providing the peer-to-peer synchronization platform and will not provide any specific transport implementations. We have therefore modified Store Synchronizer so that it requires a transport implementation. In the Beta 2 SDK we will of course provide sample code for a transport, guidance on how to build a transport, and reference material for the Store Synchronizer classes.


My main goal with this article is to raise awareness of the set of synchronization services that WinFS provides to support peer-to-peer sync application development and also to highlight the flexibility we allow in terms of diverse sync scenarios, topologies and network configurations.


I hope you found this post useful. There is further information available in our Beta 1 SDK documentation if you want to dig deeper. I would be interested to hear about any scenarios where you would utilize our WinFS-to-WinFS synchronization capabilities.


Author: Mark Scurrell

Comments (3)

  1. Rajendra says:

    Can the sync’ing be in real time? I would like two folders to mirror the same data, so if one drive crashes I’ll still have my data in the other. This would be useful for the My Documents folder. Sort of a poor man’s RAID for folders. Not sure how shadow backups would fit in or happen... I suppose each folder is treated separately? I heard there is already an existing non-WinFS solution from Microsoft for this. Of course, if a drive crashes I wouldn’t want the crash part to get sync’d, lol.

  2. Aaron Oneal says:

    Greetings,

    Overall, I’m very impressed with the sync capabilities of WinFS and the well-thought-out design. As such, I am interested in working with the synchronization features WinFS affords in a peer-to-peer environment, particularly WinFS-to-WinFS, but I have some concerns regarding scalability and versionability.

    In the SDK documentation, I noticed the section below that indicates changes to backing streams are not versioned in a way that would allow for access to previous revisions or to discern individual change regions. Am I correct that synchronizing any file backed item is going to require that the entire stream be sent to the remote store even though only a small portion of it may have changed? Is there any way to get at the write logs so that only changed regions of the stream need to be synced?

    For example, say I were to use WinFS to store a file backed AVI, WMV, or MP3 file and I alter one of the meta-data properties of the native file format such as Genre, Author, etc. which today results in the alteration of a few bytes of the file stream. I presume that currently OpenBackingStream will throw InconsistentStreamAndDataException until I get to the revision that matches the latest backing file revision and at that point I have to send the entire stream over the wire?

    It seems to me that WinFS is likely logging the write regions during operations to a backing file, and if so, it would be great to be able to get at those through the API. Is this type of incremental file synchronization already in place or being planned?

    Thank you,
    Aaron

    ————–
    Unlike other “WinFS” data, file streams that are changed in “WinFS” do not maintain snapshot history during change enumeration. Therefore, it is possible that the backing stream of a file-backed item is out of sync with the version metadata during change enumeration. However, when System.Storage.Sync.ItemChange.OpenBackingStream is used, this condition can be detected.

  3. Aaron Oneal says:

    With respect to my earlier post, I see that WinFS supports File Transfer Compression which addresses to some degree the concern I had over large file transfers due to small changes. I assume what’s going on under the covers is a kind of block hashing to identify the changed portions of the file as opposed to write log utilization. Does this hashing occur with every replica synchronization or is the information saved behind the scenes for later use when syncing to multiple replicas in a peer-to-peer environment? Are there any plans to support mesh-to-peer synchronization so that portions of large files can be acquired from various peers within the mesh as opposed to always syncing with one remote endpoint which may or may not be the most performant given locale or other factors?