As developers start architecting their apps with Sync Services for ADO.NET, they're starting to ask about the gaps we didn't close in this 1.0 release. One such question is: how can I send changes from the server to the client without the client having to ask? A few problems quickly surface:
- Addressability of each client - where do I send the notification if my client's address keeps changing?
- Maximize the scalability of central resources by minimizing the detailed knowledge of each client
- Minimize network traffic
Here's the problem: the client doesn't really know when something has changed on the server, so it's left to "ping" the server occasionally, asking "has anything changed in these objects?" This sounds a bit like the electronic version of "are we there yet?" With Sync Services, we very specifically designed the system so the server doesn't actually know who the clients are. Clients may come and go over time, and requiring the server to track changes for each client imposes scalability limitations. This is one of the challenges with Merge Replication. It's not that we didn't want to solve this problem, but we wanted to layer the solution, as the problem isn't unique to Sync Services. This leaves us with a gap. So, what to do?
Keep It Silly Simple
The simplest option is to just have the client occasionally ask if anything has changed. Consider an application that continually asks the server for the same list of states or Order Status Codes. Without any caching, the app will simply ask the same question over and over again. These values don't change very often. I remember back in the late '80s Puerto Rico was considering becoming the 51st state. I often joke that California may fall off into the ocean after an earthquake. These are pretty dramatic events that I joke about. Let's say you were caching these values locally, and once a day the client asks the server what changed. Not all states, just what changed. Now, let's look at that 5,000 item product catalog. How often do the items change? Maybe a few additions, price changes, and description updates a week. Here, you might query once an hour. But again, not the whole catalog, just what changed. While not as efficient as some alternatives, it's still much better than asking for all the values each time, which clogs the network with redundant data and adds server load. So just moving to a routine sync pull operation may already improve things, and keeping it simple may be the way to go.
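To make "not all states, just what changed" concrete, here is a minimal sketch in Python rather than the actual Sync Services API. The `Server` and `Client` classes, the integer version counter, and the `anchor` field are all hypothetical names made up for illustration, and deletes are ignored to keep it short.

```python
from dataclasses import dataclass, field


@dataclass
class Server:
    """Hypothetical in-memory server that stamps every change with a version."""
    version: int = 0
    rows: dict = field(default_factory=dict)  # key -> (value, version when changed)

    def write(self, key, value):
        self.version += 1
        self.rows[key] = (value, self.version)

    def changes_since(self, anchor):
        """Return only the rows changed after `anchor`, plus the new anchor."""
        changed = {k: v for k, (v, ver) in self.rows.items() if ver > anchor}
        return changed, self.version


@dataclass
class Client:
    """Hypothetical client that caches rows and remembers where it left off."""
    anchor: int = 0
    cache: dict = field(default_factory=dict)

    def pull(self, server):
        changed, self.anchor = server.changes_since(self.anchor)
        self.cache.update(changed)
        return len(changed)  # number of rows that actually travelled


server = Server()
for code, name in [("WA", "Washington"), ("NY", "New York"), ("CA", "California")]:
    server.write(code, name)

client = Client()
first = client.pull(server)    # initial sync: every row is new to this client
second = client.pull(server)   # nothing changed since the last pull
server.write("PR", "Puerto Rico")
third = client.pull(server)    # only the one new row travels
```

Note that the server keeps no per-client state here. Each client carries its own anchor, which is the property that lets clients come and go without the server tracking them.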
Notification to Pull
Notification to pull essentially means the server sends a small notification to the client indicating there's something the client may be interested in. Think of this as dialing a phone number. You dial the number and wait for a response. You don't dial the number and immediately say "Hi Steve, how are things? I had a great day today... " If someone is home, they answer and say hello. This initiates the conversation. Now you can go further: after 4 rings, voice mail picks up. While some people leave long messages, most know to simply leave a summary. Those who have friends that leave long messages usually figure out how to limit the length of a message. If the recipient is interested, they call back. Now, this may not be the best example, but it's what I've got for now. <g>
Getting back to our sync design: with any central system, you want to aggregate as many individual operations as possible. Let's continue our States example from above. Let's say we have 10,000 clients that are caching the list of states. Rather than tracking each client, the central database simply tracks whether anything changes. The item being tracked could be called an Article. Another set of components notices that something has changed and looks up who's interested in the change; we can call that interest a subscription. Now this gets complex quickly, so let me try a drawing:
As we quickly see, relational database changes aren't the only thing we may be interested in being notified about. Wouldn't it be nice to know files have changed, application updates are available, or any other type of resource you could imagine?
A listener listens or checks for changes to specific sources. You might plug in Service Broker, Query Notification, or might even use Sync Services here. When changes are available, the listener service looks up which clients are interested in this data source. The listener then queues up a notification. Now, since we're only notifying that a change has occurred, not the details of the change, the listener service can perform an Upsert against any notifications that may already be queued for each client. There's no reason to tell the same client the same thing has changed multiple times in a single notification.
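The listener's lookup-and-upsert step can be sketched in a few lines of Python. This is only an illustration under assumptions: the `subscriptions` table, the `pending` queue, and the `on_change` function are hypothetical names, and a real listener would persist both structures (in SQL Server, for example) rather than hold them in memory.

```python
from collections import defaultdict

# Hypothetical subscription table: article -> clients that cache it.
subscriptions = {
    "States": {"client-1", "client-2"},
    "Products": {"client-2"},
}

# Pending notifications: client -> set of articles that have changed.
pending = defaultdict(set)


def on_change(article):
    """Called by the listener whenever a source reports a change."""
    for client in subscriptions.get(article, ()):
        # Adding to a set plays the role of the upsert: if this article is
        # already queued for the client, the second change is a no-op.
        pending[client].add(article)


on_change("States")
on_change("States")     # a second change to the same article
on_change("Products")
```

After these three calls, client-1 has a single queued entry for States and client-2 has one each for States and Products, no matter how many times each article changed.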
Another service looks for notifications to be sent. But where does it send them? Remember, while servers tend to be stable, with known IP addresses, clients are constantly moving. Not only does their IP address change, their method of communication may change as well. We're in a chicken & egg problem, so let me just defer this problem to the Here I Am service below. Assuming we know the last known location for a particular client, we can now attempt to ping the client. In the picture, I've labeled it "Hello", or step A. If the appropriate client is still there, it Acks back with the appropriate authentication, step B. If another client has roamed in and now occupies the address, the Ack fails, and the Notification Distribution service can update the Last_Known_Location as out of range since the client didn't Ack back with the appropriate credentials. Now, assuming the Ack was successful, the Notification is sent to the client, step C, simply saying, "There was a change to Article X". The client can then choose what to do with that info. It may decide to ignore it because it just did a sync and has a policy of only syncing once an hour, or it could sync immediately. Of course, it could also check for application updates.
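Steps A, B, and C might look roughly like the following Python sketch. Everything here is an assumption made for illustration: the `Client` class, the `distribute` function, the dictionary standing in for the network, and the choice of an HMAC over a random challenge as the "appropriate authentication". A real system would use WCF or another transport and whatever credential scheme it already trusts.

```python
import hashlib
import hmac
import os


class Client:
    """Hypothetical client holding a shared key and a sync policy."""

    def __init__(self, client_id, key, min_sync_interval=3600):
        self.client_id = client_id
        self.key = key
        self.min_sync_interval = min_sync_interval  # seconds between syncs
        self.last_sync = None

    def ack(self, challenge):
        """Step B: prove identity by signing the server's challenge."""
        return hmac.new(self.key, challenge, hashlib.sha256).hexdigest()

    def notify(self, article, now):
        """Step C: apply local policy to decide whether to pull now."""
        if self.last_sync is not None and now - self.last_sync < self.min_sync_interval:
            return "ignored"
        self.last_sync = now  # a real client would start its sync pull here
        return "synced"


def distribute(network, locations, keys, client_id, article, now):
    """Notification Distribution: Hello (A), check the Ack (B), notify (C)."""
    occupant = network.get(locations.get(client_id))
    if occupant is not None:
        challenge = os.urandom(16)                       # step A: "Hello"
        expected = hmac.new(keys[client_id], challenge, hashlib.sha256).hexdigest()
        if hmac.compare_digest(occupant.ack(challenge), expected):
            return occupant.notify(article, now)         # step C
    locations[client_id] = None  # no Ack or wrong credentials: out of range
    return "out of range"


alice = Client("alice", b"alice-key")
bob = Client("bob", b"bob-key")
network = {"10.0.0.5": alice}            # address -> whoever is there now
locations = {"alice": "10.0.0.5"}        # last known locations
keys = {"alice": b"alice-key"}

r1 = distribute(network, locations, keys, "alice", "States", now=0)
r2 = distribute(network, locations, keys, "alice", "States", now=600)
network["10.0.0.5"] = bob                # another client roams into the address
r3 = distribute(network, locations, keys, "alice", "States", now=7200)
```

The first notification triggers a sync, the second arrives inside the one-hour window and is ignored, and the third fails the Ack because a different client now holds the address, so the last known location is cleared.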
Here I Am Service
This is a standard problem the cellular network had to solve. When a call is made to your cell phone, attempting to broadcast to the entire planet doesn't scale. Instead, your cell phone is continually interacting with a clearing house, saying "here I am; I'm currently in Sammamish WA, USA." As I roam to different locations, my phone "phones home" and says, "here I am; I'm in NYC NY, USA." Now, I like to go skiing and snowmobiling in the great white north. If I'm not careful, my phone could drain while I'm out of range. This is because transmitting a signal consumes a lot more power than receiving. My phone is constantly trying to find a tower. Once it does, it says Here I Am, and can then just go into receive mode and reduce battery usage. Only when it goes out of range does it need to transmit again to establish a last known location. In the mountains, I'm constantly in and out of range, thus the battery drain.
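Applied to sync clients, the same idea could be sketched as below. The `LocationRegistry` and `RoamingClient` classes and their method names are hypothetical; the point is only that the client transmits when its address changes, and the server keeps nothing more than a last known location per client.

```python
class LocationRegistry:
    """Hypothetical clearing house: client id -> last known address."""

    def __init__(self):
        self.last_known = {}

    def here_i_am(self, client_id, address):
        self.last_known[client_id] = address

    def lookup(self, client_id):
        return self.last_known.get(client_id)


class RoamingClient:
    """Hypothetical client that reports in only when its address changes."""

    def __init__(self, client_id, registry):
        self.client_id = client_id
        self.registry = registry
        self.reported = None      # address we last told the registry about
        self.transmissions = 0    # how often we had to "phone home"

    def on_network_change(self, address):
        if address == self.reported:
            return                # same place as before: stay in receive mode
        if address is not None:   # None means we are out of range
            self.registry.here_i_am(self.client_id, address)
            self.transmissions += 1
        self.reported = address


registry = LocationRegistry()
phone = RoamingClient("steve", registry)
phone.on_network_change("sammamish-tower")
phone.on_network_change("sammamish-tower")  # no change, nothing sent
phone.on_network_change(None)               # out of range in the mountains
phone.on_network_change("nyc-tower")        # back in range somewhere new
```

Four network events result in only two transmissions, and the registry ends up pointing at the most recent tower.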
Why send just the notification, not the data
You could argue the same infrastructure above could be used to send the actual changes. While that's true, it's not the most efficient. Depending on your infrastructure, your clients are likely to be out of range for multiple updates. Rather than sending blocks of updates the client may not even be interested in, systems tend to scale better by simply doing the notification to pull.
So, as you can see, a Notification to Pull model is very powerful, but quite complex. There are a lot of moving parts to build, configure, and maintain. We already have a number of the parts, including:
- SQL Server - Ability to store all this data on the server
- SQL Server Compact - Ability to store all this data easily on the client with minimal deployment issues
- Sync Services for ADO.NET - The pull operation to send/receive net changes between the server and the client
- SQL Server Service Broker - Ability to queue up operational changes within the data center
- WCF - communication stack between the client and the server over various protocols
- Query Notifications - Ability for the server to notify a listener service that something has changed
- Change Notifications - SQL Server 2008 introduces a more efficient way to track changes that can be used with Query Notifications to provide an efficient way to know when something changed, and what the details of the change are.
Building out this infrastructure may be more than you need right away. However, if you look closely, the configuration of Sync Services doesn't actually change. We're simply layering a notification system to determine when to pull.