Astoria futures: offline-enabled data services


We mentioned that we were doing some early thinking about “Astoria Offline” back at Mix 2008, where we even demoed an early proof of concept. Since then we’ve been working on various design aspects of future versions of Data Services, and synchronization/offline support is one of them. It’s still experimental, with no official home or release vehicle, so this is the best time to follow the design process if you find the scenario interesting: this is when it’s easiest to influence the direction we take.

A short way of describing this can be: “imagine you can point Visual Studio to a data service and say ‘take it offline’, and things just happen”.

Of course, the real world is more complicated than that :-)


Astoria Design Walkthrough: Thinking of a future with sync & offline

 

In this first note I’ll just touch on the scenarios we want to hit and go over a few guiding principles. In future posts I’ll elaborate more on the details.

We have many scenarios in mind for this infrastructure. The ones we’re thinking of tackling first:

· Outlook type of apps: I’m sure there is a fancier way of saying this, but anyone who has used Microsoft Outlook knows what I mean. The application is basically a 1-tier app that interacts with a local (embedded) database. In the background, and independently of UI activity, 2-way synchronization with a data service (e.g. a Microsoft Exchange server) happens. Often sync’ing against a database is not quite what you want. “Astoria Offline” will let you sync against your data-service layer, where the usual business logic, validation, etc. will run just like in the online path.

· The description above sort of implies that server and client are built in collaboration, perhaps by the same development team. That’s certainly a scenario, and we can do some things more easily when that’s the case. But the other scenario we want to tackle is when the client and server in a synchronization relationship are independent from each other (e.g. syncing with a service that’s simply available for sync on the web).

· Local replicas of cloud-stored data: as more online services offer structured storage capabilities, and more of them use the Data Services REST interface, it becomes more interesting to be able to synchronize that data locally either for latency reduction, offline operation or other reasons.

· Data consolidation: if you have multiple data services that expose data from a variety of sources (some databases, some online/”cloud” stores, some custom repositories), you may want to synchronize a slice of data of each store to a local database, and then work with the data locally.

A couple of guiding principles:

· We will stick to a simple and open interface. What that means is that while we will definitely build a nice end-to-end integrated story for Visual Studio, it will be on top of a well-documented underlying data exchange using just HTTP and known formats. Anybody with an HTTP client and enough knowledge of our sync strategy should be able to synchronize with a data service.

· Data independence will remain there for sync as it is already for online access. Today when you access a data service the interface is the same regardless of whether the service is backed by a database, a cloud store, some custom application or whatever. With sync, the same applies. If the data service is sync-enabled you can sync with it, no matter what backs it.

· We are targeting data services for structured stores and business applications. That implies a certain level of sophistication in the shape of the data: cross-item dependencies, store-level and application-level constraints that dictate consistent states of data, the need to make partial progress during synchronization, etc. Such support does come with some extra complexity, but we think it’s the right target.
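
To make the cross-item dependency point a bit more concrete, here is a minimal sketch in Python. It is not part of any Astoria design: the `dependency_closure` function, the `depends_on` map and the item names are all made up for illustration. It shows why a slice of data picked for sync has to be expanded until nothing in it points outside the slice, otherwise the local store ends up in an inconsistent state.

```python
from collections import deque


def dependency_closure(selected, depends_on):
    """Return the selected items plus everything they depend on, transitively.

    `depends_on` maps an item id to the ids it references (hypothetical
    shape, chosen only for this sketch).
    """
    closure = set(selected)
    queue = deque(selected)
    while queue:
        item = queue.popleft()
        for dep in depends_on.get(item, ()):
            if dep not in closure:
                closure.add(dep)
                queue.append(dep)
    return closure


# Hypothetical data: order lines reference orders, orders reference customers.
depends_on = {
    "orderline:1": ["order:10"],
    "orderline:2": ["order:11"],
    "order:10": ["customer:A"],
    "order:11": ["customer:B"],
}

# Asking for a single order line pulls in its order and its customer.
subset = dependency_closure(["orderline:1"], depends_on)
print(sorted(subset))
```

A real data service would have to derive these dependencies from its own model and constraints; the point here is only that the unit of sync is a consistent subgraph rather than an arbitrary set of rows.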

We’re just taking on this space, so any feedback you may have is good. Are our initial scenarios interesting? Do you need this thing at all? Does the initial direction we’re looking at sound reasonable?

btw – if you are going to be at PDC, we have a full talk on this at the event.

I hope this “short video” format that Andy wants to do for our design notes adds a good little twist and makes them more interesting.

Pablo Castro
Software Architect
Microsoft Corporation
http://blogs.msdn.com/pablo

This post is part of the transparent design exercise in the Astoria Team. To understand how it works and how your feedback will be used please look at this post.

Comments (17)

  1. Chris says:

    I definitely believe this is the right direction.  The shift to smart client technologies (WPF, Silverlight, etc.) is going to demand sync/offline design models.  We have been deciding whether we wanted to tackle this development in a one-off scenario, and have decided against it due to time constraints and complexity.  I believe if your team delivers a CTP in Q4 or early Q1, we may be in a position to start testing it in parallel with our development.

  2. Tantalising post from the Astoria team.

  3. Asheesh Soni says:

    Definitely!

    Every time I discuss software requirements with my clients, they say…"All that sounds good, but can you make it like Outlook, so we could work offline and then sync it later?"

    Somehow, I have to explain to them how it’ll blow their budget.

    And I wish… wouldn’t it be nice if we had offline enabled data services?

    Well… there you are… talking about exactly what we need!

  4. Asheesh Soni says:

    Instead of a really really long comment in this post, I have blogged about our Scenario, Existing Architecture, and Project Requirements here:

    http://asheeshsoni.blogspot.com/2008/10/astoria-futures-offline-enabled-data.html

    There is also a poll on the sort of clients that’ll consume your Astoria Offline services.

    Feel free to discuss how you think your organization / applications could utilize Astoria Offline, and what features you look forward to in it.

    Cheers

    Asheesh Soni

  5. Sven says:

    I’m wondering which data storage mechanism Astoria OffLine will be employing to cache the data locally.

    Since PDA’s seem to come into the picture I’m guessing Sql Server Compact will be a strong contender there.

    Oh Wait, Steve Lasker is involved (http://blogs.msdn.com/stevelasker/archive/2008/10/18/evolution-or-revolution-for-moving-to-offline-architectures.aspx) so there’s no need for guessing anymore :-)

  6. pabloc says:

    Sven: yes, our primary focus for the offline client storage is SQL Server Compact for devices and desktop scenarios, although we’re also exploring having support for other options as well, such as SQL Server Express for the desktop case.

    Stories of what worked best in your applications are welcome, as it can help guide our decisions around this.

    -pablo

  7. Sven says:

    Pablo, thanks for answering. Unfortunately I have no stories to tell about what worked best. We’re still in the planning stages when it comes to local data caching. In fact, we’re still planning our move to Silverlight.

    The only thing I can offer is that the scenario where "server and client are built in collaboration, as part of the same development team" applies to us.

    And I think it will apply to many teams that are going to port their N-tiered Winforms apps to WPF/Silverlight + Astoria as we are. So if you have to postpone support for the other scenario "where client and server are independent" to V2, that would be acceptable, IMHO.

    -Sven

  8. Neil W says:

    For very lightweight applications (such as Windows Mobile) the choice to focus on CE is spot on. Unfortunately, most applications (in my humble opinion) will focus on LAPTOP application development. Due to the very limited nature of CE as a database (no T-SQL, no views, no stored procedures, no XML, no role-based security) I would love to see focus put on the Express edition. This will allow for very robust client/database applications to be developed.

  9. James H says:

    This sounds very much like an application we are developing at my company right now except that we were not able to use CE, we are using Sql Express 2008 and http merge replication from the desktop back to our server.  Compact edition does not support some of the features we use such as stored procedures and the new file stream data type.  I would like to see Sql Express supported as well as CE.

  10. ChrisLamont-Mankowski says:

    Since now is the time to influence the design … I’d really appreciate the ability to synchronize a subset of data; either on a per-table basis, or on a WHERE restriction.  This is because I usually deal with hundreds of thousands of rows; where only a certain select group of rows is relevant to each user who would sync.  I’d like to limit the cached response to a certain date range, or to exclude some BLOB columns.  This is comparable to the "Download Headers Only" feature in Outlook….

    It would be nice to have the ability to sync my app to Astoria and also to Azure without having to re-port my code again.  I’m expecting to be in the situation where I had just ported my code from a legacy access technology to Astoria/EF.  I wouldn’t be happy doing it again to support the cloud.

    The final entry on my wish list would be an interface (or built-in support) for caching pages of data.  In my head, there seems to be a natural fit with caching and sync… In other words, transparently cache data as I work with the Data.Services.Client.  This would be very helpful for my low-bandwidth clients.

  11. pabloc says:

    Neil/James: thanks for the feedback on SQL Express…will definitely take this into account.

    Chris: let me comment on each item separately:

    Subsets: yes, absolutely, we want to support sync’ing a subset of the data and we’re exploring a couple of approaches. One of the challenges involved is cross-record dependency (a WHERE clause alone won’t do; you need to make sure that you take a subgraph with no outgoing dependencies). Do you have a few examples, in the context of your applications, of how you’d like to slice the data?

    Regarding Astoria versus Azure, I agree that it makes sense. We’re working with the SQL Data Services folks to explore this space. It’s still early in the process, but it’s certainly an area we’d like to look into more.

    Caching: I hesitate about this a bit. I’m not a big fan of "magically seamless" interfaces, as when the differences start to permeate to the interface (and most of the time they do) the model breaks and the application assumptions fall apart. I tend to go with the explicit route more…but that’s more of a style thing. Point taken.

    -pablo

  12. Nikunj Mehta says:

    Pablo,

    You state your aversion to caching and seamless online/off-line applications. If the data service is provided by Astoria, then there is no problem. However, if the data source is not Astoria-based then sync falls apart, and there is no chance of getting the full set of data locally in a way that guarantees that "all the data required for offline" is stored locally at sync time.

    This is no different from the SOAP story, where everyone thought it would enable interoperable services only to find that the technologies were good enough only for intranet solutions built with the same tools and technologies on both sides of the interaction.

    If you are interested in seeing advances made in the area of cache syncing standard AtomPub services (which by the way are way more interoperable than Astoria-only data sources), then you should check out the details of AtomDB. My blog http://o-micron.blogspot.com has details about AtomDB and so does the feed technology center on OTN http://oracle.com/technology/tech/feeds/.

    I hope this comment makes its way to your blog post. I appreciate your interest in this area and look forward to an invigorating discussion.

    Nikunj Mehta

  13. As we already discussed in a previous blog post , one of the problem spaces related to data services

  14. Hi Pablo,

    Apart from having some client support for the sync framework (and the service-side counterpart as well), are you planning to include FeedSync extensions as part of the ATOM feed that can be obtained from the output of an existing ADO.NET service? This would definitely be good for using any technology that talks FeedSync on the other end.

    Thanks

    Pablo.

  15. ASPInsiders says:

    I’ve given a number of presentations on ADO.NET Data Services (formerly codenamed: "Astoria")

  16. In October of last year we started to talk publicly about an exploration project we called “Astoria