Working with data in cloud solutions

In this blog post, we'll give an introduction to working with data in cloud solutions.

Overview

Working with data is a critical part of most solutions. In a cloud solution, we can adopt most of the guidelines we already have for on-premises solutions. However, cloud solutions also have their own unique use cases for working with data. In this post, we will discuss the following use cases:

· Expose your cloud data to the rest of the world.

· Expose your on-premises data to your cloud applications.

Common considerations

In either use case, there are a few common decisions you need to make before going on.

Choose a protocol

In an SOA world, the most important concept is contract. In a cloud world, when it comes to communication, the most important concept is also contract. When there is a common contract that is adopted by lots of cloud applications, we call it a protocol.

In the data communication scenario, if you choose a Microsoft cloud solution, the recommended protocol is the Open Data Protocol (OData). Based on open standards such as HTTP and AtomPub, OData provides a consistent way to deliver data across multiple platforms. If your cloud service exposes data using the OData protocol, the rest of the world can consume your data in the same way they consume other OData-compatible cloud services. Likewise, OData lets your cloud applications consume your on-premises data in a consistent manner.
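To make "consistent" a bit more concrete: an OData client addresses data with plain HTTP URLs and a small set of system query options such as $filter, $orderby and $top. The sketch below (in Python, only for brevity) composes such a URL. The service root and the Products entity set are made-up names for illustration, and the helper is our own, not part of any OData library.

```python
from urllib.parse import quote


def build_odata_url(service_root, entity_set, filter_expr=None, orderby=None, top=None):
    """Compose an OData query URL from a few common system query options."""
    options = []
    if filter_expr is not None:
        # Percent-encode the expression so spaces and quotes are URL-safe.
        options.append("$filter=" + quote(filter_expr, safe=""))
    if orderby is not None:
        options.append("$orderby=" + quote(orderby, safe=""))
    if top is not None:
        options.append("$top=" + str(int(top)))

    url = service_root.rstrip("/") + "/" + entity_set
    if options:
        url += "?" + "&".join(options)
    return url


# Hypothetical service root and entity set, used only for illustration.
url = build_odata_url(
    "https://example.invalid/Catalog.svc/",
    "Products",
    filter_expr="Price gt 10",
    orderby="Name",
    top=5,
)
print(url)
```

The same URL shape works against any OData-compatible service, which is the main benefit of agreeing on the protocol first.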

A lot of products are already using OData. Just to name a few: Windows Azure Table Storage, Dallas, SharePoint 2010, SQL Server 2008 R2, and so on.

If you want to choose another protocol, it is important to investigate how scalable the protocol is, what its adoption rate is, and so on.

 

Choose a technology

After the contract (protocol) is chosen, it is time to choose a proper technology to implement the protocol.

If you choose a Microsoft cloud solution, the recommended technology for communication among applications is WCF. And when it comes to data, WCF Data Services is the de facto choice.

First of all, WCF Data Services are WCF services, so all your existing knowledge about WCF can be used. In addition, WCF Data Services help you implement the OData protocol without dealing with the underlying specification. You can focus on the CLR representation of your data model rather than the actual AtomPub/JSON messages being transferred over the network. What's more, WCF Data Services focus on data communication rather than data storage. The data can come from anywhere: an on-premises database, a cloud database, external web services, XML files, and so on. No matter where the data comes from, you can expose and/or consume it in the same way.
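To see what "dealing with the messages" would look like without such a framework, here is a minimal sketch of a client unwrapping an OData JSON response by hand. It assumes the verbose JSON format, where the payload sits under a "d" wrapper, a collection may be nested under "results", and each entity carries a "__metadata" entry. The sample payload and the helper name are invented for illustration.

```python
import json


def extract_entities(payload_text):
    """Return the entities in an OData verbose-JSON payload as plain dicts."""
    body = json.loads(payload_text)["d"]
    if isinstance(body, dict) and "results" in body:
        rows = body["results"]   # collection wrapped in "results"
    elif isinstance(body, list):
        rows = body              # collection returned as a bare array
    else:
        rows = [body]            # single entity
    # Drop the protocol-level "__metadata" entry and keep the data properties.
    return [{k: v for k, v in row.items() if k != "__metadata"} for row in rows]


# Invented sample payload, used only for illustration.
SAMPLE = '{"d": {"results": [{"__metadata": {"uri": "Products(1)"}, "ID": 1, "Name": "Milk"}]}}'
print(extract_entities(SAMPLE))
```

This sketch would misread a single entity that happens to have a property named "results". Edge cases like that are exactly what a framework handles for you.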

If you choose other technologies, it is important to investigate how much effort it takes to implement the protocol you choose using the technology, how much effort it takes to extend the solution in the future, and so on.

Now that we've discussed the common considerations, let's look at how Microsoft products help you realize the two use cases mentioned above.

Expose your cloud data to the rest of the world

A lot of cloud solutions involve interaction with the rest of the world. When it comes to data, the term DaaS (Data as a Service) might be the first to come to mind.

Cloud data can be stored in a lot of places, and there are many kinds of data. For simplicity, we will focus on structured (think of XML) and relational (think of relational databases) data in the rest of this blog post. Currently Microsoft provides two cloud data storage products:

· Windows Azure Table Storage: It allows you to store structured data in the cloud. It uses a flexible (dynamic) schema.

· SQL Azure: It allows you to store relational data in the cloud. It uses a fixed schema.

The following table compares fixed schema with dynamic schema:

Fixed schema                                  | Dynamic schema
----------------------------------------------|----------------------------------------------
Relational databases such as SQL Azure        | Windows Azure Table Storage
Proven by decades of experience               | Highly extensible (single storage, but different schemas for different apps)
Lots of existing products and tools           | Web friendly, open
O/R mapping to take advantage of OO languages | Takes advantage of dynamic languages

 

You should choose the data storage that fits your scenario. In most cases, if you want to give the rest of the world write access to your data, a dynamic schema is preferred, because third-party applications may want to modify the schema a bit to adapt it to their scenarios. But considering the current limitations of Windows Azure Table Storage (not all features of OData are implemented), and the fact that the relational model has been proven by decades of experience, it makes sense to use a fixed schema if adopting a dynamic schema takes too much effort.
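The difference between the two schema styles can be sketched in a few lines. In the Python sketch below, a dataclass stands in for a fixed-schema row, and plain dictionaries stand in for table storage entities, where only the PartitionKey and RowKey properties are mandatory. The entity contents and helper names are invented for illustration and are not tied to any storage SDK.

```python
from dataclasses import dataclass


@dataclass
class ProductRow:
    """Fixed schema: every row has exactly these columns."""
    product_id: int
    name: str
    price: float


# Dynamic schema: only the key properties are required on every entity.
REQUIRED_KEYS = {"PartitionKey", "RowKey"}


def is_valid_entity(entity):
    """Check that an entity carries the mandatory key properties."""
    return REQUIRED_KEYS.issubset(entity)


def property_names(entities):
    """Collect the union of non-key property names found across entities."""
    names = set()
    for entity in entities:
        names.update(k for k in entity if k not in REQUIRED_KEYS)
    return sorted(names)


# Two entities in the same table with different property sets (invented data).
entities = [
    {"PartitionKey": "books", "RowKey": "1", "Title": "OData Basics", "Price": 20.0},
    {"PartitionKey": "books", "RowKey": "2", "Title": "WCF Notes", "Rating": 4},
]
print(property_names(entities))
```

A third-party application can add a property such as Rating without touching existing entities, whereas passing an extra field to ProductRow raises a TypeError until the class itself is changed.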

No matter what schema you choose, OData and WCF Data Services can help.

As discussed above, WCF Data Services can be used with all kinds of data sources. It works out of the box with two data providers: ADO.NET Entity Framework (EDM) and LINQ to SQL (L2S). With these providers, it generally takes only a few lines of code to build a solution. If you choose SQL Azure to store your data, you can use either EDM or L2S to access the database.

When working with other data sources (such as Windows Azure Table Storage), you need to convert the source data model to a data model that WCF Data Services understand. This is usually a trivial task if your service is read-only: you just need a class that describes your data model. This approach is called the "reflection provider" for WCF Data Services. If you need to support full CRUD, you must also implement the IUpdatable interface. In more advanced scenarios, you can use custom data service providers. For more information, please refer to https://msdn.microsoft.com/en-us/library/dd672591(VS.100).aspx.

Windows Azure Table Storage itself uses the OData protocol, so you may be tempted to let your clients access your table storage directly. But doing so is not recommended in most scenarios. You must protect your storage account key at all costs. Otherwise, it is you who ends up paying for the storage abuse caused by a "trusted" hacker to whom you sent your storage key so that he or she could access your storage directly. In addition, ever since the SOA era at the beginning of this century, it has often been recommended to wrap your data and business logic into services. That's why the recommended solution is to use WCF Data Services.

You can download a sample from All-In-One Code Framework (Azure).zip that demonstrates how to expose your cloud data stored in Windows Azure Table Storage to the rest of the world using WCF Data Services. The sample name is: CSAzureTableStorageWCFDS/VBAzureTableStorageWCFDS.

Expose your on-premises data to your cloud applications

Another common use case in a cloud solution is to expose your on-premises data to your cloud applications. In most cases, the data is stored in a relational database (such as SQL Server) using a fixed schema, so you generally do not need to worry about data storage. What you need to consider in this case is connectivity and security.

Most companies have firewalls and NATs. It is very difficult to find a machine that is accessible from the Internet and, at the same time, has a static IP address. This makes it very difficult to communicate with the database server directly from a cloud application. In addition, it can be tricky to control access to your database. Cloud applications do not live in the same domain as your intranet, so it's impossible to use integrated Windows Authentication, and federated authentication solutions do not work well with databases yet.

To address the connectivity issue, Microsoft provides Windows Azure platform AppFabric Service Bus. Service Bus works as a bridge between your on-premises services and your cloud applications. Your on-premises server acts as a client to Service Bus, so even if it is behind a NAT, it can communicate with Service Bus. Service Bus then relays messages sent from your cloud applications to your on-premises services.
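The relay idea can be illustrated with a toy, in-memory model. The Python sketch below is not the Service Bus API. It only mimics the shape of the pattern: both the cloud application and the on-premises listener call into the relay, so nothing ever has to open a connection into the corporate network. All class and function names are invented for illustration.

```python
import queue
import threading


class ToyRelay:
    """In-memory stand-in for a relay: both sides connect outbound to it."""

    def __init__(self):
        self._requests = queue.Queue()

    def send(self, message, timeout=2.0):
        """Called by the cloud application; waits for the listener's reply."""
        reply_box = queue.Queue(maxsize=1)
        self._requests.put((message, reply_box))
        return reply_box.get(timeout=timeout)

    def receive(self, timeout=2.0):
        """Called by the on-premises listener to pick up the next request."""
        return self._requests.get(timeout=timeout)


def on_premises_listener(relay, handler, count):
    """Serve a fixed number of relayed requests, then stop."""
    for _ in range(count):
        message, reply_box = relay.receive()
        reply_box.put(handler(message))


def demo():
    relay = ToyRelay()
    worker = threading.Thread(
        target=on_premises_listener,
        args=(relay, lambda query: "rows for " + query, 1),
    )
    worker.start()
    answer = relay.send("Customers")  # the cloud application's call
    worker.join()
    return answer


print(demo())
```

In the real product the relay runs in the cloud and the two parties are separate processes on separate networks, but the direction of the calls is the same.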

Service Bus supports both TCP and HTTP communication protocols. Most firewalls permit outbound traffic over ports 80 and 443, and that's the minimum requirement for Service Bus to work. Thus, Service Bus is able to traverse both NATs and firewalls. The only obstacle it can't help you overcome is a proxy.

Security is a complex topic, and we won't go into much detail in this blog post. But Windows Azure platform AppFabric Access Control can help in most cases, and it can work together with Service Bus.

Once again, OData and WCF Data Services can help in this use case.

You can download a sample from All-In-One Code Framework (Azure).zip that demonstrates how to expose your on-premises data stored in SQL Server to the cloud. The sample name is: CSAzureServiceBusWCFDS/VBAzureServiceBusWCFDS. The sample also provides an ASP.NET client that you can use to test the service.