While the availability of Windows Azure and the announcement of Silverlight 4 were certainly highlights of PDC 09, I was especially intrigued by the introduction of “Dallas,” the code name for Microsoft’s Data-as-a-Service (DaaS) offering. So after the relatives had left and my turkey coma had subsided, I thought I’d give “Dallas” a whirl this Thanksgiving weekend.
What’s “Dallas” all about?
“Dallas” is essentially a repository of data repositories: a service, built entirely on Windows Azure, that allows consumers (developers and information workers) to discover, access, and purchase data, images, and real-time services.
“Dallas” democratizes data by providing a one-stop shop (via PinPoint) for all types of premium content. With “Dallas” you can opt into a pay-as-you-grow model, gaining access to data that previously may have been available only through expensive subscriptions directly with the data provider.
Developers can access “Dallas” via REST-based APIs and Atom feeds, or in the raw formats that many content providers offered before “Dallas”. The web-accessible “Dallas” Service Explorer (shown below) lets consumers explore the data as well as the HTTP URLs that are constructed and executed to retrieve a data set based on user-provided parameters.
Since the data services are accessed via standard protocols (REST, HTTP, Atom), “Dallas” can be used by a vast variety of clients – PHP, Ruby, Java, etc. – in addition to .NET, of course. For .NET developers, the Service Explorer offers a convenient option to generate a C# proxy class that essentially provides a wrapper for the HTTP/REST API.
Current data providers (offering trial periods of access) include:
- Associated Press Online
- Data.gov (specifically FBI’s Uniform Crime Reporting program)
- infoUSA (data on UK, US, and Canadian businesses)
- UNdata (UNESCO Institute for Statistics and World Health Organization data)
- NASA (Mars orbital and Rover images)
To get started with “Dallas,” access the Quick Start page under the Windows Azure Platform portal. You’ll first need to request an invitation code (associated with your Live ID); that code will provision you with an account key that you then submit as part of the HTTP header with each of your data requests.
Note: the account key is your private key and should not be shared with anyone. The unique user ID is a GUID representing individual users. Both are required when submitting a request for data.
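The credentials are ordinary HTTP headers, so any client can attach them. The Python sketch below relies only on the two header names the Service Explorer displays ($accountKey and $uniqueUserID). The function name dallas_headers is hypothetical, and the key value is whatever your portal issues. It also checks that the user ID really is a GUID before any request goes out.

```python
import uuid
from typing import Dict


def dallas_headers(account_key: str, unique_user_id: str) -> Dict[str, str]:
    """Build the two credential headers sent with every "Dallas" request.

    Hypothetical helper: only the header names come from the Service Explorer.
    """
    if not account_key:
        raise ValueError("an account key is required")
    # The unique user ID must be a GUID; uuid.UUID raises ValueError otherwise.
    user_guid = uuid.UUID(unique_user_id)
    return {
        "$accountKey": account_key,        # keep this value private
        "$uniqueUserID": str(user_guid),   # normalized lowercase GUID string
    }
```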
While you’re exploring “Dallas” on your own, a single (or random) user ID is sufficient, but when you deploy your application, you may want to set up multiple user IDs to track requests by user for analysis and/or billing purposes.
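One way to get a distinct user ID for each person, with no lookup table to maintain, is to derive a name-based (version 5) GUID from each application user name. This sketch shows one possible approach and is not something “Dallas” prescribes. APP_NAMESPACE and the example.com URL are hypothetical placeholders.

```python
import uuid

# Hypothetical: a fixed namespace derived from a URL identifying your application.
APP_NAMESPACE = uuid.uuid5(uuid.NAMESPACE_URL, "https://example.com/my-dallas-app")


def user_id_for(app_username: str) -> str:
    """Return a stable GUID for an application user.

    uuid5 is deterministic, so the same (case-folded) username always maps to
    the same GUID, which lets per-user request counts line up across sessions.
    """
    return str(uuid.uuid5(APP_NAMESPACE, app_username.lower()))
```

Because the mapping is a hash, the GUID in an access report can be traced back to a user only by recomputing it from a known user name.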
Once you have your account key, you’ll have access to the “Dallas” Developer Portal (below), where you can view your current data subscriptions, other available providers (catalog), your account keys, and an access report (how many requests were made to each of your subscriptions).
Subscribing to a Service
This part’s pretty easy: the Catalog link brings you to the list of available (and forthcoming) providers. Find the ones that look interesting (they’re all free for the time being) and click the ‘subscribe’ link.
Once you’ve subscribed to a service it will be listed under Subscriptions in the “Dallas” Developer Portal, and you’ll find a link for each service through which you can explore its dataset via the “Dallas” Service Explorer.
Exploring the Data via “Dallas” Service Explorer
I’ve subscribed to the Data.gov feed for 2006 and 2007 crime statistics, and below is a preview of the data in the “Dallas” Service Explorer.
I’ve numbered the five primary sections of the output as well.
1. The data itself. Here it’s displayed in a convenient tabular format, but you can also choose the Atom 1.0 and raw formats via the links directly above the data display. You can also have the data returned in these formats via the Invoke as… button in section 4 (below).
2. Many services require input parameters that define the subset of data desired. This service appears to require a State, while the City and Year are optional. The parameters are submitted as part of the request URL. You can also specify paging declaratively; the paging values translate into specific URL tokens that the “Dallas” service handles for you automatically.
3. These are the credentials under which the service is executed. Remember, the Account Key should be kept private, and the Unique User ID can be used at your discretion to attribute service access to a particular user in your own application’s domain. “Dallas” can also integrate with the Windows Azure Access Control Service to support federated identity scenarios.
4. The buttons here provide several options for retrieving and analyzing the data:
   - The Invoke as… button calls the service with the $format parameter set to the data format to be returned (e.g., Atom, Raw).
   - The Preview button calls the service with the given parameters and refreshes the page with the new results. Note: during the “Dallas” trial stage, a maximum of 100 result rows is returned.
   - The Analyze button feeds the service data into Excel 2010’s new PowerPivot capability (previously known as Project “Gemini”). PowerPivot is an add-in that automatically creates a pivot table from the data and provides a number of readily accessible data manipulation and visualization options.
   With PowerPivot, you get an instant business intelligence capability that lets you analyze your own private data against public data from “Dallas” content providers. For example, the following chart was created to display the number of countries with life expectancies between 70 and 79 years, as reported by the World Health Organization for 1990, 2000, and 2006.
5. This section provides the raw data for making service requests. The HTTP URL and request header information can be used in practically any programming environment to make the service request (the results are typically returned in Atom format). For .NET developers, the ‘Download C# service classes’ link generates a C# class file that you can incorporate into the application invoking the service (see below).
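The last section of the Service Explorer output shows the request URL in full. The state is part of the path, while the optional year and the $format system parameter go in the query string, as in the raw Massachusetts request later in this post. The Python sketch below assembles such a URL under several assumptions. The base URL is a hypothetical placeholder, since the real host isn't reproduced here. City and the paging tokens are left out because their URL forms aren't shown. The function name build_crime_url is also hypothetical.

```python
from typing import Optional
from urllib.parse import quote, urlencode


def build_crime_url(base_url: str, state: str, year: Optional[int] = None,
                    fmt: str = "atom10") -> str:
    """Assemble a Data.gov crime-statistics request URL (illustrative only)."""
    # The state is a path segment, so percent-encode it (e.g. spaces -> %20).
    path = f"{base_url.rstrip('/')}/DataGovService.svc/crimes/{quote(state)}"
    params = {}
    if year is not None:
        params["year"] = str(year)
    params["$format"] = fmt
    # Keep '$' literal so the system parameter reads $format, not %24format.
    return f"{path}?{urlencode(params, safe='$')}"
```

For example, build_crime_url(base, "Massachusetts", 2007) ends in /DataGovService.svc/crimes/Massachusetts?year=2007&$format=atom10.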
Invoking the Service via a raw HTTP request
One of the best ways to investigate HTTP traffic is with an HTTP debugging proxy such as Fiddler. With Fiddler, you can compose your own HTTP requests, execute them, and view the resulting HTTP responses. Using the “Dallas” Service Explorer, I can see, for instance, that the URL for a request for 2007 crime statistics for Massachusetts should be issued as follows:
and the $accountKey and $uniqueUserID are required as request headers.
So, on the Request Builder tab in Fiddler, I can build up the HTTP request as you see below. Note that each header name is separated from its value by a colon (:), not an equals sign (=) as the “Dallas” Service Explorer implies.
or, in raw format:
GET /DataGovService.svc/crimes/Massachusetts?year=2007&$format=atom10 HTTP/1.1
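The colon-versus-equals distinction is easiest to see in a complete raw request. The sketch below assembles the text Fiddler would display. It assumes a hypothetical host and placeholder credential values, and raw_request is a hypothetical name.

```python
from typing import Dict


def raw_request(path_and_query: str, host: str, headers: Dict[str, str]) -> str:
    """Return the raw text of an HTTP/1.1 GET request (illustrative only)."""
    lines = [f"GET {path_and_query} HTTP/1.1", f"Host: {host}"]
    # Each header is written "Name: value" (colon), never "Name=value".
    lines += [f"{name}: {value}" for name, value in headers.items()]
    # HTTP lines end in CRLF, and an empty line terminates the header section.
    return "\r\n".join(lines) + "\r\n\r\n"
```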
Then, on the Inspectors tab in Fiddler, I can view the response, shown below as XML (the format Atom 1.0 is built on).
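Once you have the Atom response, pulling out rows means walking the atom:entry elements. The Python sketch below makes an assumption about layout: each entry's content element wraps a single container element whose children are the data fields. Check that assumption, and the field names, against the XML you actually see in the Inspectors tab. The function names are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List

ATOM = "{http://www.w3.org/2005/Atom}"


def local_name(tag: str) -> str:
    """Strip the '{namespace}' prefix ElementTree puts on qualified tags."""
    return tag.rsplit("}", 1)[-1]


def parse_atom_entries(xml_text: str) -> List[Dict[str, str]]:
    """Turn each Atom entry's content payload into a {field: text} dict."""
    feed = ET.fromstring(xml_text)
    rows = []
    for entry in feed.iter(f"{ATOM}entry"):
        content = entry.find(f"{ATOM}content")
        if content is None or len(content) == 0:
            continue  # no structured payload in this entry
        container = content[0]  # assumed: one wrapper element holding the fields
        rows.append({local_name(child.tag): (child.text or "") for child in container})
    return rows
```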
Invoking the Service via a C# Service Proxy
When you use the option in the “Dallas” Service Explorer to create C# service classes, a .cs file is downloaded to your machine. That file includes one or more pairs of classes (in the namespace Microsoft.Dallas.Services). One of the classes is a data transfer object with properties corresponding to the data output, and the other is the actual proxy class that exposes a few methods:
constructor – which sets up the request URL and parameters, including the paging values.
InvokeWebService – a private method that creates and executes an HttpWebRequest (whose response is an Atom feed) and extracts the Atom entries as an IEnumerable<XElement>.
Invoke – the publicly accessible method that synchronously calls InvokeWebService and transforms the result into a strongly typed List whose element type is the data transfer object class defined in the same generated file.
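If you aren't on .NET, the same three-part shape is easy to reproduce. Below is a rough Python analogue of the generated proxy, not a port of Microsoft's generated code:

- CrimeRecord plays the data transfer object.
- The constructor assembles the URL, headers, and paging values.
- _invoke_web_service returns the Atom entries.
- invoke maps those entries to typed records.

The host, the paging parameter names ($page, $itemsperpage), and the City/Year field names are all hypothetical. The injectable fetch callable lets the class run without network access.

```python
import urllib.request
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional
from urllib.parse import quote, urlencode

ATOM = "{http://www.w3.org/2005/Atom}"
Fetcher = Callable[[str, Dict[str, str]], str]  # (url, headers) -> response body


@dataclass
class CrimeRecord:
    """Data transfer object (field names are hypothetical)."""
    city: str
    year: int


class CrimeService:
    """Rough analogue of a generated "Dallas" proxy class."""

    def __init__(self, base_url: str, account_key: str, user_id: str,
                 state: str, year: Optional[int] = None,
                 page: int = 1, items_per_page: int = 100,
                 fetch: Optional[Fetcher] = None) -> None:
        params: Dict[str, str] = {}
        if year is not None:
            params["year"] = str(year)
        params["$format"] = "atom10"
        params["$page"] = str(page)                      # hypothetical paging token
        params["$itemsperpage"] = str(items_per_page)    # hypothetical paging token
        self.url = (f"{base_url.rstrip('/')}/DataGovService.svc/crimes/"
                    f"{quote(state)}?{urlencode(params, safe='$')}")
        self.headers = {"$accountKey": account_key, "$uniqueUserID": user_id}
        self._fetch = fetch or self._http_get

    @staticmethod
    def _http_get(url: str, headers: Dict[str, str]) -> str:
        # urllib normalizes header-name case; HTTP header names are case-insensitive.
        request = urllib.request.Request(url, headers=headers)
        with urllib.request.urlopen(request) as response:
            return response.read().decode("utf-8")

    def _invoke_web_service(self) -> List[ET.Element]:
        """Execute the request and return the Atom entry elements."""
        feed = ET.fromstring(self._fetch(self.url, self.headers))
        return feed.findall(f"{ATOM}entry")

    def invoke(self) -> List[CrimeRecord]:
        """Call the service synchronously and map entries to typed records."""
        records = []
        for entry in self._invoke_web_service():
            content = entry.find(f"{ATOM}content")
            if content is None or len(content) == 0:
                continue
            fields = {child.tag.rsplit("}", 1)[-1]: (child.text or "")
                      for child in content[0]}
            records.append(CrimeRecord(city=fields["City"], year=int(fields["Year"])))
        return records
```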
Pulling this C# file into a Windows Forms, WPF, or ASP.NET application is pretty straightforward. Here’s the entirety of code needed to display the 2007 Massachusetts crime data in a Windows Forms DataGridView:
While analyzing and manipulating the data feeds exposed by “Dallas” is cool in and of itself, the real power of this offering comes when you aggregate your own domain data with the public data. You might have sales figures that you are trying to interpret against economic or census data to make decisions on future store openings… or maybe you’d like to factor in up-to-date traffic conditions in your courier scheduling application… or perhaps you want to calculate insurance rates based on FBI crime statistics. With Project “Dallas” making these large, vetted premium data sets available, such analyses become viable for practically any application and developer.
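At its simplest, that aggregation is a join on a shared key. The minimal sketch below assumes your own figures and the “Dallas” data have both been reduced to per-state values. The function name and inputs are hypothetical.

```python
from typing import Dict, List, Tuple


def join_by_state(own_sales: Dict[str, float],
                  crime_counts: Dict[str, int]) -> List[Tuple[str, float, int]]:
    """Inner-join private and public per-state figures.

    States present on only one side are dropped; results are sorted by state.
    """
    shared = own_sales.keys() & crime_counts.keys()
    return sorted((state, own_sales[state], crime_counts[state]) for state in shared)
```

An inner join keeps only the states where both data sets have values. To keep every state from your own data, you would switch to a left join and fill the gaps explicitly.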
“Dallas” was just announced at PDC 09, so it’s a great time to get involved and watch (and participate) as the service evolves. You’ll need an invitation code to get started, but while you’re waiting for that you can check out the following resources as well:
Intro to “Dallas” (Hands-on Lab)
Another very cool (and ‘rewarding’) way to get to know “Dallas” and Windows Azure is to participate in the Pathfinder Innovation Challenge. NASA and the Jet Propulsion Laboratory, in conjunction with Microsoft, are sponsoring this contest for participants aged 14 and older to foster interest in, and further the exploration of, the Martian surface.
As you can read in the contest rules, there are four leagues of involvement, each with a different recommended level of programming expertise (beginning with League 1, which requires only an enthusiasm for the exploration of Mars!). Individuals or teams can compete for prizes ranging from a planetarium kit for a local secondary school, to Zune HDs, to a trip to the launch of the Mars Science Laboratory in September 2011.
The contest is ongoing, with deadlines of February 15, 2010 and April 16, 2010, depending on the league. See the contest rules for more information on deadlines and submission procedures.