The Windows Azure Marketplace has a rich assortment of data and software offerings for you to use – a type of Software as a Service (SaaS) for IT workers, not necessarily for end-users. Among those offerings is the “Data Hub” – a codename for a project that ironically actually does what the codename says.
In many of our organizations, we have multiple data quality issues. Finding data is one problem, but finding it just once is often a bigger problem. Lots of departments and even individuals have stored the same data more than once, and in some cases, made changes to one of the copies. It’s difficult to know which location or version of the data is authoritative.
Then there’s the problem of accessing the data. It’s fairly straightforward to publish a database, share or other location internally to store the data. But then you have to figure out who owns it, how it is controlled, and pass out the various connection strings to those who want to use it. And then you need to figure out how to let folks access the internal data externally – bringing up all kinds of security issues. Finally, in many cases our user community wants us to combine data from the internally sources with external data, bringing up the security, strings, and exploration features up all over again.
Enter the Data Hub. This is an online offering, where you assign an administrator and data stewards. You import the data into the service, and it’s available to you - and only you and your organization if you wish.
The basic steps for this service are to set up the portal for your company, assign administrators and permissions, and then you assign data areas and import data into them. From there you make them discoverable, and then you have multiple options that you or your users can access that data. You’re then able, if you wish, to combine that data with other data in one location.
So how does all that work? What about security? Is it really that easy? And can you really move the data definition off to the Subject Matter Experts (SME’s) that know the particular data stack better than the IT team does?
Well, nothing good is easy – but using the Data Hub is actually pretty simple. I’ll give you a link in a moment where you can sign up and try this yourself. Once you sign up, you assign an administrator. From there you’ll create data areas, and then use a simple interface to bring the data in. All of this is done in a portal interface – nothing to install, configure, update or manage.
After the data is entered in, and you’ve assigned meta-data to describe it, your users have multiple options to access it. They can simply use the portal – which actually has powerful visualizations you can use on any platform, even mobile phones or tablets.
Your users can also hit the data with Excel – which gives them ultimate flexibility for display, all while using an authoritative, single reference for the data. Since the service is online, they can do this wherever they are – given the proper authentication and permissions. You can also hit the service with simple API calls, like this one from C#: http://msdn.microsoft.com/en-us/library/hh921924
You can make HTTP calls instead of code, and the data can even be exposed as an OData Feed. As you can see, there are a lot of options.
You can check out the offering here: http://www.microsoft.com/en-us/sqlazurelabs/labs/data-hub.aspx and you can read the documentation here: http://msdn.microsoft.com/en-us/library/hh921938