Openness, Research and Big Data

openness

Despite common misconceptions, Microsoft now has extensive interoperability with open source technologies. For example, you can run a PHP application on Azure, get support from us to run Red Hat, SUSE or CentOS on Hyper-V, and manage all your applications from System Center 2012.

Azure

So how are academics using these resources and services?

Prof. Baesens from Katholieke Universiteit Leuven (a.k.a. KU Leuven) in Belgium yesterday published a paper called "Beyond the hype: cloud computing in analytics", which looks at machine learning. KU Leuven set up a benchmarking experiment using machine learning techniques common in analytics, the Microsoft Windows Azure cloud platform and the middleware of Techila Technologies. The results were compared with those obtained in a non-parallelized setup, and they show that significant analysis speed-ups can be gained when performing computational tasks in the cloud.

Researchers now have an amazing opportunity with Microsoft and openness. Additionally, we're extending this approach to the world of big data with Hadoop.

hadoop

As you know from my previous posts, Hadoop uses MapReduce. The key to the power and scalability of Hadoop is that it applies the MapReduce concept across large clusters of servers by getting each node to run the functions locally. This takes the code to the data, which minimises IO and network traffic, and it relies on Hadoop's own file system, HDFS.
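To make the map and reduce steps concrete, here is a minimal single-machine sketch in plain Python. It does not use Hadoop or any Hadoop API. The function names and the sample partitions are made up for illustration, with each inner list standing in for the block of data held on one node.

```python
from collections import defaultdict
from itertools import chain


def map_words(line):
    """Map step: emit a (word, 1) pair for every word in one line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Group values by key, as a framework would between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_counts(word, counts):
    """Reduce step: collapse all the values for one key into a total."""
    return word, sum(counts)


def word_count(partitions):
    """Run map over every partition, then shuffle and reduce the results."""
    mapped = chain.from_iterable(
        map_words(line) for partition in partitions for line in partition
    )
    grouped = shuffle(mapped)
    return dict(reduce_counts(word, counts) for word, counts in grouped.items())


# Hypothetical sample data: each inner list plays the role of one node's local block.
partitions = [
    ["big data on Azure", "Hadoop on Azure"],
    ["big clusters run Hadoop"],
]
result = word_count(partitions)
print(result)
```

In a real cluster the map function would run on the node that already stores each partition, and only the much smaller (word, count) pairs would cross the network, which is where the saving in IO and network traffic comes from.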

Big Data

As you're all aware, there are lots of toolsets for Hadoop. Many of these are built on Hive, which presents HDFS as a data warehouse that you can run SQL against, and on Pig with its Pig Latin language, where you load data and work on it with your own functions.

What's New!

Microsoft is developing the following functionality in conjunction with Hortonworks:

  • an ODBC driver to connect to Hive
  • an add-in for Excel to query Hive
  • the ability to run Hadoop as a service on Windows Server
  • the ability to run Hadoop on Azure, so you can create clusters as and when you need them and use Azure's massive connectivity to the internet to pull data in there, rather than choking the bandwidth to your own data centre
  • F# programming for Hadoop

At the time of writing these tools are still in development, and admission to Hadoop on Azure is "by invitation" only. If you're interested in this, please do get in touch: simply email ukfac@microsoft.com