Project Daytona - Iterative MapReduce on Windows Azure

 

Microsoft has developed an iterative MapReduce runtime for Windows Azure, code-named "Daytona." Project Daytona is designed to support a wide class of data analytics and machine learning algorithms. It can scale out to hundreds of server cores for analysis of distributed data.

Project Daytona was developed as part of the eXtreme Computing Group’s Cloud Research Engagement Initiative and made its debut at the Microsoft Research Faculty Summit.

The Project Daytona MapReduce Runtime for Windows Azure can be downloaded, along with sample code and instructional materials that researchers can use to set up their own large-scale, cloud data-analysis service on Windows Azure.

Key Properties

Project Daytona features the following key properties.

1. Designed for the cloud, specifically for Windows Azure.

2. Designed for cloud storage services.

3. Horizontally scalable and elastic.

4. Optimized for data analytics.

So what can you use Daytona for?

There are a number of use cases for Project Daytona:

1. Data analysis

2. Machine learning

3. Financial analysis

4. Text processing

5. Indexing and search

Almost any application that involves data manipulation and analysis can take advantage of Project Daytona to scale out processing on Windows Azure.

Data analytics as a service

Because it runs on Windows Azure, the service is accessible to a host of clients, not just Windows clients.

Project Daytona is about turning utility cloud computing into a service model for data analytics. In our view, what is key is that this service is not limited to a single data collection or set of analytics; instead, users can upload their own data and select from an extensible library of models for data analysis. Powered by Project Daytona, the service will automatically scale out the data and analytics model across a pool of Windows Azure VMs without the overhead that is usually associated with typical business intelligence (BI) and data analysis projects.

Example Application

Excel DataScope. From the familiar interface of Microsoft Excel, Excel DataScope enables researchers to accelerate data-driven decision making.

The Project Daytona DataScope analytics service offers a library of data analytics and machine learning models, such as:

1. Clustering

2. Outlier detection

3. Classification

4. Machine learning

5. Information visualization

Users can upload data in their Excel spreadsheet to the DataScope service or select a data set already in the cloud, and then select an analysis model from our Excel DataScope research ribbon to run against the selected data.

Project Daytona will scale out the model processing by using possibly hundreds of CPU cores to perform the analysis. The results can be returned to the Excel client or remain in the cloud for further processing and/or visualization. The algorithms and analysis techniques are applicable to any type of data, ranging from web analytics to survey, environmental, or social data.
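To make the idea of an iterative MapReduce computation more concrete, here is a minimal sketch of k-means clustering written as repeated map and reduce passes. It is plain Python running in a single process, not the Daytona API, and the names map_assign, reduce_centroid, and iterative_kmeans are hypothetical, chosen only for this illustration. In a real runtime the map step would run in parallel across many workers; here it is a simple loop.

```python
from collections import defaultdict


def map_assign(point, centroids):
    """Map step: emit (index of the nearest centroid, point)."""
    distances = [
        sum((p - c) ** 2 for p, c in zip(point, centroid))
        for centroid in centroids
    ]
    return distances.index(min(distances)), point


def reduce_centroid(points):
    """Reduce step: average all points that were assigned to one centroid."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def iterative_kmeans(points, centroids, max_iterations=20, tolerance=1e-9):
    """Repeat map and reduce passes until the centroids stop moving.

    Illustrative only: everything runs sequentially in one process.
    """
    for _ in range(max_iterations):
        # Map phase: group every point under its nearest centroid.
        groups = defaultdict(list)
        for point in points:
            key, value = map_assign(point, centroids)
            groups[key].append(value)

        # Reduce phase: one new centroid per group (keep the old one if empty).
        new_centroids = [
            reduce_centroid(groups[i]) if groups[i] else centroids[i]
            for i in range(len(centroids))
        ]

        # The output of this pass becomes the input of the next pass.
        shift = max(
            sum((a - b) ** 2 for a, b in zip(old, new))
            for old, new in zip(centroids, new_centroids)
        )
        centroids = new_centroids
        if shift <= tolerance:
            break
    return centroids


if __name__ == "__main__":
    data = [(0.0, 0.0), (0.0, 2.0), (10.0, 10.0), (10.0, 12.0)]
    print(iterative_kmeans(data, [(0.0, 0.0), (10.0, 10.0)]))
```

The point of the sketch is the loop: each pass feeds the reduce output back in as the next map input, which is what distinguishes an iterative MapReduce job from a single-pass one.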

· See Overview for information about what is included in the release package.

What’s Next for Project Daytona?

Project Daytona is part of an active research and development project in the eXtreme Computing Group of Microsoft Research. The current release of Project Daytona is a research technology preview (RTP). Microsoft Research is still tuning the performance of Project Daytona and adding new functionality.

For more information on Project Daytona, please see https://research.microsoft.com/en-us/projects/azure/daytona.aspx