With companies such as IBM and Oracle now targeting the Big Data space, it seemed like Microsoft had been dragging its heels. Not so. During the week, at its Research Faculty Summit in Redmond, it announced a set of tools and services built around the Azure platform that will redress the balance.
Setting Up Daytona
With Project Daytona, Microsoft says it has developed a runtime version of Google's open-license MapReduce model for Windows Azure. The runtime will support a range of analytics and machine learning algorithms and can be scaled out to hundreds of server cores for analysis of distributed data.
According to a post on Microsoft Research's blog, the move toward Big Data analysis has been driven by researchers who are looking for a data analysis and processing framework. They work across fields such as healthcare, education and environmental science, where large sets of data are used on a daily basis and where users are looking for ways to extract the insights contained within that data.
Daytona is built on existing Azure compute and data services. To use and deploy it, users need to follow three steps:
- Develop data analytics algorithms: Project Daytona enables a data analytics or machine learning algorithm to be authored as a set of Map and Reduce tasks, without users needing in-depth knowledge of distributed computing or even Azure.
- Data Libraries: Users then need to upload data and data analytics routines into Azure.
- Deploy the Daytona runtime: Users deploy the Daytona runtime to a Windows Azure account, where they can configure the number of virtual machines for the deployment and specify the Windows Azure storage account that will hold the analysis results.
By doing this, Project Daytona breaks up data into smaller chunks so that it can be processed once Daytona has deployed the MapReduce runtime to all the machines concerned. The chunks are analyzed simultaneously, and when the analysis is finished the partial results are combined into a final result that is easier for users to interpret.
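The split, map and combine flow described above can be illustrated with a minimal, single-machine sketch in Python. This is not Daytona's API, which Microsoft's announcement does not detail here; the function names (`split_into_chunks`, `map_chunk`, `reduce_counts`, `run_word_count`) are hypothetical, word counting is chosen only as a familiar MapReduce task, and a local thread pool stands in for the Azure virtual machines.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from functools import reduce


def split_into_chunks(records, num_chunks):
    """Break the input into roughly equal chunks, one per worker."""
    size = max(1, -(-len(records) // num_chunks))  # ceiling division
    return [records[i:i + size] for i in range(0, len(records), size)]


def map_chunk(chunk):
    """Map task: count the words in one chunk, independently of the others."""
    counts = Counter()
    for line in chunk:
        counts.update(line.lower().split())
    return counts


def reduce_counts(left, right):
    """Reduce task: merge two partial results into one."""
    return left + right


def run_word_count(records, num_workers=4):
    """Hypothetical driver: split, map in parallel, then combine."""
    chunks = split_into_chunks(records, num_workers)
    # Threads stand in for the virtual machines of a real deployment.
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        partials = list(pool.map(map_chunk, chunks))
    return reduce(reduce_counts, partials, Counter())


if __name__ == "__main__":
    lines = ["big data on azure", "data analysis on azure", "big results"]
    print(run_word_count(lines, num_workers=2))
```

Because each map task touches only its own chunk, the chunks can be processed in any order or at the same time; only the reduce step needs to see more than one partial result.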
What Daytona Offers
There are some obvious applications for Daytona, particularly in the area of cloud computing, which Microsoft has been working on for a while. Properties of Daytona include:
- Cloud Designed: Designed for cloud computing and particularly for Azure, Daytona connects virtual machines regardless of infrastructure or platform-as-a-service.
- Designed for cloud storage services: Daytona can consume data with minimal overhead and can recover from failures, using the automatic persistence and replication that come with Azure storage services.
- Horizontally scalable and elastic: Analysis of chunks of data is done in parallel, so to scale a large data-analytics computation, users can add more machines to the deployment easily.
- Optimized for data analytics: Daytona was designed to provide support for iterative computations in its core runtime and caches data between computations to reduce communication overheads.
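To see why iterative support and caching matter, consider a minimal sketch of an iterative algorithm written as repeated map and reduce passes. It is an illustration only and does not reflect how Daytona is implemented: one-dimensional k-means is picked as an example workload, the names `assign_and_sum`, `merge` and `kmeans_1d` are hypothetical, and a plain in-memory list plays the role of the cache.

```python
def assign_and_sum(chunk, centroids):
    """Map task: for one chunk, sum the points nearest to each centroid."""
    sums = [0.0] * len(centroids)
    counts = [0] * len(centroids)
    for x in chunk:
        nearest = min(range(len(centroids)), key=lambda i: abs(x - centroids[i]))
        sums[nearest] += x
        counts[nearest] += 1
    return sums, counts


def merge(partials, k):
    """Reduce task: combine the per-chunk sums and counts."""
    sums = [0.0] * k
    counts = [0] * k
    for part_sums, part_counts in partials:
        for i in range(k):
            sums[i] += part_sums[i]
            counts[i] += part_counts[i]
    return sums, counts


def kmeans_1d(points, centroids, num_chunks=2, iterations=10):
    """Hypothetical driver: run map and reduce repeatedly until stable."""
    # Partition once; every iteration reuses these in-memory chunks
    # instead of reading the data from storage again.
    cached_chunks = [points[i::num_chunks] for i in range(num_chunks)]
    centroids = list(centroids)
    for _ in range(iterations):
        partials = [assign_and_sum(chunk, centroids) for chunk in cached_chunks]
        sums, counts = merge(partials, len(centroids))
        # A centroid with no assigned points keeps its previous value.
        updated = [s / c if c else old
                   for s, c, old in zip(sums, counts, centroids)]
        if updated == centroids:  # converged
            break
        centroids = updated
    return centroids


if __name__ == "__main__":
    data = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
    print(kmeans_1d(data, centroids=[0.0, 5.0]))
```

Only the small list of centroids changes between passes, so a runtime that keeps the chunks close to the workers has far less to move on each iteration than one that reloads the full dataset every time.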
Daytona is only at the research technology preview (RTP) stage. Microsoft says fine-tuning is still needed and that new functionality will be added.
Big Data Market
But there is already a considerable amount of movement in the space. Anyone who has looked at any of the quarterly figures for Oracle, for example, will see how important its Exadata system already is to the company, and how much more important Oracle expects it to be in the future.
Last month, IBM announced the first Big Data release from its US$1.8 billion acquisition of Netezza. IBM has been vocal about its ambitions to become the biggest player in the business intelligence and analytics space.
It has also been playing this field for quite some time and many of its purchases over the past three years have been in this space.
The result has been the announcement of new, commercial services and products to analyze Big Data in the wake of the closure of the NISC and Initiate deals.
Microsoft, even with this offering, which appears to have some way to go before it is commercially ready, is again playing catch-up in the cloud space. But if Microsoft is anything, it is persistent, and it is unlikely to be put off by the well-established presence of some real market heavyweights in what looks to be a lucrative market. If you want to download the research version, check it out here.