Following Microsoft’s big data strategy is a bit like watching the US presidential election. We’ve endured months of shifting opinions, surprise partnerships, secrets and leaked information. Now the Redmond giant is capitalizing on the attention from this week’s Strata and Hadoop World to reveal a bit more about its big data play.
A Peek at What’s Coming
Last October, Microsoft announced that it had partnered with Hortonworks and was working on a Windows version of Hadoop, and promised that releases for Windows Azure and Windows Server would soon follow. The first community technology preview (CTP) for Azure came two months later, and Microsoft decided to halt its homegrown big data project, Dryad.
A second preview of Hadoop on Azure soon followed, but many began to question the conspicuous absence of a Hadoop release for Windows Server and the company's silence about it. Rumors began circulating that Microsoft would release only a cloud-based version of Hadoop.
Today at the Strata Conference, Microsoft finally stepped forward and responded to all the questions and rumors — in the form of software. The company announced the first public preview release of Hadoop for Windows Server and revealed the product’s official name, Microsoft HDInsight Server for Windows. The company also announced the third release of Hadoop for Azure, now named Azure HDInsight Service, and an expanded relationship with Hortonworks.
HDInsight includes most of the major components typically associated with Hadoop development: the Hadoop core (the Hadoop Distributed File System, or HDFS, and MapReduce), Pig (MapReduce programming) and Hive (querying), plus a few Hortonworks-developed goodies like Ambari, a monitoring and management console. Microsoft and Hortonworks have reengineered all the components to function on Windows and are contributing the code back to the community. HDInsight works with SQL Server and integrates with System Center, Hyper-V and Active Directory.
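For readers new to the MapReduce model at the heart of Hadoop, here is a minimal, single-process sketch of the classic word count in plain Python. It is an illustration only and does not use Hadoop or HDInsight; the function names (map_phase, shuffle, reduce_phase, word_count) are hypothetical and chosen for this example.

```python
from collections import defaultdict


def map_phase(line):
    """Emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Group emitted values by key, the step a framework performs between phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Collapse all values for one key into a single count."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle and reduce in sequence over a list of lines."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())
```

In a real cluster the map and reduce steps run in parallel across many machines over data stored in HDFS; tools like Pig and Hive generate jobs of this shape so that users do not have to write them by hand.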
Getting More Details on Microsoft’s Hadoop
Microsoft will not comment on release dates for production versions. Curiously, the company is also refusing to discuss new features in the latest Hadoop releases. However, anyone can take the products for a test drive by downloading them from the SQL Server site.
Microsoft’s addition of Hadoop to its on-premises offerings could dramatically expand the use of big data tools in the mainstream. That is not because Microsoft's Hadoop distribution or strategy is significantly better than those of other vendors in the space, but because almost every business user on the planet has access to Excel, the de facto client interface for interacting with HDInsight.