Gartner analyst Merv Adrian wrote an interesting blog post earlier this year titled “Hadoop is in the Eye of the Beholder.” In it, he posed a simple question: What is Hadoop?
We’re not going to rehash his argument. It’s a fine and even funny post. You should read it. But if you don’t, suffice it to say that he concluded that it’s not a question that’s easily answered. As a result, if you’re an enterprise that is “doing Hadoop,” “buying Hadoop,” or “shopping for Hadoop,” what exactly are you talking about?
That Elusive 'Something'
Chances are good that it’s “something” that you hope will help you reap rewards from your large data stores. Almost every Hadoop option out there includes three totally free open source components: HDFS (Hadoop Distributed File System), YARN (a resource-management platform responsible for managing compute resources in clusters and using them to schedule users' applications) and MapReduce (a programming model for large-scale data processing). These can all be downloaded free from the Apache Software Foundation.
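For readers who have never seen the MapReduce programming model in action, here is a minimal, single-machine sketch of the idea in plain Python. It is not Hadoop code and uses no Hadoop APIs; the function names (map_phase, shuffle, reduce_phase, word_count) are hypothetical and chosen only for illustration. A real job would run the map and reduce steps in parallel across a cluster, with the data stored in HDFS and the resources scheduled by YARN.

```python
from collections import defaultdict


def map_phase(line):
    """Map step: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle step: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce step: collapse the values for one key into a single result."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle and reduce in sequence on an in-memory list of lines."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    grouped = shuffle(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    print(word_count(["big data", "Big clusters"]))
```

The point of the model is that the map and reduce functions know nothing about where the data lives or how many machines are involved, which is what lets a framework spread the work across a cluster.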
Most everyone will agree that what you’d be downloading isn’t all that you need. As a result, several Apache projects like Apache Pig, Apache Hive, Apache HBase and Apache Spark are commonly added to the Hadoop mix. (They can also be downloaded free of charge.)
Though, theoretically, you could make a go of it from there, most enterprises don’t. Instead, in one way or another, they spend money to become productive with one of five commercial vendors’ enterprise-grade Hadoop distros. Though nearly every vendor lets you download a version of one of its distros for free, chances are good that you’ll ultimately end up paying one of them for something, be it software, services, support or subscriptions.
MapR, for example, sells proprietary big data crunching software that leverages Apache Hadoop and other Apache projects and promises to provide better ROI than going open source only.
Cloudera provides an enterprise Hadoop distro for free (and it plays a big role in the Apache community), but charges for some solutions it has developed internally that add value to Apache Hadoop and ancillary open source projects.
Hortonworks, which works with the community to build Apache open source software around core Hadoop, makes its money via client subscriptions. Companies like Pivotal and IBM leverage open source Apache Hadoop and other open source technologies within their larger offerings, but they also sell proprietary software and services.
The degree to which each of these vendors’ employees contribute code to the various Apache projects varies. Hortonworks, for example, uses nearly all of its engineers to write code for the community. MapR and Cloudera do so as well, but they also write code that is considered to be intellectual property. Ditto for IBM and Pivotal.
The Next Big Thing?
Being aware of all of this matters because as soon as something promises to improve on Apache Hadoop but was not built within an Apache project, it becomes something else: in most cases, proprietary software.
That’s why today’s announcement of Hortonworks and Pivotal joining forces to build out Apache Ambari is a big deal. The two big data powerhouses are investing the time and effort of some of their most talented employees to work as part of the community to build out a solution that makes the management of Hadoop clusters dead simple. And they’re not the only ones involved. So are engineers from Red Hat, IBM and WANdisco, among others.
While there’s nothing to keep the aforementioned vendors or their customers from managing Hadoop in other ways (in fact Pivotal probably does), actions usually speak louder than words. So, especially when you consider that Microsoft, SAP, HP and Teradata all partner with Hortonworks on Hadoop, it’s reasonable to believe that Apache Ambari could become the standard by which Apache Hadoop is managed.
What about Cloudera Manager and MapR’s cluster management solutions? They can keep selling them — or grab Apache Ambari and weave it into their solutions free of charge.