Does Hadoop Need Saving?


It was a big week for big data in Silicon Valley, where O’Reilly’s Strata & Hadoop World Conference is ending today. The star of the show might have been data scientist Vijay Subramanian of Rent the Runway, whose company rents Oscar-worthy gowns (that most of us can’t afford to buy) for our one-night-only Cinderella moments. Or maybe it was data scientist Noelle Sio of Pivotal Labs, who volunteered at CrisisTextLine, which helps connect teens in trouble with the volunteer counselors who might help them. Or possibly President Barack Obama, who streamed in via video to introduce DJ Patil as the United States’ Chief Data Scientist. Never mind all the vendors, like Microsoft and MapR, that made some impressive announcements.

But instead the halls were filled with talk about the news that Pivotal Software made when it open sourced the components of its big data suite (which we predicted and which is unquestionably good news for everyone) and announced the Open Data Platform (ODP), an initiative that brings together GE, Hortonworks, IBM, Infosys, Pivotal, SAS, AltiScale, Capgemini, CenturyLink, EMC, Splunk, Verizon Enterprise Solutions, Teradata, and VMware (and is open to other companies that want to join).

ODP’s stated goal is to create “a tested reference core of Apache Hadoop, Apache Ambari and related Apache source artifacts” in order to help users get started with Hadoop.

It’s the latter announcement that has caused all of the flak, which quite frankly began just as the news was being announced. Why? Partly because two of the three primary Hadoop providers (each of which was invited) want nothing to do with the ODP.

Battle Lines Forming

One of the providers, MapR, took the high road and emailed us a comment that simply stated, “We decided not to join.” The company also noted that the Apache Software Foundation (ASF), where Hadoop was built and continues to grow, already has an ODP-like project called Apache Bigtop. (More on this in the next article.)

It’s worth noting here that the Apache Software Foundation is the hallowed ground where the core of Hadoop and the ancillary components of many Hadoop distros were developed by individual contributors rather than corporations. It is the lifeblood of the big data community, where it is revered and respected. However, it should be said that most Apache Hadoop contributors were, and are, on somebody’s payroll when they write the code.

Mike Olson, the cofounder and Chief Strategy Officer of Cloudera, the other Hadoop distro provider, reacted to the ODP news with a blog post on his company’s site:

“The Pivotal and Hortonworks [the latter is the Hadoop distro provider that did join] alliance, notwithstanding the marketing, is antithetical to the open source model and the Apache way.

“While the ASF is open to vendors, the ODP isn’t actually open at all. As a vendor-driven consortium, membership is only for enterprises with serious money -- it ought to be called the 'Only Dollars Play' alliance.”

And to be fair, at least some of the companies that have joined ODP, according to our sources, have paid six figures to join. It’s a lot of money, but probably not more than the salaries of a few Hadoop developers added together. Cloudera, for its part, said that it is taking the money it would have spent to join ODP and is contributing it to the ASF.

That being said, the ODP and ASF aren’t necessarily at odds with each other.

Enough Hadoop Love to Go Around

What Olson doesn’t say in his blog post is that ODP will sit a layer above the Apache projects and that there is no reason to believe that the individuals who contribute code to Hadoop and the projects which surround it will stop doing so.

In fact, Hortonworks, the Hadoop distro provider that has signed on to ODP, writes almost all of its code within the Apache Software Foundation, is not in the business of selling software, and relies wholly on providing services and expertise around Hadoop’s core and its ancillary components. Its pitch to enterprises that use, or want to use, Hadoop is pretty simple -- “We can support you better than anyone else because we wrote the code.”

MapR is a software company, plain and simple. That’s what it sells. Support and services are a small part of its revenue model, according to Jack Norris, its Chief Marketing Officer. It and Hortonworks earn their proceeds in almost exactly opposite ways.

And Cloudera is a mix of the two. Its model worked much the way Hortonworks’ does until the company decided that it was “unsustainable.” Here’s part of what Amr Awadallah, CTO and co-founder of Cloudera, told CBRonline about Hortonworks last September:

"They don't have a defensible business model. They're not doing okay now. And while they might be gaining customers and revenues, to create a healthy business you have to always contrast the revenue to how much is the cost of getting that revenue."

It’s also interesting that while Cloudera’s Olson was championing the “Apache open source way” earlier this week, his CTO bragged about the company’s proprietary software in the CBRonline interview.

"We have proprietary software that's only unique to us, Cloudera Manager and Cloudera Navigator. Hortonworks doesn't have anything like that software, which is a big disadvantage," said Awadallah.

So suffice it to say that Cloudera’s waving of the open source flag is a bit confusing.

Cloudera also just happened to announce, on the same day that Pivotal made its ODP news, that its revenue was $100 million, making it the second largest open source company in history.

Since companies that aren’t public aren’t required to disclose their earnings, the timing of the announcement is odd, noted John Furrier, CEO of the SiliconAngle network.

Why All the Fuss?

So, the big question is: what’s all the fuss about? Fourteen (or 15; one is as yet unnamed) companies have joined together to create an open data platform initiative; two of the major Hadoop providers want nothing to do with it; Pivotal is open-sourcing its MPP database Greenplum, along with GemFire and HAWQ, which have cost it millions to acquire and build upon; Hortonworks is still working day and night on Hadoop; MapR has an impressive new distro; and Cloudera is making a lot of money.

There’s not a single Apache committer that we can find who says they’ll quit contributing code to Hadoop-related projects, so why should anyone be upset?

After all, Hadoop doesn’t actually seem to need any saving, and ODP will either go down in history as something that helped advance the adoption of big data in the enterprise (as its proponents predict) or it will fade away.

Time will tell.

Note: We’re speaking to analysts and select community members to get their take on the matter; look for more on this next week. We’ll also be publishing a Big Data Bits edition of conference news.
 

Title image by Infomastern, licensed under the Creative Commons Attribution-Share Alike 2.0 Generic License.