It’s probably a coincidence that Hadoop pioneer Cloudera and the Linux Foundation each announced Hadoop-related news on the same day, news that isn’t necessarily complementary.
But then again, the Strata + Hadoop World conference is taking over much of the Javits Center in New York City today through Thursday, so if you’ve got something big to say, there are plenty of open, interested ears to hear it.
So it stands to reason that yesterday Cloudera made big news around Kudu, a new columnar store for Hadoop that enables fast analytics on fast data.
Disruptor or Disruptive?
It could, in time, prove disruptive (in a good way) to enterprises using Apache Hadoop, giving them a competitive advantage. It could also have the unfortunate effect of disrupting enterprises that already view Hadoop as too complicated, too hard to work with and too lacking in standards.
The latter is a problem the Open Data Platform Initiative (ODPi) hopes to put to rest.
On Monday the Linux Foundation announced that it would host ODPi’s formal governance structure as a Collaborative Project.
The Foundation also noted that the number of vendors supporting the initiative has nearly doubled and now includes Altiscale, Ampool, Capgemini, CenturyLink, DataTorrent, EMC, GE, Hortonworks, IBM, Infosys, Linaro, NEC, Pivotal, PLDT, SAS Institute Inc, Splunk, Squid Solutions, SyncSort, Telstra, Teradata, Toshiba, UNIFi, VMware, WANdisco, Xiilab, zData and Zettaset.
These companies have agreed on a tested reference core of open source Apache Hadoop, Apache Ambari and related Apache source artifacts, which promises to provide a common reference platform and set of technologies around Hadoop on which solutions can be built.
The idea is to ease and quiet anxieties around Hadoop adoption.
The Loners: Cloudera, MapR
Cloudera and another Hadoop vendor, MapR, haven’t joined ODPi.
Unless ODPi members and other solution providers around Hadoop adopt Kudu, it could further fragment the ecosystem. That said, Cloudera plans to open source Kudu for the good of the community, meaning that both MapR and competitor Hortonworks can weave it into their platforms.
By Cloudera’s own admission, there’s plenty of work yet to be done on Kudu. So while some of its oldest and most tech-savvy customers have used it, it’s not yet ready for prime time in the enterprise. That could take as long as a few years.
Analysts, Vendors Talk Kudu
We asked a number of analysts and vendors around Hadoop what they thought of the Kudu news and how it affected the greater Hadoop marketplace. Here are their reactions.
Mike Maciag, COO of Altiscale: "At this point, Hadoop is better understood as an ecosystem of inter-dependent projects, which include core Hadoop itself, Hive, Spark, HBase and others. This ecosystem is rapidly being tasked with a broad range of enterprise data challenges. This is exciting, and it also demands rapid evolution of the ecosystem.
"In this case, Cloudera is filling in a perceived gap within the broader ecosystem. There is a tendency to pit these projects against each other, when they are in fact complementary. MapReduce and Spark, for example, actually work together to address different problems. Kudu looks like it was created to fill a perceived market need. If it is successful, it may well in fact drive growth for the broader Hadoop ecosystem as a whole.
"You see that with the ODPi. It was very controversial, and now it’s growing rapidly. There is a clear market need.
"You see the same thing with Hadoop in general. The big data challenge is real, enterprises need an answer, and the Hadoop ecosystem — messy and noisy as it is — provides it. I look forward to seeing how Kudu does now that it is launched into the open source ecosystem."
Holger Mueller, Analyst, Constellation Research: "The Cloudera Kudu announcement does not really come as a surprise. Hadoop itself is now 10 plus years old and its storage layer is showing some aging.
"At the same time the leading Hadoop distributors need to build value and add applications that allow them to create revenue streams, not surprisingly that will come at the price of splitting the common Hadoop platform.
"Analytics is such a powerful use case that enterprises may go for committing to a Hadoop distribution, as long as it guarantees the response times that allow them to create the insights they want and need faster, more simulated and of higher quality.
"And additionally Cloudera is owned partially by Intel, so when Cloudera wants to tackle the faster storage layer for Hadoop, as it has with Kudu, there will be an Intel hardware wrinkle to the new offerings, as we have seen. No surprise — something to be expected; now it comes back to if Cloudera (and the other vendors) can create enough value for enterprises to 'forget' about standards."
Steve Wooledge, vice president, product marketing, MapR: "MapR realized from day one that Hadoop’s success in the enterprise would be accelerated by an innovative, industrial-strength foundation for storage and database services.
"That’s why we built the MapR Data Platform as the underpinnings of our distribution for Hadoop, which exposes both file system (MapR-FS) and database (MapR-DB) services, which are dramatically faster and more reliable than HDFS and HBase.
"We’re glad to see Cloudera validating (6 years later) and now copying a small part of the MapR strategy.
"MapR provides significantly higher performance already today, and we expect that lead to only grow, along with our lead in enterprise-grade features such as multi-data center table replication, point-in-time consistent snapshots, mirroring, and multi-tenancy.
"There is no need to wait for Kudu. MapR Community Edition is available now for free, including MapR-DB, for anyone who wants to see why so many companies are choosing MapR for their Hadoop and Spark distribution. With Spark and now Kudu, Hadoop is largely a historical label."
Shaun Connolly, VP Corporate Strategy, Hortonworks: "With Kudu, Cloudera will need to answer questions if they are abandoning HDFS and HBase, and if they are competing with their partners.
"Our strategy has always been to collaborate with our partners, rather than compete with them, and we remain consistent in how we invest in Apache projects."
Big Data Smart(er)
According to Matt Brandwein, director of Product Marketing at Cloudera, Kudu represents an important moment in big data history.
“We’ve made a big bet on Spark and a big investment in Kudu,” he said, noting that the latter is not meant to replace HDFS or HBase and that Spark won’t, in every case, be leveraged in place of MapReduce.
But still, you have to wonder if enterprises will go ODPi or Spark and Kudu. While some will say it’s a ridiculous question and that Kudu hasn’t even been given a chance, consider this tweet by Gartner analyst Nick Heudecker.
With Spark and now Kudu, "Hadoop" is largely a historical label. - Nick Heudecker (@nheudecker), September 28, 2015
As with most things, time will tell.