here to stay

Last year at Strata + Hadoop World in New York City, Cloudera co-founder and CTO Mike Olson predicted that Hadoop would disappear and that only one lonely worker in the data center world would even know it was there.

He was right on the money, Matt Brandwein, director of product marketing at Cloudera, told me in a recent conversation: “We succeeded by making Hadoop as invisible as possible.”

And invisible is exactly what Hadoop might look like in many enterprises, according to a survey conducted by Gartner analysts Merv Adrian and Nick Heudecker.

But that’s not because Hadoop is busy humming along, crunching big data somewhere in the background.

Warming Up to Hadoop

What seems to be happening instead is that 82 percent of the companies surveyed have yet to deploy Hadoop. 

Specifically, 30 percent are “piloting and experimenting,” 18 percent are “developing strategies” and another 13 percent are “knowledge gathering.”

About 21 percent had no plans to invest in Hadoop at the time the survey was taken. And the remaining 4 percent “don’t know” the status of their companies’ thinking around Hadoop adoption.

From where we sit, it seems Hadoop needs to “appear” in more companies before it can “disappear” and become invisible.

In other words, there’s plenty of opportunity for Hadoop vendors and users unless Hadoop is written off as a historical label at the hands of Spark and Kudu, as Heudecker suggested in a tweet.

The Death of Hadoop Is Greatly Exaggerated

One look at the record-setting crowd at Strata + Hadoop World at the Javits Center in New York City this week, and Heudecker’s prediction hardly seems plausible.

While last year the talk at the conference was all about Spark, this year it’s about the Internet of Things and data-in-motion. That’s a trend that Hortonworks recognized before it acquired Onyara, the creator of and key contributor to Apache NiFi, and began its work on DataFlow.

Though DataFlow is a separate product from HDP (the Hortonworks Data Platform), Hortonworks’ interest in the latter hasn’t waned. Earlier this week the only 100 percent open source provider of Apache Hadoop announced that it has joined forces with ManTech and B23 to deliver advanced cyber security solutions powered by OpenSOC.

Microsoft Wants to Bring Big Data to More

When we think about democratizing big data, we usually think about the front end and pretty UIs, tools that can make business analysts look like data scientists and so on.

Microsoft has certainly done plenty of that. There’s even evidence of the actions it has taken toward its vision of bringing big data to a billion users.

But the not-so-sexy back end matters as well, and the new “open to open source” Microsoft gets this. Yesterday it announced that its Hadoop distro, HDInsight, will be made available in Azure for Linux.

Making big data accessible without a hassle requires some snazzy engineering and Microsoft seems to be succeeding at that. Later this year Microsoft’s Azure Data Lake Store, previously known as Azure Data Lake, will be made available.

T. K. “Ranga” Rengarajan, Microsoft’s corporate vice president, Data Platform, Cloud & Enterprise, describes it as a “single repository where you can easily capture data of any size, type and speed without forcing changes to your application as data scales.”

In other words, you’ll soon be able to use the store for securely sharing and collaborating on data. It will be accessible for processing and analytics from HDFS applications and tools.

“Forget the infrastructure, focus on the analytics.” That’s the promise of Azure Data Lake Analytics, according to Rengarajan.

If all goes well, it will unfold via U-SQL, a new query language that unifies the ease of use of SQL with the expressive power of C#. In a perfect world, millions of SQL and .NET developers will be able to process and analyze all of their data with the skills they already have.

Pivotal Comes to Apache Bearing Gifts

EMC Federation Chairman Joe Tucci probably didn’t grow up in a world where sharing was the default behavior.

But that’s what EMC spinoff Pivotal is talking about at Strata this week. It has released Pivotal HAWQ to open source under the care of the Apache Software Foundation. Ditto for MADlib, its machine learning library. Some once believed HAWQ would give Cloudera’s Impala a run for its money (and maybe it has; neither party has disclosed sales).

“Companies want open source. Enterprises don’t want lock-in,” Michael Cucchi, senior director of outbound product at Pivotal, told us earlier this year. “It has to be open source or the conversation doesn’t begin.”

Bravo to Pivotal for seeing the light and opening these projects up to the community to make them even better.

For More Information:

There’s plenty of news from Strata + Hadoop World still on the way, so keep your eyes on CMSWire.

Title image by h.koppdelaney, used under a Creative Commons Attribution-No Derivative Works 2.0 Generic License.