Big data crusher Hadoop turned 10 earlier this year and it seems like the birthday party Cloudera is throwing for it may never end.
It even has its own hashtag — #Hadoop10.
Hadoop co-creator Doug Cutting, who works for Cloudera, started the celebration with a blog post last January. His boss, Cloudera Chief Strategy Officer Mike Olson, posted about “Hadoop at 20” last month, and yesterday at the Strata + Hadoop World Conference in San Jose, Calif., Cutting was at it again, telling the story of how #Hadoop10 came to be.
A Community Effort
It’s an interesting retrospective, except that it leaves out the many, many names and faces of fellow Apache community members who made Hadoop what it is today.
A more inspiring way to celebrate might be to give a shout-out to everyone on the project, as Hadoop committer Arun Murthy did when Hadoop 2 was released. “If you’re in this room and you used Hadoop four to five years ago, I can’t thank you enough,” he said to the attendees of the Hadoop Summit in 2014.
So happy birthday, Hadoop, and thank you to Doug Cutting and the greater Hadoop community. Now let’s stop talking about it.
We took that sentiment away from a tweet by Constellation Research analyst Doug Henschen.
What’s been notable about Strata + Hadoop World so far is that Hadoop no longer seems to be the star. While Apache Spark, which was the big hit last year, is still hot, Apache Kafka and its commercial sponsor, Confluent, are in the spotlight now.
Get Hip with Streaming
Apache Kafka, according to Confluent co-founder and Kafka co-creator Neha Narkhede, is a distributed system for streaming data. “It collects data at scale and makes it available in real time. It also has a processing engine so that you can derive value,” she told CMSWire. Or, to use an analogy Narkhede offered, Kafka is like a central nervous system for data.
LinkedIn, where Kafka was born, uses it to manage 1.3 billion events each day, so that you’re informed in real time when someone “likes” what you posted. Uber leverages it to match drivers with riders, retailers use it to provide alerts when stock is running low, Wall Street uses it for stock data, and then there’s IoT, which is all about streams.
While Kafka is easy to get started with, it becomes more complex once it’s operational. Developers and DevOps pros need metrics, best practices and the like to put it to good use. That’s what they came to Strata + Hadoop World to learn about, but they won’t have to wait until the next conference for more. Confluent announced Confluent University this week (the courses take days, not weeks) as well as a Confluent partner program.
Ian Andrews, Pivotal’s vice president of products, took to the stage yesterday to ask whether the market is so saturated with BI tools (we called it crowded in our review of Gartner’s MQ for BI and Analytics) that there are no new end users left to win. His suggestion: the once-hot BI market is not as hot as many thought.
He cited the slide in Tableau’s share price and the rumored sale of Qlik (another tech company, like Citrix and EMC, that’s fallen prey to Elliott Management’s whims) as evidence. Using the phenomenon of “peak beard” as fodder (beards are apparently sexy until too many men have them, then they are not), Andrews made a case for delivering insights in context within an app, not a BI tool.
Better BI, Familiar Interface?
Andrews might have been on to something, judging by some of the Strata + Hadoop World product announcements CMSWire was briefed on.
Big data discovery platform provider Platfora, for example, has opened itself up so that business users can not only leverage its brains and its brawn but also work with the output via their favorite hip tools like Tableau.
“We want to help business users realize the value of Platfora’s power,” Peter Schlampp told CMSWire. So instead of waiting for BI users to come calling, Platfora is coming to them. Platfora’s 5.2 release comes with native Tableau integration; Platfora Lens-Accelerated SQL, which provides access to petabyte-scale data orders of magnitude faster than querying the data directly; accessibility via any BI tool; and more.
Analytics on Hadoop and Spark at AtScale
AtScale announced something similar, though it was designed to sit between BI and Hadoop from the very start.
“You don’t need to buy a new BI tool to glean insight from data stored in Hadoop,” AtScale CEO Dave Mariani told CMSWire last year. He claimed that some enterprises already use as many as 55 different BI solutions, which is about “50 too many.”
At the conference this week, AtScale announced its AtScale Intelligence Platform 4.0, which includes a new patent-pending innovation: what the company calls the industry’s first Hybrid Query Service for BI on Hadoop. It is supposed to make it easier for enterprises to query Hadoop at top speed, natively, from any BI tool in MDX or SQL mode.
It’s All About Speed
And while ease is nice, is speed better?
Intelligence and speed have always been seen as tradeoffs, but not anymore. At least, that’s what database vendor MemSQL came to Strata + Hadoop World to say.
The promise of its latest product release, MemSQL 5, is to capture and query data at the same time for real-time analytics. The idea is that by leveraging database, data warehouse and data streaming at once, and by running analytical and transactional processing at the same time, businesses can gain unprecedented competitive advantage.
There’s no argument here. The question is how it might compare to SAP HANA Vora or other products that promise to do the same thing.