When Gartner speaks, everyone listens. And when Big Data is the topic, they listen with both ears.
Merv Adrian, a Gartner analyst, keynoted at the Hadoop Summit earlier this week. He confirmed what we already knew: Hadoop has come a long way, but it still has a ways to go. It’s not pervasive in enterprises quite yet.
Adrian discussed the preliminary results of a Gartner survey, which reveal that 65% of the surveyed companies have already invested in Big Data or plan to do so in the next two years. (Specific breakdown: 30% have already invested, 19% plan to in the next year, and 15% in the next two years.)
A somewhat alarming number (31%) said that they have no plans to invest in Big Data at all, and 5% simply don’t know.
Though an audience of vendors at the Hadoop Summit heard the 65% and saw a willing market and plenty of dollar signs, Adrian issued a word of warning: just because a CIO has Big Data on his or her wish list doesn’t mean he or she will get the money to do it.
“There’s something about predictions,” he said. “You have to remember that a CTO’s budget is a little bit like a kid’s letter to Santa Claus. Sometimes they get what they want, and sometimes they don’t.”
Culture Clash: The Suits & the Hoodies
Adrian also articulated the culture clash between the Suits and the Hoodies, on both the Enterprise and the vendor side. It’s not hard to picture the executive with the multimillion-dollar budget who promises the business ground-breaking results from the discoveries that shedding light on dark data can reveal. But when the business asks exactly what these results are going to look like, how much the bottom line is going to grow as a result, and so on … the truth may be that he doesn’t know, because you can’t predict what’s going to be discovered.
The ROI, a hoodie will tell you, is unknown, and for him or her that’s exciting; they’ll hack away until they find something of interest. And, by the way, though the hoodie might seem kind of rogue and into Open Source, which still makes the business nervous, he or she is also smart, understands where the dangers of Open Source are and where they (more than likely) are not, and is there to help the business make money, save money, innovate new products, be smarter and so on.
Adrian says these two cultures must come together to show that Big Data will produce big wins. In my mind, this is already happening -- today’s suits aren’t as afraid or as uncomfortable with geeks as others may think; in fact, to many of the baby boomers that remain in the workforce, the GenY’s are living their teenage dreams. But that’s an aside…
The Gartner analyst also said that Hadoop still has a ways to go in areas such as Search, Security and Governance, and that there are questions as to which way the market will go with Big Data processing: to the cluster, to the cloud or to an appliance. (We see vendors hedging their bets and offering all three.) He also cautioned that when Silicon Valley says something is “ready,” it means it’s in beta, not that real people are actually using it.
With that being said, some products may actually be ready, but we haven’t done any research to establish that. Here are some of this week’s announcements.
MapR and Fusion-io Accelerate HBase Performance
MapR, which markets and provides services around a Hadoop distribution bearing the same name, announced breakthrough performance with Fusion-io for NoSQL and Hadoop. They say that the combined solution ran read-intensive Apache HBase applications 25 times faster, which, needless to say, is a win for users of Solid State Drives (SSDs) whose HBase applications had been held back by software inefficiencies.
Tomer Shiran, vice president of product management at MapR Technologies, talked up the company’s newest distribution as well. “Our new M7 Edition supports native tables, allowing HBase applications to benefit from SSDs -- thereby achieving the 25x performance advantage compared to HBase in other Hadoop distributions,” he noted.
WANdisco’s 100% Guarantee - Hadoop Remains Available Even If Data Center Goes Down
WANdisco, a provider of high-availability software for global enterprises to meet the challenges of Big Data and distributed software development, announced its Non-Stop NameNode WAN Edition, bringing to the wide area network the same unique patented NameNode replication available over the LAN.
The Non-Stop NameNode WAN Edition applies WANdisco's patented replication technology to deliver 100% uptime by eliminating Hadoop's most problematic single point of failure -- the NameNode -- to provide the first and only continuous availability solution for globally distributed Hadoop deployments. With Non-Stop NameNode WAN Edition, if a single NameNode server or an entire data center goes offline, Hadoop is still available.
This solution, according to the company, is available now.
The company also announced that it will open-source its S3-enabled HDFS option for Hadoop, which it says provides enterprises with a complete and seamless migration solution from Amazon's public cloud to a secure private cloud.
This will be available by the end of the summer.
DataTorrent Makes Toying With Streaming Data Available To Explorers
Hadoop already handles the exploration of batch data, so DataTorrent has decided to do something different. The company has just built a Hadoop platform for analyzing, and alerting on, streaming data from text, email and other sources, so companies can make decisions on real-time data in real time.
The idea here is that advertisers will have a way to serve ads which yield clicks in real time, versus wasting time floating the same irrelevant ads at you for hours. DataTorrent reportedly has a library of more than 250 operators that the company is making available through an open-source Apache Software Foundation license.
Note: DataTorrent is available only in developer and evaluation versions at present.
Title image courtesy of Brian A Jackson (Shutterstock)