It’s hard to believe that only three years ago “big data” seemed like a strange term. I remember sitting in a crowded room at GigaOm’s first New York Structure Conference at Chelsea Piers, listening to a bunch of bigwigs debate what the massive amounts of data we were rapidly accumulating would eventually be called.
“I hope it’s not big data,” one of them said. I think it may have been Om Malik. It seemed as if the term was being used “temporarily” until someone came up with something better.
Needless to say, it stuck.
Later that year, at the O’Reilly Strata Conference in New York, most of the audience was curious about what the big data buzz was all about; few of them were hearing much about it at work. O’Reilly’s Maureen Jennings and I reflected on it last fall. “I think there was more press there than attendees at that conference,” she said. She was only half kidding. But we were both amazed that interest in big data has grown so much that next year the conference will be held at the Javits Center, home to events like the New York Auto Show. And I bet the exhibit hall will be jam-packed with vendors.
So it follows that this week Big Data Bits will run in two segments. Though we can’t include everyone, here’s what we found notable:
EMC’s ViPR Gets Busy with HDFS and the Download is Free
EMC is keen on leading the world into the third era of computing, and part of that plan is ViPR, a software-defined storage layer that provides an interface to information wherever it is stored, even on competitors’ products such as NetApp’s.
ViPR now includes the ViPR HDFS Data Service, a Hadoop-compatible file system that lets customers use their existing storage infrastructure as a big data repository. The company says it gives organizations the ability to run analytics with well-known Hadoop distributions on existing data stored across heterogeneous systems such as VNX, Isilon and NetApp arrays and, in 2014, commodity storage.
Whoa, EMC said commodity.
And that’s not the only news. The ViPR download is free (note: it’s intended for non-production use), and there’s more. The company promises that its salespeople won’t nag you to buy after you download it. Chad Sakac, the company’s Senior Vice President of Systems Engineering, wrote:
“And, get this … When you download, we leave you alone :-) Yes, we note that your emc.com account downloaded the stuff, but it gets routed to the inside SE team (not the inside sales team), so the follow-up is 'hey, did you get it working alright,' not 'can I sell you something!'”
Syncplicity users and soon-to-be customers, take note: I predict that ViPR and Syncplicity will end up working together.
Cloudera Announces Commercial Support for Spark (Better than MapReduce?)
We’ve already told you that Cloudera is staking its claim on the Enterprise Data Hub. The company has also announced commercial support for Apache Spark, a lightning-fast processing and machine learning environment that is said to run up to 100 times faster, and to require two to 10 times less code, than equivalent MapReduce applications.
Cloudera has indicated that support for Cloudera Enterprise 5 and for Spark on YARN will follow in the near future.
Big Deal or Not: Oracle Announces BigDataLite
Oracle has announced BigDataLite, a virtual machine for Oracle’s Big Data platform. It promises to let developers write apps at their desks and then send them straight to the appliance. From what we can tell, the hope is that developers will get busy with it by “dipping their toes in the water,” building a few proof-of-concept solutions and then coaxing the enterprises they work for into buying bigger, more powerful, more expensive products from Oracle.
The company writes that BigDataLite includes: Oracle Database 12c Enterprise Edition, Oracle Advanced Analytics, Oracle NoSQL Database, Cloudera Distribution including Apache Hadoop, Oracle Data Integrator 12c, Oracle Big Data Connectors and more.
For now, it’s available for non-commercial use only.
Do would-be competitors feel threatened? “We don’t know much about it,” was the general consensus.
We asked Teradata whether Oracle’s announcement was earth-shattering or whether it had something similar. The company told us about its own Aster Big Analytics Appliance, which tightly integrates Aster Discovery Platform, SQL-MapReduce® and Apache Hadoop. It launched in Q4 2012, alongside Aster Express, a virtual machine-based version of Aster Discovery Platform that has been available since then as a free download for developers, customers, partners and prospects who want to test-drive Aster. Applications can be developed on the desktop and then moved straight to the appliance.
Is Oracle playing big data catch-up, as many say it is with the cloud? Probably. But maybe Oracle loyalists, who are drunk on Larry’s Kool-Aid, don’t know or don’t mind.
Hopefully, for Oracle’s sake, developers will learn that BigDataLite exists and give it a spin, because apart from Oracle and its employees, almost no one on the web has written about it.
Pivotal and LucidWorks Hook Up
The promise of big data is to provide the deep insights businesses need to make unprecedentedly smart decisions at the right time (which usually means in short order). In most cases, that means leveraging smart algorithms, data crunching and search.
LucidWorks, which uses Apache Solr to provide embedded search capabilities, has partnered with Pivotal to bring more scalable and powerful search to Pivotal HD (Pivotal’s Hadoop distribution).
The companies say the combined solution lets customers implement enterprise search, and offer it to their own customers, in less time. It follows that end users should be able to gain insights from their data faster and therefore spend more time building growth and profitability in their core businesses.