When there’s disruption in an industry, bold statements are made. And in an era of Big Data, we seem to be hearing them all the time.
Last week at the Cloudera Summit, we heard another one. Cloudera’s chief executive, Mike Olson, called on enterprises to “unaccept the status quo” as it relates to data management. He preached the gospel of Hadoop:
“For more than thirty years, the data management industry has relied on relational databases, dedicated expensive storage and other very expensive special purpose legacy systems,” he said. “While that approach was very powerful for decades, the accelerating tsunami of data arriving every minute, every hour, every day has begun to overwhelm those systems. Advancements in business analytics have been hindered by the relatively small fraction of the total data made available, and the cost to store, move and process it. Today, the existing standard approach is changing. The center-of-gravity of the enterprise data center is shifting -- it’s moving toward Hadoop.”
And while few doubt that Hadoop can and will make a profound impact on how decisions will be made in the future, exactly how, how big and how soon remain open questions. Vendors like EMC say they are “all in” on Hadoop -- in fact, they are so far in that they've spawned Pivotal, a company whose mission is to build a new data fabric whose starting point is Hadoop.
Other vendors, like Teradata, SAP and Microsoft, say that Hadoop is important, but not so important that it overshadows other key technologies.
One of the great joys of reporting on technology is that we get to look at the big picture and watch things unfold; we ask questions and make generalized conclusions far more often than we take sides.
In this case, we've asked data management leaders about Hadoop, its significance and its future. Here’s what each of them has to say (comments are in alphabetical order by vendor name) with regard to Hadoop and other data management/data warehousing technologies:
Notes: 1) We contacted a few other vendors for their opinions, but they did not respond. 2) We’ll be looking at Big Data databases in a future article, which is why they aren't featured here.
Cloudera’s CEO, Mike Olson (from a prepared statement):
“Storing and analyzing all (of) one’s data with old guard legacy systems doesn’t make sense economically or technically. Now organizations have a choice. They no longer need to make undereducated, on-the-fly decisions about what data to keep and what to offload, or business decisions with insufficient information.
Hadoop has fundamentally transformed the economics of data management, making it possible to choose to keep all (of) one’s data, without an exorbitant, ongoing investment in a cumbersome technology that can’t keep pace with the growth of data or the evolving needs of a business.
Cloudera (and its Hadoop-related products) is making it possible to store and manage all data -- today -- so organizations can leverage it whenever and however they see fit. This opens up new opportunities for data discovery and insights that were never before possible under the old paradigm. Welcome to the new era of data management.”
Microsoft’s Director of Product Marketing, Server and Tools Business, Herain Oberoi:
“Technology will continue to improve, making big data more accessible to more users on the platforms of their choice. These improvements will help more people get actionable insights quickly, conveniently and more economically from their data.
Hadoop is both a compelling solution for analyzing unstructured data at low cost and a critical part of the big data ecosystem, and Microsoft has been working with Apache Hadoop project founding member and most active committer, Hortonworks, to deliver Hadoop based solutions (HDInsight) both on Windows and in the cloud on Windows Azure. The portability, security and simplified deployment of these solutions, as well as their interoperability with Microsoft’s award-winning business intelligence tools, create unique and differentiated value for customers.”
Pivotal’s Chief Scientist, Milind Bhandarkar:
“We're seeing rapid adoption, with the Hadoop business growing 60-70% a year, and we believe it will continue to grow in popularity. Apache Hadoop provides a great foundation for the next-gen data platform and we’ve leveraged it to add a proven, interactive standard SQL query layer: our innovative HAWQ technology. HAWQ is essentially a fully functional, high-performance relational database that runs in Hadoop and speaks SQL natively to deliver performance improvements of 50X to 500X as it helps customers gain insight from different types of data spread across multiple systems.”
SAP’s Director of Big Data, David Jonker:
“The 21st century demands new approaches to managing data, including the enterprise's data warehouse environment. Increasingly, enterprises will build logical data warehouses that virtualize data access from specialized data stores. At SAP we believe in-memory platforms, such as SAP HANA, will be at the center of the logical data warehouse, with relational databases, Hadoop and other NoSQL data stores acting as repositories and staging environments. While the traditional data warehouse needs modern technology, such as in-memory and columnar databases, rigorous data warehousing practices must continue to ensure the quality of mission-critical data, such as an enterprise's financials. Hadoop is complementary technology especially suited to supporting the work of data scientists.”
Teradata’s VP, Unified Data Architecture Marketing, Steve Wooledge:
“Teradata agrees that companies should “unaccept the status quo,” and this applies across the whole enterprise data architecture, not just Hadoop, and not just data warehousing. This is the focus of our Unified Data Architecture vision, products, and services -- integrating a best-of-breed architecture with Hadoop as a key component.
Hadoop is changing how data management is deployed, and it is understandable why companies are excited. As with any new technology, the promise and “hope” of change are alluring, but to feed the frenzy without providing holistic, strategic guidance to customers is dangerous. If Hadoop is a hammer, then every problem looks like a nail, and companies will walk away with imprecise architectures which cannot meet stringent business service level agreements.
At Teradata, we work with customers to incorporate Hadoop along with other workload-specific platforms (Teradata and partner technologies) in a seamless analytic environment across data storage, transformation, preparation, analytics, and operationalization in the business. It can be described with the metaphor of 1+1 = 3. No one technology can be optimized for every type of workload or customer use case. It is Teradata’s goal to help its customers to leverage all their data, by the effective deployment of transformational technologies that drive tangible business results.”
WANdisco’s CTO and VP Engineering of Big Data, Jagane Sundar:
“The cost of storing data on Hadoop is orders of magnitude cheaper than any of the alternatives. The only thing preventing Hadoop from becoming the de facto storage solution for all data is the lack of enterprise-grade high availability and disaster recovery solutions. Companies such as WANdisco are focused on addressing these deficiencies. Once WAN-scope continuous availability and disaster recovery solutions are available for Hadoop, its widespread adoption for storage of all data is inevitable.”
It’s clear that all of our contributors agree Hadoop will play an important role in the future. But how significant a role, in exactly what way, and how soon remain open questions, and it isn’t clear when they will be answered.
After all, for every Netflix, whose business and success are practically powered by Hadoop, there’s a consumer products company that paid the price and endured the pains of implementing it without gaining any insights it was willing to act on. Those companies will be hesitant to go “all in” on Hadoop, at least for now.
“We’re at the point where the rubber meets the road … the point where Big Data delivers or doesn’t,” says Chris Taylor in a recent blog post on Wired. He adds that “Patience with experimentation will wear thin over the next year or so and there need to be more ‘everyday’ companies taking advantage of Big Data and talking about their successes. Big Data is headed for the Trough (of disillusionment) as long as there are more people trying than succeeding, and that’s where we are right now.”
If Taylor’s last point is true, the “status quo” won’t move forward in a Big Data way all that quickly, no matter what anyone says. CIOs will take more of a “wait and see” attitude, and their only compelling reason to move to Hadoop will be to save money. That reason is real, but it may not outweigh the real or perceived risks of moving to a new technology that many don’t yet see as “enterprise” ready. And, perhaps more importantly, cost savings alone don’t deliver on the transformative promises of Big Data.