Cloudera watchers would have to be blind to think that the company is only about Hadoop. After all, it not only has over 700 partners that add value to its software, but it has also leveraged a good number of Apache open source technologies to help its customers ask bigger questions and gain more value from their data.
So it follows that this morning Cloudera will reveal its strategy to become a “data hub” from which enterprises can leverage all of their data (structured, semi-structured or unstructured) from wherever it sits, whether in data warehouses, applications or legacy systems.
“We are becoming a data management company,” explained Matt Brandwein, the company's director of product marketing, during a pre-announcement interview. “We want you to be able to store all of your data in one place, to be able to use your existing tools, and to help bring more benefits to your customers faster.”
It’s hard to overstate how significant this announcement is: Cloudera is proclaiming that it wants to manage all of your enterprise data, every last bit of it. “Even transactional data?” we had to ask. Brandwein’s simple answer: “Stay tuned.”
Cloudera Separates Itself from the Pack of Other Hadoop Distro Providers
Listening to Brandwein, it’s clear that Cloudera wants to separate itself from the pack of other Hadoop distro providers, whether we’re talking Amazon, Hortonworks, Intel, MapR, Microsoft, Pivotal or WanDisco, among many others. And it doesn't intend to do so by playing the same game better; from this point forward, Cloudera no longer considers these companies competitors.
That being said, Cloudera is by no means abandoning Hadoop or taking even the smallest step away from it. Just yesterday, the company announced two new initiatives for its Cloudera Connect program, which centers on CDH (Cloudera’s Distribution including Apache Hadoop), its open source Hadoop distribution.
Today Cloudera Unveils Its 5th Generation Platform for Big Data
Cloudera’s new vision should not overshadow CDH 5, the fifth generation of its Platform for Big Data, Cloudera Enterprise, which is now available as a public beta offering. According to Cloudera, the new release not only builds upon Apache Hadoop 2, but it also offers unique features and advancements that simplify storing, processing, analyzing and managing large structured and unstructured datasets, while offering increased security, robust data management and tight integration with third-party applications.
Key advancements in the new Cloudera Enterprise 5 release include:
- In-Memory HDFS Caching: Datasets from HDFS can now be cached in-memory, boosting MapReduce data processing performance and Cloudera Impala’s analytic query response times for even faster time to insight.
- User-Defined Functions (UDFs): Customers can now use the custom query functions they depend on in conjunction with Cloudera Impala to deliver the business insights they require. They can also take advantage of the popular open source MADlib library of pre-built statistical and analytic functions to enable scalable in-database analytics.
- Resource Management: Cloudera Enterprise now delivers advanced resource management for running multiple frameworks for data processing and analysis on a single cluster through the powerful combination of Hadoop YARN (Yet Another Resource Negotiator) and Cloudera Manager. For the first time, administrators can allocate resources not only by workload, but also by workgroup, ensuring the best combination of performance and utilization. For example, customers can dedicate 50 percent of capacity to IT for mission-critical data processing jobs, 30 percent to the marketing team for ad-hoc BI queries, and so on.
- Unified Management of Third Party Applications: Cloudera Manager now provides extensibility to enable customers to deploy, manage and monitor products from Cloudera partners such as SAS, Revolution Analytics, Syncsort and many more. Now, customers can manage complex clustered environments from within a single, intuitive interface.
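To make the workgroup example above concrete, here is a minimal sketch of the arithmetic behind percentage-based shares. It assumes a hypothetical cluster with 1,000 GB of memory and 200 vcores, and the `Capacity` type and `allocate_by_workgroup` helper are hypothetical illustrations, not part of Cloudera Manager's or YARN's actual interfaces.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Capacity:
    """Cluster resources (hypothetical sizes, for illustration only)."""
    memory_gb: int
    vcores: int


def allocate_by_workgroup(cluster: Capacity, shares: dict[str, int]) -> dict[str, Capacity]:
    """Hypothetical helper: give each workgroup a static slice of the cluster.

    Shares are whole percentages; slices are rounded down to whole GB and vcores.
    """
    if any(pct < 0 for pct in shares.values()):
        raise ValueError("shares must be non-negative")
    total = sum(shares.values())
    if total > 100:
        raise ValueError(f"shares add up to {total} percent, more than 100")
    return {
        group: Capacity(
            memory_gb=cluster.memory_gb * pct // 100,
            vcores=cluster.vcores * pct // 100,
        )
        for group, pct in shares.items()
    }


if __name__ == "__main__":
    cluster = Capacity(memory_gb=1000, vcores=200)
    # 50% for IT's batch jobs, 30% for marketing's ad-hoc BI queries, 20% for everyone else.
    plan = allocate_by_workgroup(cluster, {"it": 50, "marketing": 30, "other": 20})
    for group, cap in plan.items():
        print(f"{group}: {cap.memory_gb} GB, {cap.vcores} vcores")
```

This static split is the simplest reading of the example. Real schedulers such as YARN's generally treat shares more flexibly, letting a busy workgroup borrow capacity that another group is not currently using.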
Comprehensive Data Management
In addition to enabling centralized data auditing for Hadoop, Cloudera Navigator now provides:
- Data Discovery: Analysts and data modelers can search, explore, define, and tag datasets through the Cloudera Navigator interface, to help identify relevant information for downstream analysis or processing.
- Data Lineage: As the amount of data in Cloudera Enterprise grows, so does the importance of understanding how that data is used across the organization. Cloudera Navigator delivers the industry’s first data lineage solution for Hadoop, enabling customers to meet regulatory requirements, find associated datasets and satisfy data governance and retention policies.
- Data Protection: HDFS and HBase now support snapshots to help prevent data loss.
- NFS-based Data and Application Access: Easily integrate Cloudera Enterprise with data in and applications running on existing file systems with native support for NFSv3.
Will Anyone Play “Anything You Can Do, I Can Do Better,” One Last Time?
There’s no question that other Hadoop distro providers will be looking to discover just how much value Cloudera has added to Apache Hadoop 2.0. It’s also likely that Cloudera won’t care what they think. First, because, as Brandwein puts it, “we focus on delivering what our customers need,” rather than on what would-be competitors are doing.
And second, because Cloudera has changed its game: companies whose primary focus is building Hadoop distributions or providing services and support around Hadoop are no longer its competition.