The exponential growth of data created by social media, location sensors and other Web 2.0-style interactions has driven the emergence and growth of solutions to manage “big data” efficiently. Uber-fast MapReduce-based tools, such as Apache’s Hadoop, have emerged as some of the most popular. Data enthusiasts will be excited that Cloudera, a commercial provider of Hadoop products, services, support and training, has finally announced the general availability of its own Hadoop distribution: Cloudera’s Distribution including Apache Hadoop version 3 (CDH3). Hadoop makes data processing short, but apparently product names, not so much.
Big Data is a Big Deal to Cloudera
If you know big data, then you probably know MapReduce, the basis of Hadoop. Web giants like Google, Yahoo and Orbitz have leveraged the technology to deal with their massive data processing needs. There is something appealing about having access to the exact same technology stack as the big guys for managing content, and that’s what open source data processing frameworks such as Hadoop allow.
As innovative and powerful as Hadoop is for managing big data, for some organizations it still lacks many of the features that would make it a viable solution right out of the box. This is the issue that Cloudera wanted to address with its Hadoop distribution.
CDH3 has been baking at Cloudera since March 2010 and has already been deployed in production by Cloudera customers for over a year. CDH3 is built on an Apache Hadoop foundation augmented with eight open source tools for activities such as data loading, query language support, job scheduling and workflow. CDH3 includes the components required to leverage the platform in an enterprise environment without time-consuming integration. Users can deploy Cloudera’s distribution and start experiencing the data-crunching goodness that is Hadoop.
Cloudera’s Hadoop distribution joins an increasingly busy field of Hadoop distributions. Amazon has Elastic MapReduce, IBM has InfoSphere BigInsights and new players such as Hadapt and DataStax were announced at the recent Structure Big Data Conference.
Why all of the distributions? Hadoop, honestly, is not the easiest thing to use. Organizations could go through the effort of downloading the core Apache Hadoop distribution, configuring it and integrating it with other snazzy open source tools like Hive -- but why, when one of the experts can do it for you? Unless there is some core organizational value to be gained, it’s probably just geekery for the sake of geekery -- not something most companies are willing to expend capital on in the current economic environment.
The general availability release concludes Cloudera’s extended beta for CDH3. If you want to learn more about the current release, Cloudera is conducting a webinar on Thursday, April 21 at 11 am PDT to discuss the new features in CDH3. CDH3 is available for download now.