Data continues to grow inside organizations. Enterprise content management systems, business intelligence and various other sources of structured and unstructured data have left leaders scrambling for ways to find value in that data. Adopted by Internet luminaries, the big data framework Hadoop has skyrocketed in popularity, a nice kid that nobody wanted to say anything negative about. However, now that the technology is no longer quite so new and shiny, and enterprises are using it for real, live production data, people aren't being quite so nice.
The Rise of Hadoop
Yahoo released Hadoop as open source only a few years ago. For humans that's a toddler, but in software and Internet years it's reaching a ripe old age. What is Hadoop? Apache Hadoop is a top-level project that includes three sub-projects:
- Hadoop Common: shared components for the framework
- HDFS: a distributed file system
- MapReduce: a software framework that allows fast parallel processing of huge data sets
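To make the MapReduce model a little more concrete, here is a minimal, single-process Python sketch of the classic word-count job. It only imitates the map, shuffle and reduce phases; it does not use Hadoop's actual API, and the function names (`map_phase`, `shuffle_phase`, `reduce_phase`, `word_count`) are hypothetical. On a real cluster, Hadoop would split the input stored in HDFS across many machines and run the mappers and reducers in parallel.

```python
from collections import defaultdict
from itertools import chain
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Mapper (hypothetical name): emit (word, 1) for every word in one record."""
    for word in line.lower().split():
        yield word, 1


def shuffle_phase(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle (hypothetical name): group every emitted value by its key."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reducer (hypothetical name): combine all values for one key."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    # Hadoop would run many mappers in parallel over HDFS blocks;
    # this sketch simply runs them one after another in a single process.
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    grouped = shuffle_phase(mapped)
    # Reduce each key group independently, returning results sorted by key.
    return dict(reduce_phase(k, v) for k, v in sorted(grouped.items()))


if __name__ == "__main__":
    records = ["big data is big", "Hadoop processes big data"]
    print(word_count(records))
```

Running the script prints each word with its count (for example, `big` appears three times). On an actual Hadoop cluster, the same mapper and reducer logic would typically be written in Java against Hadoop's MapReduce API or supplied as scripts through Hadoop Streaming.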
Until recently, organizations that wanted to implement Hadoop didn't have many options. They could elect to use an unsupported open-source stack built on Apache's Hadoop distribution, which was not an attractive option for many risk-averse technology leaders. Alternatively, they could use Cloudera's commercial distribution of Hadoop or Amazon Web Services' Elastic MapReduce. Cloudera emerged as a leader in the space, and its success arguably spurred its own competition.
The Hadoop market has since expanded, with vendors competing to deliver faster big data queries. Players ranging from IBM to startups have entered the market with Hadoop-based technologies and services. How many Hadoop-related solutions exist?
- Apache Hadoop
- Appistry CloudIQ Storage Hadoop Edition
- IBM Distribution of Apache Hadoop
- IBM Global Parallel File System (GPFS)
- Cloudera’s Distribution including Apache Hadoop
- DataStax Brisk
- Amazon Elastic MapReduce
- Pervasive DataRush
- Apache Hive
- Apache Pig (originally developed at Yahoo)
Even with all the competition, Cloudera is leading the commercial Hadoop market and has managed to get Hadoop creator Doug Cutting on its employee rolls. With all of the competition and innovation in the space, it would seem users would be thrilled. Not exactly.
Hadoop is Hard
Hadoop is not the most intuitive or easy-to-use technology. Many of the startups that have recently emerged to challenge Cloudera's dominance share essentially one value proposition: they make it easier to get answers from the software by abstracting its functions into higher-level products. But none of these companies has found the magic solution that brings the learning curve down to a reasonable level.
Some companies are publicly criticizing Hadoop, but no one seems to be abandoning it altogether. Despite the recent influx of entrants, the Hadoop market is still young. We are likely to see a lot of consolidation over the next few years, with a few companies simply eliminated, although it's not happening fast enough for some.
Have you used Hadoop? Was it a pleasure or pain? Let us know.