Cloudera Unveils Big Data Search
Shhh ... don’t tell anyone, but many CIOs say they’re afraid of Big Data.

Not of the data itself, be it dozens of terabytes or petabytes, or Big Data’s three V’s (high volume, high velocity and/or high variety), but of their ability to deliver on Big Data’s big promises.

“Big Data Is The New ______.” You fill in the blank: a) Oil, b) Gold, c) Competitive Advantage, d) Information Frontier … we could keep going, but so could your seven-year-old and your grandmother. That’s how much hype there is about the technology.

But that’s not the point. The point is that Information Workers throughout the Enterprise are champing at the bit, demanding access to Big Data in short order. They are certain that it’s full of the game-changing insights they need to win. But there’s a problem. Most CIOs have trouble working with Big Data themselves, and they have little or nothing to offer business users. “Big Data technologies are simply too hard to use,” they say in private. Going public with such a statement could endanger their careers, because that answer is unacceptable to Management.

“I don’t care how you do it, just get me the results,” is a CEO’s common cry. It’s no wonder many CIOs are sitting behind closed doors in pools of sweat.

Well, beginning today, they can open their doors and stop sweating.

Introducing Cloudera Search

At the Economist’s Information Forum in San Francisco, Mike Olson, CEO of Cloudera, is unveiling Cloudera Search, the Big Data industry’s first fully integrated search engine for interactive exploration of data stored in the Hadoop Distributed File System (HDFS) and Apache HBase. Cloudera Search is powered by the industry’s leading open source search engine, Apache Solr.

What’s the big deal about that? You don’t need any special training to use it. With Cloudera Search almost any Enterprise user who can Google can perform interactive, natural language keyword searches and faceted navigation on data stored in Hadoop, without additional training or advanced programming knowledge.
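
For readers curious what a keyword search with faceted navigation looks like underneath the search box, here is a minimal sketch of the kind of request a front end might send to Apache Solr. The parameters `q`, `rows`, `wt`, `facet` and `facet.field` are standard Solr query parameters; the host name, the collection name (`weblogs`) and the field name (`status_code`) are made up for illustration and do not come from Cloudera's announcement. The sketch only builds the request URL and does not contact a server.

```python
from urllib.parse import urlencode


def build_facet_query(base_url, collection, keywords, facet_field, rows=10):
    """Build a Solr select URL for a keyword search with one facet field.

    base_url, collection and facet_field are hypothetical values supplied
    by the caller; only the parameter names are standard Solr.
    """
    params = [
        ("q", keywords),               # the keywords a user would type
        ("rows", rows),                # number of matching documents to return
        ("wt", "json"),                # ask for a JSON response
        ("facet", "true"),             # turn on faceting
        ("facet.field", facet_field),  # field whose values are counted as facets
    ]
    return f"{base_url}/solr/{collection}/select?{urlencode(params)}"


# Hypothetical host, collection and field names, for illustration only.
url = build_facet_query(
    "http://search.example.com:8983", "weblogs", "checkout error", "status_code"
)
print(url)
```

The facet counts returned alongside the matching documents are what let a user narrow results by clicking a value instead of writing a new query.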

And while this is simple to say, it’s truly revolutionary. Consider this: up until recently there were only about 100,000 engineers who were capable of running Hadoop and HBase searches using MapReduce -- this according to Charles Zedlewski, VP of Products at Cloudera. As recently as yesterday, only 1 million specially skilled enterprise pros had the skills needed to run simultaneous batch and interactive Big Data searches.

But with today’s announcement, Hadoop and HBase searching will become something billions of Enterprise users can do, no special training required, says Zedlewski.

All Data on One Platform 

Cloudera developed Cloudera Search specifically to address a rapidly emerging need as Enterprise Hadoop deployments become the primary repositories for more and more kinds of data. In order to gain wide user acceptance and adoption, the company realized that it had to make it easier to more quickly combine and refine data into a single, integrated platform.

At its core, Cloudera Search incorporates Apache Solr and other search-related open source projects to support a comprehensive big data infrastructure, and to alleviate the significant costs of maintaining the disparate systems that many enterprises currently depend on to execute search queries.

Cloudera says that Cloudera Search provides enterprises with scalable indexing options for Big Data and extends the Apache Solr project to offer near real-time document processing and indexing of data in transit to Hadoop and other storage endpoints. Data is immediately available to Search and other Hadoop computing frameworks, like Apache Hive and Cloudera Impala. Cloudera Search also provides linearly scalable batch indexing for large data stores within Hadoop on-demand, and with the introduction of an innovative GoLive feature can now incorporate incremental index changes, while avoiding costly downtime.

Cloudera’s press release states that these are Cloudera Search’s key features:

  • Scalable, Reliable Index Storage in HDFS: integrates index storage and serving directly into HDFS.
  • Batch Indexing via MapReduce: allows for index creation of data stored in HDFS and HBase as scalable and robust as MapReduce.
  • Real-time Indexing at Collection: makes an event searchable as it is stored into Hadoop through near real-time indexing features powered by Apache Flume.
  • Easy Interaction and Data Exploration via Cloudera Hue: provides a plug-in application for Hue and easy-to-install capabilities for standard Hue servers to query data and view result files, and enables faceted exploration.
  • Simplified Field Extraction and Cross-Platform Data Processing: allows for quick and easy field extraction of any data that is stored into HDFS using optimized Hadoop file formats, such as Apache Avro, avoiding the pain that many standalone search solutions might impose, and promotes reusable configurations and processing activities with the new processing framework, Cloudera Morphlines.
  • Unified Management and Monitoring with Cloudera Manager: provides a centralized management and monitoring experience that makes it as easy to deploy, configure and monitor search services as it is to manage CDH deployments and other services on the Hadoop cluster.
  • Unified Access and Control through Cloudera Navigator: facilitates better tracing and control of what information has been accessed by whom, allowing for auditing of search queries and result lists, as well as index management and optimization.

Everyone Benefits From Cloudera Search

Is Cloudera Search a game changer? We answer that question with an easy YES. Enterprises that adopt it will be able to provide business users with self-service access to Big Data insights. CIOs who have set up their Big Data infrastructures will be recognized for delivering strategic, game-changing solutions. Big Data vendors will have an easier time selling their solutions, because business users are more likely to spend their precious dollars when they know they can see results in short order. It’s also worth noting that Cloudera Search is based on Open Source technology, which means it costs less. That’s music to everyone’s (save proprietary software vendors’) ears.

Title image courtesy of lavitrei (Shutterstock)