For Hadoop's 7th Birthday: Hortonworks + inMobi Bring Falcon Data Lifecycle Management to Apache Hadoop

Virginia Backaitis

Enterprises are hell-bent on managing data -- and managing data where there is no inherent data lifecycle management framework to leverage is a bear. A bear that most enterprises prefer not to wrestle with.

Realizing that this might slow the adoption of Apache Hadoop in the commercial enterprise space, a team of engineers from Hortonworks and performance-based mobile ad network inMobi built Falcon, a data lifecycle management framework for Apache Hadoop. It enables users to configure, manage and orchestrate data motion, disaster recovery and data retention workflows in support of business continuity and data governance use cases.

Falcon began at inMobi: the company needed data lifecycle management that Hadoop didn't provide, so its engineers built a framework for their own use. As is frequently done in Open Source communities, inMobi engineers shared what they had built with the members of the Apache Hadoop community, and a team from Hortonworks signed on to make Falcon more broadly useful so that it could be submitted for acceptance into the Apache Incubator.


A Birthday Present For Hadoop’s Seventh Birthday

On Tuesday, March 26, just a few days before Hadoop’s seventh birthday, Srikanth Sundarrajan, a Principal Architect at inMobi, announced that Falcon had been accepted into the Apache Incubator. One week later, Hortonworks introduced it to the Hadoop community at large.

Falcon offers a proven, well-tested and extremely scalable data management system built specifically for the unique capabilities of Hadoop, says Shawn Connolly, vice president of corporate strategy at Hortonworks. It tracks data as it’s brought in, showing how it’s transformed and how it is shared.


Until now, enterprises have tried to build their own solutions to do what Falcon achieves, but have found them tricky to develop, difficult to test and error-prone.

With Falcon, those days are gone -- the data processing pipeline and all replication points are expressed in a single configuration file and well-tested Falcon services are used to ensure that data is processed and replicated reliably.
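To make that concrete: Falcon describes pipelines through declarative XML entity definitions (clusters, feeds and processes). A sketch of a feed entity along the lines of Falcon's specification is below -- the cluster names, paths and limits are hypothetical, and the exact elements should be checked against the project's documentation, but it illustrates how retention and cross-cluster replication for a dataset are declared in one place rather than coded by hand:

```xml
<!-- Illustrative Falcon feed entity: an hourly log dataset that is
     kept for 7 days on the source cluster and replicated to a backup
     cluster, where it is retained for 3 months. Names and paths are
     made up for this example. -->
<feed name="rawLogs" description="hourly raw web logs"
      xmlns="uri:falcon:feed:0.1">
    <frequency>hours(1)</frequency>

    <clusters>
        <!-- Primary cluster: where the data lands -->
        <cluster name="primaryCluster" type="source">
            <validity start="2013-04-01T00:00Z" end="2099-12-31T00:00Z"/>
            <retention limit="days(7)" action="delete"/>
        </cluster>
        <!-- Backup cluster: Falcon replicates the feed here -->
        <cluster name="backupCluster" type="target">
            <validity start="2013-04-01T00:00Z" end="2099-12-31T00:00Z"/>
            <retention limit="months(3)" action="delete"/>
        </cluster>
    </clusters>

    <!-- Where each hourly instance of the feed lives on HDFS -->
    <locations>
        <location type="data"
                  path="/data/logs/${YEAR}/${MONTH}/${DAY}/${HOUR}"/>
    </locations>
</feed>
```

From a definition like this, Falcon's services take over the scheduling: expired instances are cleaned up and replication jobs are run without the enterprise writing or testing that plumbing itself.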

Not only that, but it also addresses the business continuity and data governance needs of the mainstream enterprise, according to Connolly. With those needs met, Hadoop adoption becomes a less daunting task.

Hats Off to inMobi and Hortonworks Engineers

The Falcon project is an example of Apache community members behaving at their best, holding true to the belief that building and sharing together will yield better results than going it alone. And in a Hadoop marketplace that is increasingly competitive and crowded, sharing for the benefit of all is nice to see. Perhaps it will set an example for vendors whose solutions around Hadoop are growing more proprietary and who promise to contribute “some code” rather than sharing it all.