The boys at Hortonworks eat, sleep and live Hadoop, and there’s a feeling they’ll never be satisfied. That’s not because they, and other Apache Hadoop committers, think the open source big data framework isn’t already amazing, but because it continues to offer so many possibilities. They see it fueling innovation, powering the next generation of companies, and serving as the catalyst that well-established enterprises use to reinvent themselves for the data-driven era.
Not only that, but Hortonworks sees itself as the perfect complement to some of today’s most successful and widely used enterprise software providers, like Microsoft, Teradata, SAP, Splunk, Tableau and many, many others. Those vendors evidently agree: they’ve integrated the Hortonworks Hadoop distribution with their offerings.
Introducing Hortonworks Data Platform 2.0, Available for All
This morning Hortonworks is announcing the general availability (GA) of Hortonworks Data Platform 2.0 (HDP 2.0), the next evolution of what the company calls the industry’s only 100 percent open source Hadoop distribution. Shaun Connolly, the company’s Vice President of Corporate Strategy, says it is the first commercial distribution built on the recent Apache Hadoop 2 general availability release from the Apache Software Foundation.
It’s probably safe to say that Hortonworks is first out of the gate with an enterprise distribution because it has so much skin in the Apache project: the company employs more Hadoop committers than anyone else.
Still, Hortonworks co-founder Arun Murthy, who has been an Apache Hadoop committer since “day one” in 2006, is careful not to give his team more credit than anyone else. As he tells it, Apache Hadoop 2.0 was a group effort, and no single member of the community deserves more credit than the rest.
Make no mistake, HDP 2.0 belongs to Hortonworks, though anyone can download it for free and run its enterprise version without spending a dime; customers pay only if they want training, support or other services.
Hadoop Innovations on One Integrated, Tested Platform
So what does HDP 2.0 offer that its predecessor does not? Most notably, it brings the YARN-based architecture of Hadoop 2 and Phase 2 of the Stinger initiative, and it packages the very latest innovations from the broader Hadoop ecosystem in a single integrated and tested platform.
For those who aren’t familiar with YARN (Yet Another Resource Negotiator), Hortonworks describes it as the piece that takes Hadoop beyond a single-use platform for batch processing to a multi-use platform that supports batch, interactive, online and stream processing. By acting as the primary resource manager and mediator of access to data stored in the Hadoop Distributed File System (HDFS), YARN lets enterprises store data in a single place and interact with it in multiple ways simultaneously, with consistent levels of service. In other words, enterprises can now run different kinds of workloads, such as batch jobs and interactive queries, side by side on the same cluster and the same data.
Phase 2 of the Stinger initiative, according to Connolly, improves the performance of Apache Hive, a data warehouse infrastructure built on top of Hadoop for data summarization, query and analysis. It also lets HDP 2.0 handle larger workloads and take advantage of SQL windowing functions such as RANK, LEAD and LAG, among others.
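For readers who haven’t used them: a windowing function computes a value for each row from other rows in the same partition, without collapsing the rows the way GROUP BY does. In HiveQL these are written with an OVER (PARTITION BY ... ORDER BY ...) clause. As a rough illustration only, the Python sketch below emulates what RANK, LAG and LEAD return on a small, made-up table of (artist, month, plays) rows. It is not Hive code; the table, the sample values and the `window` function are all hypothetical.

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical sample rows: (artist, month, plays). Made-up values, not real data.
rows = [
    ("a", 1, 100),
    ("a", 2, 150),
    ("a", 3, 150),
    ("b", 1, 80),
    ("b", 2, 60),
]


def window(rows):
    """Emulate, within each artist partition:
    RANK()      OVER (PARTITION BY artist ORDER BY plays DESC)
    LAG(plays)  OVER (PARTITION BY artist ORDER BY month)
    LEAD(plays) OVER (PARTITION BY artist ORDER BY month)
    """
    out = []
    # Sort by partition key, then by month, so groupby sees each partition in order.
    ordered = sorted(rows, key=itemgetter(0, 1))
    for artist, group in groupby(ordered, key=itemgetter(0)):
        part = list(group)
        plays = [r[2] for r in part]
        desc = sorted(plays, reverse=True)
        for i, (_, month, p) in enumerate(part):
            # RANK = 1 + number of strictly larger values: ties share a rank, gaps follow.
            rank = desc.index(p) + 1
            # LAG/LEAD look one row back/forward; None plays the role of SQL NULL.
            lag = plays[i - 1] if i > 0 else None
            lead = plays[i + 1] if i + 1 < len(plays) else None
            out.append((artist, month, p, rank, lag, lead))
    return out


for row in window(rows):
    print(row)
```

Note how the two 150-play months for artist "a" share rank 1 and the next row gets rank 3, which is how SQL RANK (as opposed to DENSE_RANK) treats ties.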
And finally, HDP 2.0 should be less buggy to work with than stock Apache Hadoop, since more than 420 tickets have been resolved.
At the end of the day, what does HDP 2.0 bring to the market? Better performance, fewer challenges and an improved experience, but that’s not all. Enterprises will now be able to get more value from their data, which should translate into better products, services and business results.
Want to see it in action without needing to know how any of it works? Keep your eye on Spotify. The company uses a Hadoop cluster to run the analytics that decide which music to introduce you to on Spotify Radio and predict which ads you might be receptive to.
Spotify recently announced its commitment to HDP, so we can sit back and watch how Hortonworks Data Platform performs on what is believed to be Europe’s largest commercially used Hadoop cluster: 690 nodes storing data from Spotify’s more than 24 million active users and six million subscribers.