Marry a big vision and a vibrant open source community and you’ll get something pretty special. In this case we’re talking about delivery on the Stinger initiative, which teamed engineers from Hadoop distro provider Hortonworks with more than 140 developers to advance interactive SQL querying ability on Apache Hive at scale in pure open source.
The initiative, which was completed in April, brought together more than 390,000 lines of code contributed by developers from 44 companies to give business analysts and data workers one powerful engine for fast SQL queries on big data sets at petabyte scale.
It does something that comparable big data solutions do not: it gives users a single, simple tool for both interactive and batch processing.
One is Better than Two
“End users want one SQL engine, one tool, not two,” said Jim Walker, director of product marketing at Hortonworks. Other Hadoop vendors leverage a second technology that might complicate things for end users.
There’s little doubt that the Clouderas, MapRs and Pivotals of the world see this differently, as they all have different approaches to solving the problem.
But given the large number of contributors to Stinger and Hive, not only from software companies but also from consumer companies as diverse as Spotify, LinkedIn, Facebook and eBay, there’s little doubt that Walker’s comments resonate. So much so, in fact, that a new initiative, Stinger.next, is being announced today.
Hortonworks co-founder and Apache committer Alan Gates and Hortonworks Sr. Product Manager Raj Bains have provided fine-grained detail on Stinger.next in a blog post. But for those who are interested in an overview, Stinger.next hopes to harness the momentum and enthusiasm built by the delivery of Stinger’s initial goals to bring even more power and more capabilities to Apache Hive.
Stinger.next will push Hive’s performance limits by building capabilities for sub-second queries, a more complete set of SQL semantics, and transactional capabilities, all at petabyte scale.
It has three primary goals around speed, scale and SQL. More specifically:
- Speed: Sub-second queries will allow users to deploy Hive for interactive dashboards and explorative analytics that have more demanding response-time requirements.
- Scale: The only SQL interface to Hadoop designed for queries that scale from terabytes to petabytes.
- SQL: Enable transactions and SQL:2011 analytics for Hive.
3 Milestones, 3 Timelines
Leveraging the model used for the successful, on-time delivery of the initial Stinger project, Walker said Stinger.next would be delivered at a rapid pace over the next 18 months. Transactions are slated for release in late 2014 and sub-second queries in the first half of 2015, with a preview in the next few months.
Hive Developer Community Amped-Up
When the initial Stinger project was delivered last April, Walker said the community around it had one simple question: What’s next? Needless to say, that’s where Stinger.next got its name. Just as importantly, engineers from companies like Spotify, which has the largest Hadoop cluster in Europe, expressed keen interest in continuing to work with the broad community to move the ball forward and to drive the road map, said Walker.
What Might the Greater Community Think?
While different Hadoop vendors have carved different paths for querying big data with speed and at scale, with Stinger.next Hortonworks does what it does best. It stays true to Apache Hadoop, its ecosystem, and its commitment to true open source.
But that’s not all. It also rallies the community to bring its best to the table and to build the technologies and tools that users need and want. And while many competitors may have the same goals, the number of developers working on their projects (open source or not) is, for the most part, smaller.
Hortonworks always bets big on the idea that a truly open, open source community will overwhelm anything a single vendor or small group of developers can bring forth. And given the large number of tech vendors that partner with Hortonworks (Microsoft, Tableau, Teradata, and SAP are among them), it may be on to something.
Title image by Patryk Kosmider / Shutterstock.