For anyone who doesn’t yet know, Facebook open sourced “Presto” yesterday. What's Presto? It’s a distributed SQL engine for big data, reported to run orders of magnitude faster than most of today’s widely used technologies built to handle such workloads. Those include Apache Hive, Cloudera’s open source Impala, Stinger (supported by Hortonworks) and Pivotal’s Hawq, among many others.
Hats Off To Facebook, Hats Off To Presto
The open source community that builds products and services around big data is no doubt excited, as are the data-driven companies that have worked with it so far.
The website for Presto cites a quote from Fred Wulff, a software engineer at Dropbox:
“We're really excited about Presto. We're planning on using it to quickly gain insight about the different ways our users use Dropbox, as well as diagnosing problems they encounter along the way. In our tests so far it's been rock solid and extremely fast when applied to some of our most important ad hoc use cases.”
Words from the Wise
Matt Pfeil, a co-founder of DataStax, which provides enterprise-grade software and services around the big data database Apache Cassandra, says he got a call this morning from a friend who works at Instagram saying, “You need to start looking at Presto.”
He is eager to do exactly that, and with good reason -- he thinks it will be interesting. He doesn’t, however, think it’s going to change his company’s vision or products in any way. “We’re not in the analytical space,” he says.
Pfeil’s enthusiasm comes, at least in part, from the fact that Presto was created at Facebook. “Facebook gets to see problems that few companies in the rest of the world experience. It has a real demand and the answer is innovation,” he explains.
We asked Pfeil if he thinks that Presto will impact solutions like Hive and Impala. “It depends,” he says. “The key will be if someone picks it up and builds a community around it. But, even then,” he cautions, “for every open source project that succeeds there are many, many who fail, and it’s not necessarily because they were bad ideas.”
Having a commercial entity to support Presto would likely be necessary for widespread adoption as well, and Facebook isn’t in that business at all. It built Presto for its own use, much as it did Cassandra, the database that Pfeil’s company now provides products and services around. Needless to say, he’s incredibly grateful for Facebook’s willingness to open source its projects.
Presto Inspires Twitterverse and Online Mags
The Twitterverse and online news magazines don’t seem to be as level-headed as Pfeil. They’re asking questions and making statements like, “Is Hive history?” “Impala injured?” “Apache Hive Is Toast” and even things as out of context as “Zuck on that, Hortonworks and Cloudera.”
Cloudera and Hortonworks Welcome Presto
It’s doubtful that anyone who works at Cloudera or Hortonworks is scanning the job boards or worrying that their enterprise customers will stop doing business with them anytime soon. Instead they’re championing open source.
Here’s what Charles Zedlewski, Vice President of Product at Cloudera, told us:
“Presto is a good example of the strength of open source: a diversity of ideas and approaches. Facebook is the inventor, and single largest corporate user, of Hive and they are the ones launching Presto. This speaks volumes about the long-term technical potential of Hive and validates the technical direction Cloudera established more than a year ago with Impala.
As for direct support, we're keeping our R&D energies focused on Impala. It already has a track record in-production in the enterprise, and our customers like the performance, concurrency, security and functionality; we have a number of new exciting additions planned as well.
You'll note that Presto runs on CDH, so our users and customers are more than welcome to try it out just like they do with dozens of other open source projects that work with CDH, but which Cloudera does not support directly.”
Too Early for Enterprises to Think About Leaping
It often takes years before an open source project becomes enterprise-ready. At the Strata conference, Jack Norris of MapR suggested that there are enterprises that still believe Hadoop may not be at that point. (I disagree. More on this tomorrow.) So for any companies shopping for big data solutions, we’d caution you against making, or not making, a decision based on Presto. You’ll likely delay your growth if you do. Unless, of course, you employ hundreds of the world’s most brilliant big data engineers, as Facebook, Instagram, Twitter, Yahoo and Amazon do. In that case, jump right in and show us what Presto can do for you.
For engineers who don’t work at such data-intensive businesses but who are interested in Presto, there’s great news. It’s an open source project, so you can jump right in and write code alongside some of the world’s most experienced big data engineers.