There’s always plenty of news in the Big Data world and, try as we might, we can’t cover it all. So we’re highlighting a few of the items that made us take notice and offering a wee bit of context around them.
Outdoing Hadoop: Facebook Scales Apache Giraph to a Trillion Edges
Raise your hand if you’ve heard of Apache Giraph. OK, we admit it, up until earlier this week we hadn’t either. But we bet Big Data wranglers worldwide are taking a good look at it today. Why? Because Facebook chose it over Apache Hive (a data warehouse infrastructure built on top of Hadoop for providing data summarization, query and analysis) and GraphLab (a graph-based, high-performance, distributed computation framework that was first developed for machine learning tasks and is now used for a variety of other data-mining tasks) to bring a new style of search to its 1.15 billion users.
In order to do this, they had to make a good number of improvements to Apache Giraph. The result? It can now scale to a trillion edges which, according to Facebook engineer Avery Ching (see his blog post), was impossible last year.
What does this big advancement mean for Apache Hadoop? We’ve seen a few headlines that say things like “Move over Hadoop,” but we don’t think Hadoop’s going to play second fiddle anytime soon.
Why? First, because not many companies want to, or need to, wrangle as much data as Facebook. Second, because not many companies (in fact, maybe no company other than Google) employ as many gifted engineers (we count at least 150) as Facebook and, at least for now, it’s going to take a lot of brainpower to handle Giraph. Third, Giraph is nowhere near enterprise-ready, and Hadoop is. Finally, while enterprises may have problems working with Hadoop, scaling isn’t one of them.
What’s So Special About Lucene/Solr's Newest Committer?
Talk about leaning in: Apache Lucene/Solr’s newest core committer is female! Out of the 43 individuals in this prestigious group, Cassandra Targett is the only woman.
From what we can tell, she didn’t have to break through a glass ceiling to claim her place; instead, she used a great deal of her talent and time to create the Solr reference guide and to update it after each release.
While Targett did much of this work as an employee of LucidWorks, now that the company has donated the Lucene/Solr documentation to the Community, she’s “volunteering” there. She not only helped get the documentation hosted by Apache but she also pitches in to create documentation when the developers don’t have time.
And, it’s worth noting that unlike most Apache committers, Targett doesn’t code.
So, maybe the news around Targett is not just that she’s a woman, but that the open source Lucene/Solr community has awarded a prestigious status to someone whose role is documentation. Imagine that!
And, in case there’s anyone who isn’t familiar with Apache Lucene/Solr, here it is in Targett’s words:
“Lucene is a full-text search library written in Java. It's super-fast and easily scalable, making it perfect for high-performance indexing needs. Solr is built on Lucene, but has additional features that allow it to be more of an out-of-the-box search server. It inherits Lucene's power and extends it to be incredibly flexible for the needs of nearly any search application.”
Why is Lucene/Solr significant to us? First, proprietary search technologies (Autonomy, Verity, FAST) add to the price tag when you use them with something like Documentum; Apache Lucene/Solr, in and of itself, does not, because it is open source. Second, proprietary vendors can do what they want with their technologies, and that’s not good for you when you depend on them. Consider that EMC Documentum used FAST for search until Microsoft acquired FAST; after that, it no longer could. That’s not going to happen with an open source technology.
Continuuity Helps Partner Reap Real-Time Results
Remember Continuuity, the start-up that promises to make Hadoop accessible to the rest of us (i.e., software engineers who don’t work at Facebook, Netflix, Eventbrite, eBay …)?
This week they announced that they will integrate Continuuity Reactor (their PaaS offering for Big Data) with Crowd Control®, Lotame’s data management platform (DMP). For Lotame this provides two big benefits. First, they will be able to provide their customers with purpose-built applications that drive real-time insights into their data, so that those customers can make more informed decisions and realize better ROI on their data assets. Second, because Continuuity Reactor takes care of all of the loathsome work that has to be done before a Hadoop application can even be written, Lotame’s developers will be able to spend their time on work that has an impact.