Information Management, Big Data Bits Featuring Recommind, Cloudera, Apache Storm, Apache Samza + An Interesting Tidbit About SumAll
There’s never a dull moment in the world of Big Data, and last week was no different. We saw prizes, surprises, funding announcements, donations and an unusual type of disclosure by a Big Data company.

Recommind Rakes In 15 Big Ones

Ever hear of Recommind? If not, open your eyes and unplug your ears.

The Big Data e-Discovery provider, whose software has until now been used primarily by the legal industry, just raised US$15 million from SAP Ventures.

The company’s proprietary Predictive Coding technology learns from the decisions that users make as they search through voluminous documents. In time, the software understands the meaning behind what users need, automatically and intelligently predicts the best information available, and instantly provides the most relevant results.

It seems like a godsend in a world of Big Documents, Big Content and Big Data. In fact, it might even be the “secret weapon” that either the U.S. House of Representatives or the Senate could use as they try to overwhelm each other with documents during the current budget battle.

But that has little to do with why SAP Ventures is making the investment. They believe that there’s a large Enterprise market that Recommind can win over. We think so too.

Cloudera Impala Wins A Bossie Award

OK, so Cloudera Impala isn’t “Breaking Bad” and a Bossie isn’t an Emmy. But in their own ecosystems they are equivalents, so hopefully someone sent Impala architects Marcel Kornacker and Justin Erickson tee-shirts or hoodies bearing the OSI (Open Source Initiative) or Apache logos.

What’s special about Impala is that it turns Hadoop into an engine for exploring data interactively using standard SQL. This provides companies with new opportunities that they couldn’t have as easily realized in the past.

Take Expedia, for example. The company must have a fine-tuned website that understands what visitors want and that can deliver results to partner hotels, airlines and other travel vendors in short order.
Using only a traditional data warehouse to capture and analyze the clickstream data generated to, from and within its website didn’t give Expedia as much insight as it could get by also analyzing the larger volumes of detailed historical data stored in Hadoop.

So the company added Impala to its toolbox.

Today, Expedia uses Hadoop to power what it calls its “full data lifecycle.” Data is collected from online activity, loaded into Hadoop, scored and analyzed, and that data generates scoring engines that shape the recommendations, search results and sort orders on its website. The Cloudera platform is integrated with the incumbent data warehouse and generates more business wins for Expedia.

LinkedIn Open Sources Samza

Big Data geeks love to share, so there must have been quite a celebration among LinkedIn’s engineers when they open sourced Samza, the company’s stream processing framework. It helps engineers build applications that process feeds of messages: updating databases, computing counts and other aggregations, transforming messages and a lot more.

Samza was accepted as an incubator project by the Apache Software Foundation. The details are a bit geeky, so if you want to know more, the project’s Apache incubator page is the place to start.
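For readers who want a taste of what "processing a feed of messages" means, here is a minimal sketch in plain Python. It is not Samza's API (Samza itself is a Java framework); the `process_feed` function, the message fields and the sample feed are all made up for illustration. It only shows the general pattern: take messages one at a time, transform each one, and keep a running aggregate.

```python
from collections import Counter
from typing import Dict, Iterable, Iterator, Tuple


def process_feed(messages: Iterable[Dict[str, str]]) -> Iterator[Tuple[str, int]]:
    """Consume messages one at a time and emit a running view count per page."""
    counts: Counter = Counter()
    for message in messages:
        # Transform: normalize the page name before counting.
        page = message["page"].strip().lower()
        # Aggregate: update the running count for that page.
        counts[page] += 1
        yield page, counts[page]


# Hypothetical feed; a real system would read from a message broker instead.
feed = [
    {"user": "a", "page": "Jobs"},
    {"user": "b", "page": "profile"},
    {"user": "c", "page": " jobs "},
]

results = list(process_feed(feed))
print(results)
```

Because the function is a generator, it never needs the whole feed in memory, which is the basic idea that separates stream processing from batch jobs.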

Apache Incubates Storm

Remember Storm? It fills the gap between real-time processing and batch-oriented Hadoop. Like Samza, it was accepted into incubation by the Apache Software Foundation.

Twitter owned Storm up until recently, when it decided to share its web analytics framework, which tracks clicks and fuels the engine that answers the question “What’s trending on Twitter?”

Look for a group of geeks to build out the product and make it Enterprise-grade or for a sponsor to adopt it.

SumAll Gets Transparent With Its Salaries

SumAll, a data connection service that helps companies turn data into dollars, is serious about full disclosure and radical transparency. So serious, in fact, that Dane Atkinson, the company’s CEO, has made employee compensation information visible to all of his workers.

He thinks that this will let them know where they stand (or sit) within the organization and, if they’re not happy, inspire them to do what it takes to measure up financially.

Title image courtesy of Chones (Shutterstock)