2014-17-September-Dirty-Cat.jpg
Talk to one of those high-priced data scientists about their data and you’ll see something quite astounding: they really, really care about it. It’s almost as if it were a garden they were tending or a scruffy, homeless kitten they’ve taken in to raise.

While the sentiment might initially sound ridiculous, it isn’t. Data fuels their work (and the insights that inform yours). And if you pair dirty or poorly labeled data with the best-trained models and finest algorithms, you waste time and energy. The knowledge gleaned will be worthless, at best.

A Fresh Start

Consider a film studio that wanted to know why its movie wasn’t doing better at the box office. Overall sentiment looked neutral, suggesting that at least some people really liked it. The natural next step was to find out what kind of audience the movie appealed to and market to them.

But that could have been a huge waste if the data hadn’t been enriched from the start, said Lukas Biewald, founder and CEO of CrowdFlower, a people-powered data enrichment startup. He told us about this real-world example, in which a closer look at the raw data revealed that all the positive sentiment about the movie came from before its release date (people tweeted or posted on Facebook about being excited to see it), and all of the negative sentiment came after people had actually seen it.
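To see how an overall "neutral" reading can hide that kind of split, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the release date, the handful of posts and the +1/-1 scoring are invented, and the helper names are hypothetical. It is not CrowdFlower's method or data.

```python
from datetime import date
from statistics import mean

# Hypothetical release date and hand-made sample posts (illustrative only).
RELEASE_DATE = date(2014, 6, 1)

# Each post is (date posted, sentiment score): +1 positive, -1 negative.
posts = [
    (date(2014, 5, 20), 1),
    (date(2014, 5, 28), 1),
    (date(2014, 5, 30), 1),
    (date(2014, 6, 2), -1),
    (date(2014, 6, 5), -1),
    (date(2014, 6, 9), -1),
]


def average_sentiment(rows):
    """Return the mean sentiment score, or None when there are no rows."""
    scores = [score for _, score in rows]
    return mean(scores) if scores else None


def split_by_release(rows, release_date):
    """Split posts into those made before the release date and those made on or after it."""
    before = [row for row in rows if row[0] < release_date]
    after = [row for row in rows if row[0] >= release_date]
    return before, after


overall = average_sentiment(posts)
before, after = split_by_release(posts, RELEASE_DATE)

print("overall:", overall)
print("before release:", average_sentiment(before))
print("after release:", average_sentiment(after))
```

In this toy sample the overall average comes out neutral, while the two halves point in opposite directions. That is the pattern a look at the raw data would surface.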

Needless to say, working with good, clean, enriched data matters.

It’s why data scientists spend up to 80 percent of their time caring for and curating their data, as opposed to doing actual science.

Crowdsourced Data

Almost every data scientist agrees that this isn’t the best use of their time, but automated tools fall short of expectations (machines don’t know that “that rocks” is a positive sentiment), so there hasn’t been a better answer.

Unless they know about CrowdFlower, that is.

It’s a Silicon Valley startup that’s growing fast because it connects data scientists and their employers with any of 5 million data professionals who want to work from home or make extra bucks. To use it, data scientists enter a self-service portal on the site, point to a web interface or API, and indicate which datasets they need enriched and by when. CrowdFlower’s algorithms take care of finding the right workers and managing the process.

The company already has some hot clients like eBay, Edelman, Eventbrite, The Home Depot, Unilever, and VMware.

And the VC market is impressed as well. Today the company announced that it has raised $12.5 million in Series C financing led by Canvas Venture Fund with participation from existing investors Bessemer Venture Partners and Trinity Ventures. The investment, which brings the total amount raised by the company to $28 million, will help the company support its rapid growth.

We asked CrowdFlower to give us an example of what it can accomplish in short order. Twenty-four hours later it sent us an infographic built from the results of analyzing 20,000 tweets illustrating why Apple might market its new watch to men and women differently.

2014-17-September-CrowdFlower-Applewatch.jpg

We were impressed. Are you?

Title image by Simona (Flickr) via a CC BY-NC-SA 2.0 license