If I had a dollar for every time I’ve written about a company that promises to deliver “big data for all” or “big data no data scientist required” or some variation thereof, I’d be rich.
OK, maybe not rich, but I could foot the bill for a pretty nice dinner.
Big promises and big ambitions aren’t a bad thing. After all, if technology vendors are hard at work trying to make data-driven insights accessible to more people, then maybe everyone from medical researchers to retailers to school teachers will be able to leverage big data to make the world a better place, right?
That being said, here’s a question: Who is the “all” that is supposed to be able to glean data intelligence without needing a data scientist? The boutique owner who wants to know how many midi skirts, and in which colors, she needs to order for the fall? The practicing physician who would like to cut the time it usually takes to diagnose rheumatoid arthritis from nine months to a few weeks? The guidance counselor who lies awake at night wondering how he might discern between patterns of behavior that suggest ADD versus something a kid can “grow out of”?
Adatao = Data from Alpha to Omega
“That’s what we’re aiming for,” says Chris Nguyen, co-founder and CEO of Adatao, a much-heralded startup that just received $13 million in new funding from Andreessen Horowitz, Lightspeed Ventures and Bloomberg Beta. These venture capitalists are betting big that Adatao can deliver.
Marc Andreessen, one of only six thought leaders chosen for the World Wide Web Hall of Fame, says that he was “blown away” when he saw what Nguyen and his team had built, adding that Adatao was designing the future of big data.
Needless to say, after hearing this, we were compelled to find out more about “Big Data 2.0,” as Nguyen refers to Adatao in a blog post.
Who Is the “All” in Big Data for All?
“Forgive us for being skeptical,” we apologized before asking Nguyen this question: “Who is the ‘all’ that you’re referring to when you say ‘big data insights for all’?”
Nguyen confirmed our doubts, no apologies made or required. “That’s the vision,” he explained. But for now “all” refers to data scientists, data engineers and business analysts. Up until now, they haven’t had the tools they need to work together.
The “all” in “big data for all” that non-data workers envision is “big data for normal people.” Though we’ve seen it written that Adatao provides that, the truth, according to Nguyen, is that it’s more than a decade away.
What Is “Big Data 2.0”?
“It’s big data made useful,” he says. “Up until now 'big data' has meant storage, dumping a bunch of data into HDFS.”
CEOs are more than a little bit unhappy with that, says Nguyen. CIOs have spent millions without having much to show for it.
And the reason for this is pretty simple, it seems. Hadoop has relied on MapReduce, which emphasizes “reliability, reliability and scalability,” according to Nguyen. (The repetition of “reliability” is intentional.) Google created MapReduce to be reliable as it indexed the web, because if it failed, the process would have to begin again. Speed suffered by design, which compromises MapReduce as a tool for making business decisions in real time.
And while engineers like Nguyen, who has worked both with quantitative statistical arbitrage trading systems and at Google, have had access to game-changing technologies like Big Compute in the past, those technologies were too expensive for most companies to use.
With the advent of Apache Spark, which handles lightning-fast cluster computing and deep learning at high speed, that kind of computing power becomes affordable.
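For readers who want to see the pattern behind that trade-off, here is a minimal, single-process sketch of the MapReduce idea in Python. It is an illustration only, not Hadoop's or Google's code, and the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are made up for this example. Each phase has to finish before the next one starts; on a real cluster the intermediate results are typically written to disk between phases, which helps with recovery after a failure but adds latency.

```python
from collections import defaultdict


def map_phase(documents):
    """Emit a (word, 1) pair for every word in every document."""
    pairs = []
    for doc in documents:
        for word in doc.lower().split():
            pairs.append((word, 1))
    return pairs


def shuffle(pairs):
    """Group values by key, the step a framework performs between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Sum the grouped values for each key."""
    return {key: sum(values) for key, values in groups.items()}


def word_count(documents):
    # The three phases run strictly one after another.
    return reduce_phase(shuffle(map_phase(documents)))


if __name__ == "__main__":
    print(word_count(["big data for all", "big data made useful"]))
```

The strict hand-off between phases is what makes the model easy to restart and scale, and also why it is a poor fit for questions that need an answer in seconds.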
“Spark changed the game,” says Nguyen.
And Adatao is “the missing puzzle piece that bridges the gap between Big Data 1.0 of the past five years, and Big Data 2.0 going forward,” he explains.
Adatao is made up of two components:
pInsights, the “beauty layer” that enables business analysts and data scientists to interact fluidly with Big Data in an easy-to-consume, interactive format. Similar to a Facebook or Google search engine, its predictive SmartQuery is built into a Google Docs-style document in which users can collaboratively produce embedded analytics within seconds to assist with decision making.
And pAnalytics, the “power layer” that enables data scientists and data engineers to analyze massive amounts of data in seconds. pAnalytics represents the data as one large, simple table, hiding the underlying complexities so that data scientists and engineers can work with Big Data analytics in a simple yet powerful way. Data can be pulled in from Cassandra, analyzed in Spark, and the results saved back to S3 -- all using one familiar API. This allows data scientists and engineers to focus on data analysis, and multiply their productivity by 10.
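To picture what “one large, simple table” behind “one familiar API” could look like, here is a small Python sketch. It is not Adatao's API, which this article does not show; `Table`, `InMemoryStore` and every method name below are hypothetical, and the in-memory store merely stands in for systems such as Cassandra or S3. The point is only that the analyst's code stays the same no matter where the rows come from or go to.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Row = Dict[str, object]


class InMemoryStore:
    """Hypothetical stand-in for an external system such as Cassandra or S3."""

    def __init__(self) -> None:
        self._datasets: Dict[str, List[Row]] = {}

    def read(self, name: str) -> List[Row]:
        return [dict(row) for row in self._datasets[name]]

    def write(self, name: str, rows: List[Row]) -> None:
        self._datasets[name] = [dict(row) for row in rows]


@dataclass
class Table:
    """One flat table, whichever store the rows came from (illustrative only)."""

    rows: List[Row]

    @classmethod
    def load(cls, store: InMemoryStore, name: str) -> "Table":
        return cls(store.read(name))

    def where(self, predicate: Callable[[Row], bool]) -> "Table":
        # Keep only the rows for which the predicate is true.
        return Table([row for row in self.rows if predicate(row)])

    def mean(self, column: str) -> float:
        if not self.rows:
            raise ValueError("cannot average an empty table")
        return sum(float(row[column]) for row in self.rows) / len(self.rows)

    def save(self, store: InMemoryStore, name: str) -> None:
        store.write(name, self.rows)


if __name__ == "__main__":
    source, sink = InMemoryStore(), InMemoryStore()
    source.write("orders", [
        {"color": "red", "units": 10},
        {"color": "blue", "units": 30},
        {"color": "red", "units": 20},
    ])
    red = Table.load(source, "orders").where(lambda row: row["color"] == "red")
    red.save(sink, "red_orders")
    print(red.mean("units"))
```

In this sketch, swapping the source or the sink means passing a different store object; the load, filter, aggregate and save calls do not change.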
What Adatao does, at the end of the day, is provide a way for data scientists, data engineers and business analysts to work together.
For now, “normal people” will have to rely on those folks to provide “actionable insights” from big data.
But that’s temporary. Nguyen intends to deliver big data to the rest of us, too.
But it will take a while, he says.
Until then, let’s let data scientists, data engineers and business analysts make their way down that road and pave the way for us “normal people.”