There’s a difference between a big pile of data and Big Data.
It’s hard to believe that this needs to be said. But try this. Ask someone what Big Data is and see if you don’t hear words like petabyte, exabyte, zettabyte, yottabyte and the like dominate their answers. Most common definitions of Big Data revolve around quantity and data/information storage.
Now if the individual you’re talking to is someone who has done some reading, attended an O’Reilly Strata or GigaOM Structure conference (or their online equivalents), or read my articles on CMSWire or on the Big Data Geeks blog, then they’ll probably add terms like Volume, Velocity and Variability to their definitions.
Big Data Drives Action
Though that’s getting closer, it’s not likely to impress Facebook VP of Engineering Jay Parikh. Addressing a group of reporters touring the company’s new data center late last month, he explained, “Big data really is about having insights and making an impact on your business. If you aren’t taking advantage of the data you’re collecting, then you just have a pile of data, you don’t have big data.”
And drawing insights from the 2.5 billion pieces of content, 2.7 billion “likes,” 300 million photos, 70,000 queries and other information that the company processes each day is no small feat, especially because Facebook strives to create new products, to deliver highly personalized user experiences and to target ads toward its members in (near) real time.
For Facebook’s engineers this no doubt presents a challenge, because their data scientists need to look not only at historical data but also at a non-stop onslaught of continuous data streams and processes. For Facebook to be sticky and to win ad clicks, being current may be what it’s all about.
The Need for Real-Time Response
Consider your own activity on the site. If you’re liking a friend’s new leather boots right now, then serving up an ad for those boots in the next 60 seconds might be more beneficial for an advertiser (and for Facebook) than doing so the next time you sign on. After all, by then you might be planning a trip to the Bahamas or joining PETA and therefore totally uninterested in the ad. Many of our decisions are made in short order. Striking while the iron is hot is key to business results.
And that’s why understanding users in real time is so important, and such a great opportunity, for Facebook. After all, be it good or bad, users spend a lot of time on the site (Americans average just under eight hours per week), and the company not only tracks their every move but also processes every piece of data it gets. This wouldn’t have been possible or financially feasible a decade ago, before servers became commoditized (and therefore less expensive), and before Hadoop’s open-source framework for data-intensive applications (or something like it) was created.
But having the technology to work with a large pile of data isn’t enough. According to Parikh’s definition of Big Data, data doesn’t become Big Data until you can draw insights from it that will impact business results. And I’d add to that, impact them in a big way.
And, with the exception of Google, Amazon and a few others, we may not be there just yet. We have the data, we have the technology and we glean insights, but are those insights producing big business results?
That’s the real Big Data challenge.
Editor's Note: This isn't the first time that Virginia has written about Big Data: