The promise of big data is that it contains big information, big insights. But data and information are not the same. Data is only as valuable as the information and insight we can extract from it, because it’s the information and insight that help us make better decisions.
What is the Big Data Fallacy?
Data does provide information, and more data generally gives more information. The fallacy of big data is the assumption that more data yields proportionately more information. It doesn’t: the more data you have, the less information you gain as a proportion of the data.
The return on extractable information from any amount of big data asymptotically diminishes as your data volume increases. In nearly all realistic data sets (especially big data), the amount of information one can extract will be a very tiny fraction of the data volume: information << data.
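One way to make the diminishing return concrete is a toy simulation. The sketch below is only an illustration under loud assumptions: "information" is approximated by the number of distinct values seen, records are drawn uniformly from a hypothetical pool of 1,000 possible values, and the function name `distinct_fraction` is invented for this example. Real data sets are messier, but the shape of the curve is the point.

```python
import random


def distinct_fraction(n_records, n_possible_values=1000, seed=0):
    """Return (distinct_count, distinct_count / n_records) for synthetic data.

    Assumption: each record is one of `n_possible_values` equally likely
    values, and a repeated value tells us nothing new.
    """
    rng = random.Random(seed)  # fixed seed so runs are repeatable
    seen = set()
    for _ in range(n_records):
        seen.add(rng.randrange(n_possible_values))
    return len(seen), len(seen) / n_records


# Grow the data volume tenfold at each step and watch the new-information share.
for n in (100, 1_000, 10_000, 100_000):
    distinct, fraction = distinct_fraction(n)
    print(f"{n:>7} records -> {distinct:>4} distinct values "
          f"({fraction:.2%} of the data)")
```

Under these assumptions the distinct count can never exceed 1,000, so each tenfold increase in volume adds fewer new values, and the share of the data that is new shrinks toward zero.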
What about insights? How do they relate to information and data? All insights are information, but not all information provides insight. In order to provide insight, information must be:
If information fails any one of these criteria, it is not a valuable insight. These three criteria successively restrict extractable insights to tinier and tinier subsets of the extracted information. The second fallacy of big data is: insight << information.
Both big data fallacies are summarized in a single inequality relationship: insight << information << data (see figure).
The value of big data is hugely exaggerated because insight (the most valuable aspect of big data) is typically a few orders of magnitude smaller than the extractable information, which is in turn several orders of magnitude smaller than the sheer volume of your big data. It’s not that big data has no value; it’s just overrated. Even when your data is very, very big, the probability of finding valuable insights in it may still be abysmally small.
The fallacy of big data may sound disappointing, but it’s actually a strong argument for why we need even bigger data. Because the insight we can derive is such a tiny fraction of the data, we need to collect even more data and use more powerful analytics to increase the likelihood of finding it. Although big data doesn’t guarantee many insights, increasing the volume of data does increase the odds of finding them.
What is Smart Data?
Big data provides the infrastructure for economically storing and processing unprecedented amounts of data. But undigested big data (e.g. terabytes of raw logs) and the technology required for it (e.g. Hadoop, Cassandra) are pretty much inaccessible to the average business person. There is a huge disconnect between what big data provides and what businesses need. Smart data is how you can fill the gap (see figure above).
Smart data is the analytics we use to extract relevant information and insight from big data and the visualization we use to present the result. Smart data technology must be designed such that we make our data:
1. Useful: relevant + actionable
Because big data technology is so scalable, businesses can easily capture data first and ask questions later. This means big data is often captured without a specific purpose in mind, so its signal-to-noise ratio is typically very low: most of it will be irrelevant to the problem you are solving.
Efficient search and filtering technology is necessary in smart data to make identifying the relevant data easy. Data that are not relevant can’t possibly be useful. More importantly, the analytics we use must find insight that is actionable. Information that is not actionable is like saying the world will end in 5 minutes, and there’s nothing you can do about it. This is certainly informative, but if you can’t take action against it, it’s not useful.
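The two tests for usefulness described above, relevant and actionable, can be sketched as a simple filter. This is a minimal illustration, not a real search or analytics product: the `Finding` record, its `topics` and `actionable` fields, and the sample findings are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Finding:
    text: str
    topics: frozenset   # hypothetical tags describing what the finding is about
    actionable: bool    # hypothetical flag: can the business act on it?


def useful_findings(findings, problem_topics):
    """Keep findings that are relevant (share a topic with the problem)
    and actionable. Failing either test makes a finding not useful."""
    wanted = frozenset(problem_topics)
    return [f for f in findings if (f.topics & wanted) and f.actionable]


# Invented sample findings, for illustration only.
findings = [
    Finding("Checkout errors spike after a deploy", frozenset({"checkout"}), True),
    Finding("The world will end in 5 minutes", frozenset({"checkout"}), False),
    Finding("Office coffee usage is up", frozenset({"facilities"}), True),
]

for f in useful_findings(findings, {"checkout"}):
    print(f.text)
```

Only the first finding survives: the second is relevant but not actionable, and the third is actionable but not relevant to the problem being solved.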
2. Digestible: intuitive + interactive
Big data is not only big in volume; it’s also very diverse and complex. General spreadsheet charting tools (e.g. line, bar, and pie charts) are no longer sufficient to make complex data digestible to business decision makers. Advanced visualization designed specifically for particular data structures is necessary to make big data intuitive to non-analysts.
To facilitate insight discovery, we must empower the data consumer to explore the data beyond what is presented. Interactive tools for data exploration are very important in smart data because they will help more people understand the data better.
The value of big data is overrated, but it’s an important enabler and it provides the foundation for scalable data storage and processing. However, what businesses really need is insight that helps them make better decisions. Capturing big data only gets us part way there. Smart data bridges the gap by facilitating information extraction and insight discovery. Although big data technology won’t help you make bigger decisions, smart data can certainly help you make smarter decisions.