The promise of big data is that it contains big information, big insights. But data and information are not the same. Data is only as valuable as the information and insight we can extract from it, because it’s the information and insight that help us make better decisions.
What is the Big Data Fallacy?
Data does provide information, and more data generally gives more information. However, the fallacy of big data is the assumption that more data leads to proportionately more information. It doesn’t: the more data you have, the less information you gain as a proportion of the data.
The marginal return on extractable information diminishes as your data volume increases: each additional unit of data adds less new information than the one before it. In nearly all realistic data sets (especially big data), the amount of information one can extract will be a tiny fraction of the data volume: information << data.
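One way to make this diminishing return concrete is a toy simulation. The sketch below assumes, purely for illustration, that records are drawn at random from a fixed pool of 1,000 possible events and that the number of distinct records is a crude stand-in for information. Neither assumption is a formal definition of information, and the function name `distinct_fraction` is hypothetical.

```python
import random


def distinct_fraction(n_records, n_possible_events=1000, seed=0):
    """Return (distinct_count, distinct_count / n_records) for synthetic data.

    Assumption: the number of distinct records is a crude proxy for
    "information". This is an illustration, not a formal measure.
    """
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    records = [rng.randrange(n_possible_events) for _ in range(n_records)]
    distinct = len(set(records))
    return distinct, distinct / n_records


volumes = [100, 1_000, 10_000, 100_000]
results = [distinct_fraction(n) for n in volumes]

for n, (distinct, fraction) in zip(volumes, results):
    print(f"records={n:>7}  distinct={distinct:>5}  fraction={fraction:.4f}")
```

Under these assumptions the distinct count levels off at the pool size while the volume keeps growing, so the fraction of the data that is new keeps shrinking.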
What about insights? How do they relate to information and data? All insights are information, but not all information provides insight. In order to provide insight, information must be:
If the information fails any one of these criteria, it is not a valuable insight. These three criteria successively restrict extractable insights to smaller and smaller subsets of the extracted information. The second fallacy of big data is: insight << information.
Both big data fallacies are summarized in a single inequality relationship: insight << information << data (see figure).
The value of big data is hugely exaggerated because insight (the most valuable aspect of big data) is typically a few orders of magnitude smaller than the extractable information, which is in turn several orders of magnitude smaller than the sheer volume of your big data. It’s not that big data has no value; it’s just overrated. Even when your data is very, very big, the probability of finding valuable insights in it may still be abysmally small.
The fallacy of big data may sound disappointing, but it’s actually a strong argument for why we need even bigger data. Because the insight we can derive is such a tiny fraction of the data, we need to collect even more data and use more powerful analytics to increase the likelihood of finding that insight. Although big data doesn’t guarantee many insights, increasing the volume of data does increase the odds of finding them.
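The odds argument can be sketched with a simple probability model. Assume, hypothetically, that each record independently has the same tiny chance p of yielding an insight. The chance of finding at least one insight in n records is then 1 - (1 - p)^n. Real data is rarely independent, and the value of p below is invented, so treat this as an illustration rather than an estimate.

```python
def chance_of_any_insight(n_records, p_insight_per_record):
    """Probability of at least one insight among n_records.

    Assumption: every record independently yields an insight with the
    same probability. This is a toy model, not a property of real data.
    """
    return 1.0 - (1.0 - p_insight_per_record) ** n_records


p = 1e-6  # hypothetical per-record chance of an insight
volumes = [10_000, 1_000_000, 100_000_000]
chances = [chance_of_any_insight(n, p) for n in volumes]

for n, chance in zip(volumes, chances):
    print(f"records={n:>11,}  chance of at least one insight={chance:.4f}")
```

In this model the per-record chance never improves, yet the overall chance of finding something rises toward certainty as the volume grows, which is the case for collecting more data.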
What is Smart Data?
Big data provides the infrastructure for economically storing and processing unprecedented amounts of data. But undigested big data (e.g. terabytes of raw logs) and the technology required for it (e.g. Hadoop, Cassandra, etc.) are pretty much inaccessible to the average business person. There is a huge disconnect between what big data provides and what businesses need. Smart data is how you can fill the gap (see figure above).