library in Portugal
PHOTO: Ivo Rainha

If we printed out one zettabyte of data as books of about 100,000 words each (roughly one megabyte of text per book), we could give every one of the 7.7 billion people on this planet 129,870 of these books. They’d each have almost 13 billion words to read. An average reader can read 1,000 words in about five minutes. At that pace, it would take about 124 years of nonstop, no-sleep reading for every man, woman and child on the planet to read a zettabyte. One zettabyte. There were 33 zettabytes of data created in 2018 alone, and by 2035 it is estimated that there will be over 2,000 zettabytes.
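For readers who want to check the arithmetic, here is a minimal Python sketch of the reading-time estimate. The book size (one megabyte, about 100,000 words) is an assumption inferred from the per-person figures above, and the constant and function names are purely illustrative. Under these assumptions, the nonstop reading time comes to roughly 124 years per person.

```python
# Back-of-envelope check of the "zettabyte as books" estimate.
# All constants are assumptions taken from, or inferred from, the article's figures.
ZETTABYTE_BYTES = 10**21          # one zettabyte (decimal definition)
WORLD_POPULATION = 7_700_000_000  # 7.7 billion people
BYTES_PER_BOOK = 1_000_000        # assumed: about 1 MB of text per book
WORDS_PER_BOOK = 100_000          # assumed: about 100,000 words per book
MINUTES_PER_1000_WORDS = 5        # average reading pace quoted in the article
MINUTES_PER_YEAR = 60 * 24 * 365  # nonstop reading, no sleep


def reading_estimate():
    """Return (books per person, words per person, years of nonstop reading)."""
    total_books = ZETTABYTE_BYTES // BYTES_PER_BOOK
    books_per_person = total_books / WORLD_POPULATION
    words_per_person = books_per_person * WORDS_PER_BOOK
    minutes = words_per_person / 1000 * MINUTES_PER_1000_WORDS
    years = minutes / MINUTES_PER_YEAR
    return books_per_person, words_per_person, years


if __name__ == "__main__":
    books, words, years = reading_estimate()
    print(f"Books per person: {books:,.0f}")
    print(f"Words per person: {words:,.0f}")
    print(f"Years of nonstop reading: {years:,.1f}")
```

Changing `MINUTES_PER_1000_WORDS` or `WORDS_PER_BOOK` shows how sensitive the estimate is to the assumed reading pace and book length.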

In 2010, Google Books estimated that there were about 130 million unique book titles published. Mental Floss estimated that there are roughly 800,000 books published every year globally. My own rough calculations from Wikipedia data came to about 900,000 per year. Let’s be generous and say that there are about one million unique titles published each year. That would mean that by 2020 there were about 140 million unique titles published.

“The average book in America sells about 500 copies,” Chris Anderson wrote for Publishers Weekly in 2006. Steven Piersanti, president of Berrett-Koehler Publishers, wrote in 2016 that total sales for a typical non-fiction book are no more than 2,000 copies. An average of the two figures gives us 1,250 copies. If we multiply that by 140 million we get 175 billion copies of books published since publishing began.

Summarizing all these crazy calculations, we can say that one zettabyte, just one zettabyte, if printed out would create about 6,000 times as much print as all the copies of all the books that have ever been printed. The sheer scale of the amount of digital data that is being created and stored is quite simply beyond imagination.
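The comparison with all printed books can be sketched the same way. This is a rough check rather than a definitive model: it uses the title and sales estimates quoted above and assumes, as the per-person book count earlier implies, about one megabyte per printed book. The names are illustrative only.

```python
# Rough comparison: one zettabyte printed as books vs. all book copies ever printed.
# Figures come from the estimates quoted in the article; book size is an assumption.
TITLES_BY_2010 = 130_000_000     # Google Books estimate of unique titles
NEW_TITLES_PER_YEAR = 1_000_000  # the article's generous yearly estimate
YEARS_SINCE_2010 = 10            # 2010 to 2020
LOW_SALES, HIGH_SALES = 500, 2_000  # copies sold per title (two quoted estimates)
ZETTABYTE_BYTES = 10**21
BYTES_PER_BOOK = 1_000_000       # assumed: about 1 MB of text per book


def print_comparison():
    """Return (total copies ever printed, ratio of zettabyte-books to copies)."""
    titles = TITLES_BY_2010 + NEW_TITLES_PER_YEAR * YEARS_SINCE_2010
    average_copies = (LOW_SALES + HIGH_SALES) / 2
    total_copies = titles * average_copies
    zettabyte_books = ZETTABYTE_BYTES // BYTES_PER_BOOK
    return total_copies, zettabyte_books / total_copies


if __name__ == "__main__":
    copies, ratio = print_comparison()
    print(f"Estimated copies ever printed: {copies:,.0f}")
    print(f"One zettabyte is about {ratio:,.0f} times that")
```

With these inputs the ratio lands a little under 6,000, which is where the rounded figure in the text comes from.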

Organizations are collecting digital data simply because they can. Studies by IBM and others found that we never use up to 90% of the digital data we produce and collect. Michael Kozlowski, writing for Good e-Reader in 2015, reported on studies of ebook reading showing that 60% of ebooks bought are never opened and that completion rates for those that are started can be as low as 20%. Interestingly, one study by ebook maker Kobo found that the more people paid for a book, the more likely they were to read it.

The more data you collect, the more you strain your ability to organize and analyze it. We must control our data collection processes much better; otherwise we will have all this data and end up less able to make timely, quality decisions.