Which of these four urban legends is (mostly) true?

1. “Paul McCartney died in 1966 and was secretly replaced in the Beatles by a Paul look-alike.”
2. “LSD tabs are being secretly distributed as lick-and-stick Blue Star temporary tattoos to children.”
3. “A hotel guest awakened in a bathtub full of ice to find one of his kidneys has been removed for sale on the black market.”
4. “Eighty percent of the information in an organization is unstructured — information that doesn’t fit neatly into the rows and columns of a database.”

Anyone who's spent any length of time in the content management industry has likely cited the above 80/20 rule at some point with great conviction. 

About 10 years ago, I got curious about the source of the 80/20 data point. The closest thing to a source I could find was an attribution to a 1998 Coopers & Lybrand report. No copy of this legendary report existed in the AIIM archives. I even offered a “bounty” of a free AIIM membership on my blog (yes, high stakes) to anyone who could produce the report.

No takers.


Putting an Urban Legend to the Test

At some point during the past decade, the 80/20 citations fell from the spotlight, only to reappear with a vengeance this year in the context of the challenge of ingesting unstructured information and content into robotic process automation (RPA) and machine learning engines.

I decided it might be time to at least test out the urban legend, so last month, I asked 500 senior executives this question: “Think about ALL of the information in your organization. This information can broadly be described as structured DATA (fits neatly into the rows and columns of a database) or unstructured INFORMATION (documents, jpegs, conversations, images, forms, text messages, application files, etc.). What would be your best guess for the percentage of the total that is unstructured INFORMATION?”

Here are the results:

[Graph: unstructured information]

It turns out the average response is 63 percent. Admittedly, the “63/37 rule” doesn’t exactly roll off the tongue with the same elegance as the “80/20 rule,” but (a) it’s pretty close to the urban legend, and (b) it’s actually attributable to a source.

As interesting as that is (at least I hope it is), why is it important?

Unstructured Data Remains a Challenge, Whatever Percentage You Believe

Everyone in the content management space has literally spent decades telling anyone who would listen that unstructured information is the Wild West of information management, and how important it is to get it under some sort of control. And to be honest, this has sometimes seemed a challenging assignment, one with a lot more exhausting “push” than “pull.”

But the era of RPA, AI and machine learning changes everything. If 63 percent of a geometrically increasing volume of information is unstructured, how will organizations turn this unstructured information into the kind of structured data machines can ingest and analyze? I believe the result will be a huge market “pull” from the enterprise, driven by the strategic importance of getting all that unstructured information under control. Finally.