Skylight Cave, Oregon

Shine a Light on Your Dark Data

By David Roe

For those worrying about the data security issues caused by enterprise file sharing or poorly constructed information management strategies, add a new item to your 'things to worry about' list — dark data.

Never heard of dark data? Gartner coined the term to describe enterprise data that has fallen into disuse because of a lack of ownership, poor visibility, limited accessibility and similar problems.

And the problem will only get worse as data gathering gets more efficient and companies continue to ignore content management.

Any unmanaged or unsupervised data poses a potential security risk. Any data not actively in use in the organization (or required for e-discovery purposes) is freeloading, and storage space costs too much for that.

New technologies that have emerged over the last 12 months in content analytics, predictive analytics and process management aim to help organizations bring their dark data to light.

But is dark data really all that dark? Or is it just another facet of poor data management?

Dark Data, By the Numbers

According to a recent AIIM report (registration required) by Doug Miles, in spite of content analytics' potential, 80 percent of those surveyed had yet to allocate a senior role to initiate and coordinate content analytics applications.

The lack of designated leadership and a shortage of analytics skills are holding back the deployment of content analytics tools, according to almost two-thirds (63 percent) of respondents.

Dark data was named as a big business driver for deploying content analytics, with other drivers including process productivity improvements, additional business insight, and adding value to legacy content.

Seventy-three percent of respondents felt that enhancing the value of legacy content was better than wholesale deletion, while more than half (53 percent) said that auto-classification using content analytics was the only way to get content chaos under control.

Content Management And Metadata

Should organizations be concerned about dark data? According to Greg Milliken, vice president of marketing at M-Files Corporation, the answer is no. In an interview with CMSWire, Milliken shared the M-Files take: dark data is not sinister, just badly managed.

“One of the whole premises of our architecture is that by classifying information by what it is — it’s a proposal, it’s an invoice, it’s a support ticket — and relating it to other key elements that themselves are often the fundamental drivers of the business, it allows this data to show up dynamically based on the context without the individual [searcher] necessarily being aware of it,” he said.

The M-Files enterprise information management platform provides users with a metadata-driven system for organizing and managing data.

“What we believe is that what drives the discovery and utilizations of this data that could go dark are those relationships, those connections to the intelligent layer that we believe is metadata. So if I am searching for something relating to customers, if other assets have been tagged with this customer, or information, that now happens dynamically in the M-Files and it throws this up,” Milliken said.

Wasting Content

Another approach comes from predictive analytics vendor idio. Andrew Davies, CMO at idio, shared some numbers from SiriusDecisions research: between 60 and 70 percent of content produced by business-to-business companies goes unused. Corporate Visions data puts this figure as high as 90 percent. Whichever number you believe, we can agree that a lot of data goes unused and that can mean unrealized business potential.

Like Milliken, Davies believes that technology can solve the problem:

“The key to solving this problem is technological. To make content ‘useful’ it has to be understood and served to those for whom it will be most relevant. This might work manually when you only have a few assets, but it doesn’t scale in any serious organization running simultaneous campaigns and marketing channels. It’s not possible for humans to both have a global understanding of every piece of content created within the enterprise and know what it is about, this has to be turned over to machine learning systems that can cope with large volumes of content and customer interactions at scale,” he said.

Big data technologies like idio's Content Intelligence offering are a response to the ‘dark data’ issue, Davies continued, stating it was designed to ensure that useful content, dark or otherwise, is associated with the right customers and prospects.

The Problem With Tagging

At the heart of the dark content problem lie tagging and meta-tagging. New research from Concept Searching showed that many SharePoint users struggle to manage their content because of poor tagging.

Carly Mulley, vice president of marketing at Concept Searching, pointed out that SharePoint doesn’t provide sufficiently robust tagging functionality for users to properly manage their content and as a result destines a great deal of content to the dark.

“It [SharePoint] hasn’t addressed the metadata issue. It provides term store — which is rudimentary at best — and it is still inefficient because it is manual and... dependent on human input. It introduces the problem of erroneous metadata,” she said.

“Then there are the end users. They don’t care. Microsoft doesn’t address this issue. So even in Office Graph, its new search capabilities, it still uses end-user metadata. How can you efficiently do e-discovery and records management when you are essentially using erroneous data?”

Irene Tserkovny, president and chief operating officer at document management vendor Docurated, said its solution uses "relevance" to match dark data with marketing people.

“Docurated was built to address all the problems in the content supply chain. Docurated discovers both active data that enterprises store and use, such as Salesforce data around revenue and pipeline, and dark data such as how content is being used and not being used by sales and marketing throughout the sales cycle,” she said. “We marry content signals such as relevance, freshness and usage, to create a real view of what content and stories are relevant across the organization.”

Is there a single answer for dark data? Clearly it depends which vendor you ask. And maybe you can scratch dark data off your 'things to worry about' list. But businesses should be aware of the dark data hiding in their systems, and decide how it fits into their overall information management strategy.

Title image: Creative Commons Attribution 2.0 Generic License by Thomas Shahan 3