Hoarders Anonymous for Unstructured Data

It’s called a “JUNK drawer” for a reason -- because it’s mostly junk, and most of it should be tossed in the trash. So why do we keep these items of questionable value or use?

Buried Treasure and Junk All Around

In the TV show, “Hoarding: Buried Alive,” you'll notice it’s hard to identify the items of value mixed in with the piles and piles of trash. It’s also difficult to identify who in the house owns a particular item so that you can ask them why they even have it, or get permission to remove it. They don’t use half the stuff, and they’ve forgotten they own it.

This is a lot like unstructured data -- tons and tons of PDFs, Word docs, spreadsheets, PowerPoint presentations, audio files, picture files and pretty much anything else that doesn’t live in a “structured” database -- and how people in most organizations just keep collecting it.

The IT team, tasked with getting their arms around it for compliance and governance purposes, has no idea who owns it, whether it’s important or whether anyone is even using it -- and is afraid to delete it for all those reasons. In the meantime, the line-of-business people continue to create and hoard more data -- year after year -- storing it in places that IT not only doesn’t control, but probably doesn’t even know about (SharePoint gone wild, Dropbox, Box, etc.).

As with physical hoarding, where some of the junk found at the bottom of the pile isn’t just annoying, but is downright dangerous -- mold, broken items, unmentionable filth -- hoarded unstructured data can also fall into the “dangerous” category -- confidential IP, customer data, medical records, personally identifiable information, etc.

All organizations have unstructured data, but most are collecting and maintaining large volumes of orphaned data -- so called because it has no owner. Without an owner to determine its importance, lifespan or who should have access to it, we default to keeping this data of unknown value forever. I have seen organizations go through a consolidation or migration, or move data centers, and take all of that unassigned unstructured data with them. They just migrate the mess.

Control the Data Chaos

No one wants to be a hoarder, so how did it get so bad? Most organizations have a poor governance process for granting access to unstructured data. They are unable to determine who owns the data, so they continue to collect more orphaned files in repositories like file shares and group folders. Many companies will admit to having a problem managing unstructured data, but they seem to accept it as just the cost of doing business.

The current acceptance of this problem with unstructured data is the corporate version of household hoarding. IT doesn’t know what is out there, or whether it’s necessary or subject to compliance. The line-of-business folks are on the hook to prove they are compliant with regard to unstructured data, but they don’t know how to find it or how to report on it (they just trust that all is well). The problem is that compliance demands that risky information contained in some unstructured data be protected, and auditors are becoming savvy to the concept of unstructured data and the lack of accountability for, or ownership of, the information contained therein.

With household hoarding, a cleaning crew often is called in to help the hoarder because the government mandates a “clean up or face the consequences” stance. Shortly after the cleanup crew leaves, however, the stuff starts to collect again because the person falls back into old habits, with no control over what they collect and how to keep things clean and organized.

It's not good enough to just clean up the mess. You need a governance process to keep your unstructured data clean and sorted. Don’t wait for the authorities to tell you to put a plan in place to govern access to unstructured data. Start the cleanup process now, and stop being an unstructured data hoarder.

The good news is that you don’t need a 12-step program to conquer vast collections of unstructured data. You can overcome the problem and get control of the chaos with the following six steps:

Step 1: Discover users and resources -- Determine what is important by rolling up your sleeves and digging through the piles of data
Step 2: Classify data and access rights -- Prioritize by running the data through a risk engine; focus first on those areas of greatest risk
Step 3: Audit and report on usage -- Track the actual usage of the data to help identify ownership and sort out stale data
Step 4: Assign ownership and approvers -- Obtain validation from the business owners to determine what is important and facilitate acceptance of data stewardship
Step 5: Remediate -- Clean up the mess with access certifications
Step 6: Automate control -- Keep it clean with access requests and approvals
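
To make Steps 1 and 3 a little more concrete, here is a minimal Python sketch of what "digging through the piles" and sorting out stale data might look like on a single file share. It is an illustration under stated assumptions, not a governance product: the names FileRecord, scan_share and find_stale are hypothetical, the one-year staleness threshold is arbitrary, and last-modified time is only a rough stand-in for real usage, which in practice would come from access logs or an auditing tool.

```python
import os
import time
from dataclasses import dataclass


@dataclass
class FileRecord:
    path: str
    owner_uid: int              # numeric owner ID reported by the filesystem
    size_bytes: int
    days_since_modified: float  # rough proxy for "is anyone using this?"


def scan_share(root, now=None):
    """Step 1 (discover): walk `root` and record basic metadata per file."""
    now = time.time() if now is None else now
    records = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            try:
                st = os.stat(full)
            except OSError:
                continue  # unreadable or vanished file; skip it
            age_days = (now - st.st_mtime) / 86400
            records.append(FileRecord(full, st.st_uid, st.st_size, age_days))
    return records


def find_stale(records, max_age_days=365):
    """Step 3 (audit): flag files untouched for longer than the threshold.

    The 365-day default is an assumption for illustration only.
    """
    return [r for r in records if r.days_since_modified > max_age_days]
```

Calling find_stale(scan_share("/path/to/share")) would return a list of candidates to review with the business owners in Step 4. Nothing here deletes anything; the point is to produce a list that a human can validate before any cleanup happens.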

Don’t just wait for the auditor to show up and tell you what you have to do. Start cleaning up the unstructured data mess now, before it gets worse and you end up with a massive data breach that spins out of control.
