Everyone is talking about big data. And if you have ever watched the Discovery Channel, you probably know a little about dark matter and dark energy. But dark data? Is this just a new marketing buzzword, or is it a real problem?
In fact, dark data is very real, and it can be exceptionally problematic. Put simply, the term describes all the bits and pieces of data floating around in your environment that aren’t fully accounted for. Some of the most pertinent examples are ZIP files, used to transport large documents or groups of documents, and PSTs, the personal folder files that Microsoft Outlook uses to hold emails, contacts, notes and calendar items on a local desktop or notebook. These locally stored container files may hold vital, even risky, corporate information, yet they are often not covered by typical corporate retention or archiving processes.
According to IDC, PST and ZIP files account for nearly 90 percent of dark data. And with email growth widely pegged at 40 percent per year, the risk isn’t going to get smaller any time soon. ZIP files are not that difficult for a person to open and view, but they can be a challenge for a simple scanning process, especially if they are password protected. PST files pose a greater challenge: they may contain thousands, even tens of thousands, of individual items, all hidden from the typical system scan.
So, why are these files “dark”? Consider the PST file and its potential contents for a moment. A PST is a collection of email data, the contents of which aren’t available to anyone other than the file owner. Some, like older Microsoft Outlook auto-archive files, were created automatically, so even the user may not know why they are there or what’s in them.
Compounding the potentially “invisible” nature of PST files, they also build up rapidly, on corporate drives as well as desktops. PSTs can even come to dominate corporate storage as they accumulate over time, are backed up again and again, and are carried along in images of former employees’ residual data. The dark data just gets darker, because IT staff simply don’t know what’s in these files, and many end up “orphaned” from their original owners. As a result, it’s very likely that your company holds volumes of data that you don’t know about, or whose contents you don’t know. Worse, these files are consuming valuable space and costing money to store and manage, all while putting your organization at great risk.
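A first step toward shedding light on these files is simply counting them. The following is a minimal Python sketch, not a definitive tool, that walks a directory tree and totals the PST and ZIP files it finds. It rests on several assumptions: the function names are hypothetical, files are identified by extension only, PST files are counted but never opened (the Python standard library has no PST reader), and a ZIP is treated as password protected when any entry has the encryption bit set in its header flags.

```python
import os
import zipfile
from collections import defaultdict

# Assumption: container files are recognized purely by file extension.
CONTAINER_EXTENSIONS = {".pst", ".zip"}


def zip_is_encrypted(path):
    """Return True if any entry is flagged as encrypted, None if unreadable."""
    try:
        with zipfile.ZipFile(path) as archive:
            # Bit 0 of the general-purpose flags marks an encrypted entry.
            return any(info.flag_bits & 0x1 for info in archive.infolist())
    except (zipfile.BadZipFile, OSError):
        return None


def inventory_containers(root):
    """Count PST and ZIP files under root and total their size in bytes."""
    totals = defaultdict(lambda: {"count": 0, "bytes": 0, "encrypted_zips": 0})
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            ext = os.path.splitext(name)[1].lower()
            if ext not in CONTAINER_EXTENSIONS:
                continue
            full_path = os.path.join(dirpath, name)
            try:
                size = os.path.getsize(full_path)
            except OSError:
                continue  # skip files that vanish or cannot be read
            entry = totals[ext]
            entry["count"] += 1
            entry["bytes"] += size
            if ext == ".zip" and zip_is_encrypted(full_path):
                entry["encrypted_zips"] += 1
    return dict(totals)
```

Calling inventory_containers on a file share or a user's profile folder returns a dictionary keyed by extension, which gives a rough sense of how much storage these containers occupy. It says nothing about what is inside them, which is the harder part of the problem.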
The Hidden Risks in Dark Data
It can be easy to ignore dark data, because physically keeping it doesn’t seem that expensive given today’s low cost of storage. But if you’re like most companies, you likely have many terabytes of “dark” PST files consuming your storage resources.
The more critical risk is that if this data is saved, it is discoverable. In the event of a legal request for data, all “relevant” emails must be produced, and without knowledge of a PST’s contents, such a request could sweep in hundreds of thousands, even millions, of emails that need to be forensically searched. With typical forensic searches costing $5 per email, this could become a very costly endeavor.
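To put rough numbers on that, here is a small Python sketch of the arithmetic. The $5 per email figure is the one quoted above; the two email volumes are illustrative picks from the "hundreds of thousands, even millions" range, not measured data, and the function name is hypothetical.

```python
# Per-email forensic search cost quoted in the article, in US dollars.
COST_PER_EMAIL_USD = 5


def estimated_review_cost(email_count, cost_per_email=COST_PER_EMAIL_USD):
    """Return the estimated cost of forensically searching email_count emails."""
    return email_count * cost_per_email


# Illustrative volumes only (assumed for the example).
for count in (100_000, 1_000_000):
    print(f"{count:,} emails -> ${estimated_review_cost(count):,}")
# 100,000 emails -> $500,000
# 1,000,000 emails -> $5,000,000
```

Even at the low end of that range, a single request could run to half a million dollars.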