Everyone is talking about big data. And, if you have ever watched Discovery Channel you probably know a little about dark matter and dark energy. But dark data? Is this just some new marketing buzzword, or is it a real problem?
In fact, dark data is very real, and can be exceptionally problematic. Put simply, it’s a term used to describe all those bits and pieces of data floating around in your environment that aren’t fully accounted for. Some of the most pertinent examples are ZIP files, used to transport large documents or groups of documents, and PSTs, the personal folder files used by Microsoft Outlook to hold emails, contacts, notes and calendar items on a local desktop or notebook. These locally stored container files may hold vital, even risky, corporate information and are often not covered by typical corporate retention or archiving processes.
According to IDC, PST and ZIP files account for nearly 90 percent of dark data. And with email growth widely pegged at 40 percent per year, the risk isn’t going to get smaller any time soon. ZIP files are not that difficult to open and view, though they can be a challenge for a simple scanning process, especially if they are password protected. PST files pose a greater challenge. They may contain thousands, even tens of thousands, of individual items, all hidden from the typical system scan.
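To make the scanning point concrete, here is a minimal sketch, in Python, of how a scan might at least count what a ZIP file holds and flag password-protected entries. The function name `zip_summary` is hypothetical, and the check relies only on the encryption bit in each entry's header flags, so treat it as a rough first pass rather than a complete inspection tool.

```python
import zipfile


def zip_summary(path):
    """Return (member_count, encrypted_count) for the ZIP archive at `path`.

    Hypothetical helper for illustration only.
    """
    with zipfile.ZipFile(path) as zf:
        infos = zf.infolist()
        # Bit 0 of the general-purpose flag marks an entry as encrypted.
        encrypted = sum(1 for info in infos if info.flag_bits & 0x1)
        return len(infos), encrypted
```

A scan could use the second number to set aside archives it cannot open without a password, instead of silently skipping them.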
So, why are these files “dark”? Consider the PST file and its potential contents for a moment. PSTs are a collection of email data, the contents of which aren’t available to anyone aside from the file owner. Some, like older Microsoft Outlook auto-archive files, were automatically created, so even the user may not know why they are there or what’s in them.
Compounding the potentially “invisible” nature of PST files, they also build up rapidly, on corporate drives as well as desktops. Corporate storage can even be dominated by PSTs as they accumulate over time, are backed up again and again, and are a typical component of images of former employees’ residual data. The darkness of the data just gets darker, because IT staff simply don’t know what’s in these files and many end up “orphaned” from their original owners. As a result, it’s very likely that your company holds volumes of data that you may not know about, or whose contents you don’t know. Worse, these files are consuming valuable space, costing money to store and manage, all while putting your organization at great risk.
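A first step toward knowing how much of this data exists is a simple inventory. The sketch below, in Python, walks a directory tree and totals the number and size of files with the extensions discussed here. The names `DARK_EXTENSIONS` and `inventory` are assumptions made for illustration, and the script only looks at file names and sizes, not at what is inside the files.

```python
import os
from collections import defaultdict

# Assumption: only the container types named in this article are of interest.
DARK_EXTENSIONS = {".pst", ".zip"}


def inventory(root):
    """Walk `root` and total file count and bytes per container extension."""
    totals = defaultdict(lambda: {"files": 0, "bytes": 0})
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            ext = os.path.splitext(name)[1].lower()
            if ext not in DARK_EXTENSIONS:
                continue
            full_path = os.path.join(dirpath, name)
            try:
                size = os.path.getsize(full_path)
            except OSError:
                continue  # unreadable or vanished file; skip it
            totals[ext]["files"] += 1
            totals[ext]["bytes"] += size
    return dict(totals)
```

Run against a file share or a user's home directory, the result gives a rough measure of how much storage these containers occupy before any decision is made about them.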
The Hidden Risks in Dark Data
It can be easy to simply ignore dark data because the act of physically keeping it doesn't seem that expensive considering today’s low cost of storage. But if you’re like most companies, you likely have many terabytes of “dark” PST files consuming your storage resources.
The more critical risk is that if this data is saved, it is discoverable. In the event of a legal request for data, all “relevant” emails must be produced, and without knowledge of a PST’s contents this request could sweep in hundreds of thousands, even millions, of emails that need to be forensically searched. With typical forensic searches costing $5 per email, this could become a very costly endeavor.
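The arithmetic is worth spelling out. The short Python sketch below multiplies the $5 per email figure quoted above by the email volumes mentioned; the function name `review_cost` is hypothetical, and the flat per-email rate is a simplifying assumption.

```python
COST_PER_EMAIL_USD = 5  # per-email forensic search figure quoted in the article


def review_cost(email_count, cost_per_email=COST_PER_EMAIL_USD):
    """Estimate the total forensic search cost at a flat per-email rate."""
    return email_count * cost_per_email


for count in (100_000, 1_000_000):
    print(f"{count:,} emails -> ${review_cost(count):,}")
# 100,000 emails -> $500,000
# 1,000,000 emails -> $5,000,000
```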
Perhaps even more risky than the cost of discovery is the risk that these emails may be overlooked during the implementation of vital compliance requirements and retention policies. Because PST folders are created and controlled by the end user, they often fall outside corporate compliance policies for email retention. A PST file commonly contains many emails that are expired (ready for deletion), yet since the PST file itself looks “current” it is overlooked by standard retention and disposition policies. As a result, these files can put companies at grave risk for sanctions, fines and adverse legal outcomes.
Companies that have migrated email to hosted solutions, typically cloud-based, don’t escape this issue either. If those companies didn’t mount an aggressive PST migration program prior to moving to the cloud, it’s likely that they still have dark data from former employees floating around on corporate servers without any management. In litigation, it is often the email data from former employees that’s being requested, so companies facing e-discovery requests in this situation can find the challenge even more difficult and fraught with even greater spoliation risks.
Solving the Challenge of Dark Data
What’s the answer? Simply put, eliminate this one aspect of dark data by doing away with personal archives or PST files.