Everyone is talking about big data. And if you have ever watched the Discovery Channel, you probably know a little about dark matter and dark energy. But dark data? Is this just a new marketing buzzword, or is it a real problem?

In fact, dark data is very real, and can be exceptionally problematic. Put simply, the term describes all those bits and pieces of data floating around in your environment that aren’t fully accounted for. Some of the most pertinent examples are ZIP files, used to transport large documents or groups of documents, and PSTs, the personal folder files Microsoft Outlook uses to hold emails, contacts, notes and calendar items on a local desktop or notebook. These locally stored container files may hold vital, even risky, corporate information and are often not covered by typical corporate retention or archiving processes.

According to IDC, PST and ZIP files account for nearly 90 percent of dark data. And with email growth widely pegged at 40 percent per year, the risk isn’t going to get smaller any time soon. ZIP files are not that difficult to open and view, though they can be a challenge for a simple scanning process, especially if they are password protected. PST files pose a greater challenge: they may contain thousands, even tens of thousands, of individual items, all hidden from the typical system scan.

So, why are these files “dark”? Consider the PST file and its potential contents for a moment. A PST is a collection of email data, the contents of which aren’t available to anyone aside from the file owner. Some, like older Microsoft Outlook auto-archive files, were created automatically, so even the user may not know why they are there or what’s in them.

Compounding their potentially “invisible” nature, PST files also build up rapidly -- on corporate drives as well as desktops. They can even come to dominate corporate storage: they accumulate over time, are backed up again and again, and are a typical component of the images kept of former employees’ residual data. The darkness of the data just gets darker, because IT staff simply don’t know what’s in these files, and many end up “orphaned” from their original owners. As a result, your company very likely holds volumes of data that it doesn’t know about, or whose contents it can’t account for. Worse, these files are consuming valuable space and costing money to store and manage, all while putting your organization at great risk.

The Hidden Risks in Dark Data

It can be easy to simply ignore dark data because the act of physically keeping it doesn't seem that expensive considering today’s low cost of storage. But if you’re like most companies, you likely have many terabytes of “dark” PST files consuming your storage resources.
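One way to put a number on that storage footprint is a simple inventory scan. The Python sketch below is only an illustration, not a discovery tool: it walks a directory tree and totals the count and size of files by extension. The function name and the extension list are assumptions made for this example, and the sketch does not open the containers or look inside them.

```python
import os
from collections import defaultdict

# Extensions treated as "dark" containers here; this list is an assumption.
DARK_EXTENSIONS = {".pst", ".zip"}


def inventory_dark_files(root):
    """Walk `root` and return {extension: (file_count, total_bytes)}."""
    totals = defaultdict(lambda: [0, 0])
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            ext = os.path.splitext(name)[1].lower()
            if ext not in DARK_EXTENSIONS:
                continue
            try:
                size = os.path.getsize(os.path.join(dirpath, name))
            except OSError:
                # Unreadable or vanished file: skip it rather than abort the scan.
                continue
            totals[ext][0] += 1
            totals[ext][1] += size
    return {ext: (count, size) for ext, (count, size) in totals.items()}
```

Pointing a scan like this at a file share or a home-directory root gives a first rough measure of how much space PST and ZIP files occupy, which is usually enough to decide whether a fuller audit is worthwhile.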

The more critical risk is that if this data is saved, it is discoverable. In the event of a legal request for data, all “relevant” emails must be produced, and without knowledge of a PST’s contents, such a request could turn up hundreds of thousands, even millions, of emails that need to be forensically searched. With typical forensic searches costing $5 per email, this can become a very costly endeavor.
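To make the arithmetic concrete, here is a minimal Python sketch that multiplies an email count by the $5-per-email figure quoted above. The function name and the sample volumes are illustrative assumptions; real costs vary with the vendor and the scope of the request.

```python
# Per-email forensic search cost quoted in the article (USD).
COST_PER_EMAIL_USD = 5


def estimate_review_cost(email_count, cost_per_email=COST_PER_EMAIL_USD):
    """Return the estimated cost in USD of searching `email_count` emails."""
    return email_count * cost_per_email


# Sample volumes are assumptions chosen to span the range mentioned in the text.
for count in (100_000, 500_000, 1_000_000):
    print(f"{count:>9,} emails -> ${estimate_review_cost(count):,}")
```

At this rate, a request that touches 200,000 emails already reaches $1 million.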

Perhaps even more serious than the cost of discovery is the risk that these emails are overlooked when vital compliance requirements and retention policies are implemented. Because PST folders are created and controlled by the end user, they often fall outside corporate compliance policies for email retention. A PST file commonly contains many emails that have expired (and are ready for deletion), yet because the file itself looks “current,” it is overlooked by standard retention and disposition policies. As a result, these files can put companies at grave risk of sanctions, fines and adverse legal outcomes.

Companies that have migrated email to hosted solutions, typically cloud-based, don’t escape this issue either. If those companies didn’t mount an aggressive PST migration program before moving to the cloud, they very likely still have dark data from former employees floating around unmanaged on corporate servers. In litigation, it is often the email data of former employees that is being requested, so companies facing e-discovery requests in this situation can find the challenge even more difficult and fraught with even greater spoliation risks.

Solving the Challenge of Dark Data

What’s the answer? Simply put, eliminate this one aspect of dark data by doing away with personal archives or PST files.

There are a variety of ways you can do this. Free tools, including one from Microsoft, will migrate PSTs back into mailboxes. While migrating PSTs back into mailboxes was a serious concern years ago because of fixed mailbox quotas, newer versions of Exchange eliminate those quotas.

One thing to keep in mind is that some users have large quantities of PSTs and this can bloat your storage requirements; if you are moving to an unlimited hosted email solution, space won’t be an issue but the bandwidth to move large files will be. It is a balancing act. In fact, Office 365 does away with users’ ability to access existing PSTs, so migration is a requirement.

A more efficient way to eliminate PSTs is to deploy dedicated PST migration software, which includes automation features that free tools lack as well as the ability to apply policies prior to migration. When retention policies are applied as part of migration, old and non-relevant email data contained within PSTs can be skipped -- this can considerably reduce the amount of PST data being re-ingested. If you are truly trying to eliminate these files, it is imperative to delete each PST once it has been migrated.
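The retention filter described here can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not how any particular migration product works: MailItem is a hypothetical record standing in for a message already extracted from a PST, the policy is reduced to a single age cutoff, and the sketch neither parses PST files nor talks to a mail server.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class MailItem:
    """Hypothetical stand-in for one message extracted from a PST."""
    subject: str
    received: datetime


def split_by_retention(items, retention_days, now=None):
    """Return (to_migrate, to_skip) based on each item's age.

    Items received within the last `retention_days` days are kept for
    migration; older items are treated as expired and skipped.
    """
    now = now or datetime.now()
    cutoff = now - timedelta(days=retention_days)
    to_migrate = [item for item in items if item.received >= cutoff]
    to_skip = [item for item in items if item.received < cutoff]
    return to_migrate, to_skip


# Example: an assumed seven-year retention period.
items = [
    MailItem("Q3 contract draft", datetime(2012, 6, 1)),
    MailItem("Lunch on Friday?", datetime(2001, 3, 15)),
]
keep, skip = split_by_retention(items, retention_days=7 * 365,
                                now=datetime(2013, 1, 1))
print(len(keep), "to migrate;", len(skip), "expired and skipped")
```

Real policies usually depend on more than age (sender, folder, legal holds), so the single cutoff is the simplest possible stand-in for the policy step.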

Once PSTs have been eliminated, the other key step is to ensure that new ones aren’t created. This is where archiving capabilities become important. Whether companies are looking at basic archiving tools or more robust solutions, the basic goal is the same: provide users with a level of retention and access that duplicates the features they relied on in their risky personal archives.

Companies also need to keep an eye on evolving user requirements. Increased use of mobile devices and BYOD means email is no longer locked to a desktop -- users will demand access independent of a given device, and that includes access to retained emails. Failing to accommodate these users can create a whole new PST issue, where users turn to “sneakerware,” spooling PST files out onto removable media to get the access they need to perform their jobs.

By first eliminating existing PST files, then providing end users with access to their older email information, you will considerably reduce your exposure to risk.

Title image by Donovan van Staden (Shutterstock)