Organizations know they need to reduce storage consumption by purging data, and a new generation of analytics and classification technologies has been riding this wave of concern about unchecked storage growth.

This isn't a made-up problem created by software marketers and talking heads: every client I've spoken with in the past three years has cited the exponential growth of storage as a major concern for IT leadership.

But like the public response to climate change, ubiquitous concern for data growth has not led to meaningful action. Why?

A study Doculabs just completed showed three reasons:

  1. Financial: Most organizations lack the necessary data to build a business case
  2. Information Governance: Most organizations do not have an understanding of what they have, where it's stored, who owns it and how long it should be preserved
  3. Political: Most organizations are unable to build consensus among the Lines of Business, Legal, Compliance and IT

Each of these points has been made ad nauseam in other blogs, articles and reports. But an organization that wants to purge content needs to pull all three levers. Below I look at the interdependencies between the three and offer a few practical suggestions for how your organization can begin disposing of content.

The Financial Lever

You've probably seen a cost justification for storage reduction that takes the size of your storage footprint, multiplies that by the cost of storage, and then reduces the number by an estimated savings. For example, if you have 100 gigabytes of storage and you are paying $15 per gigabyte (on average), you are spending $1,500 on storage. If we estimate a 30 percent savings, that gives you $450 in savings.
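As a rough sketch of that arithmetic, using the figures from the example above (the function name is illustrative, not part of any tool):

```python
def naive_storage_savings(footprint_gb: float, cost_per_gb: float, purge_fraction: float) -> float:
    """Naive business case: footprint x unit cost x estimated purge fraction."""
    current_spend = footprint_gb * cost_per_gb
    return current_spend * purge_fraction


# Figures from the example: 100 GB at $15 per GB, 30 percent estimated reduction.
current_spend = 100 * 15
estimated_savings = naive_storage_savings(100, 15, 0.30)

print(f"Current spend: ${current_spend:,.2f}")
print(f"Estimated savings: ${estimated_savings:,.2f}")
```

The calculation is trivially simple, which is part of its appeal and part of its weakness: nothing in it says whether the freed capacity can actually be turned into a lower bill.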

One of the main problems with this kind of cost justification is that the purged data is often located across multiple servers, so you are unable to actually decommission storage. IT still has an internal spend for those boxes, and the cost per gigabyte just went up for everyone: the same fixed spend is now spread across less data, because storage is just a cost bucket for the lines of business.

Additionally, the financial estimates are sloppy. They don't take into account the differences between the cost of production storage, backup and disaster recovery, let alone the cost of care and feeding for the machines (staff, electricity, physical space, etc.).
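To make the decommissioning problem concrete, here is a small sketch with three hypothetical file servers (the names, capacities and the uniform 30 percent purge are all made up for illustration). It checks how many servers are actually empty after the purge, and how many would be needed if the remaining data were consolidated:

```python
import math
from dataclasses import dataclass


@dataclass
class Server:
    name: str
    capacity_gb: int
    used_gb: int
    purgeable_gb: int


def servers_emptied(servers):
    """Servers holding nothing after the purge, so they could be retired as-is."""
    return [s.name for s in servers if s.used_gb - s.purgeable_gb == 0]


# Hypothetical estate: the purgeable 30 percent is spread evenly across the boxes.
servers = [
    Server("fs01", 1000, 800, 240),
    Server("fs02", 1000, 700, 210),
    Server("fs03", 1000, 900, 270),
]

total_purged = sum(s.purgeable_gb for s in servers)
retired = servers_emptied(servers)

# Assumes equal-sized servers and that data can be migrated freely between them.
remaining_gb = sum(s.used_gb - s.purgeable_gb for s in servers)
servers_needed = math.ceil(remaining_gb / 1000)

print(f"Purged {total_purged} GB, servers emptied by the purge alone: {retired}")
print(f"Servers needed after consolidating the rest: {servers_needed} of {len(servers)}")
```

In this made-up case the purge frees hundreds of gigabytes but retires nothing on its own. A box only goes away if someone also funds the migration work to consolidate what is left.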
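A minimal sketch of the difference between a hardware-only figure and a fully burdened one follows. Every dollar amount and the capacity are placeholders I made up; the point is only which line items belong in the numerator:

```python
# Hypothetical annual costs in dollars; substitute your own figures.
annual_costs = {
    "production_storage": 60_000,
    "backup": 25_000,
    "disaster_recovery": 30_000,
    "staff": 45_000,
    "power_and_space": 15_000,
}
usable_gb = 50_000  # assumed usable capacity

# The figure a sloppy estimate tends to use: production hardware only.
hardware_only_per_gb = annual_costs["production_storage"] / usable_gb

# The fully burdened figure: every cost of keeping the data, per usable GB.
fully_burdened_per_gb = sum(annual_costs.values()) / usable_gb

print(f"Hardware only:  ${hardware_only_per_gb:.2f} per GB per year")
print(f"Fully burdened: ${fully_burdened_per_gb:.2f} per GB per year")
```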

An effective business case requires:

  • An accurate total cost of ownership for storage
  • A method of charging each line of business for its use of storage

Most organizations don't have either of these elements. This leaves them with a soft business justification. Organizations with outsourced storage infrastructure have an advantage here -- they get a bill every month with their total fully burdened cost and they can isolate storage cost to the application level (at least for large applications).

These two pieces of information enable them to show the business how much it costs to store everything forever. Best-in-class firms are charging the cost of storage for those applications back to the business unit that owns them. When a VP's budget is being impacted by storage, that's a lever you can use.
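A chargeback of this kind can be sketched in a few lines, assuming you already have a per-gigabyte cost and know which business unit owns each application. The application names, owners, sizes and rate below are hypothetical:

```python
from collections import defaultdict


def chargeback_by_unit(apps, cost_per_gb):
    """Sum each business unit's storage charge across the applications it owns."""
    totals = defaultdict(float)
    for app in apps:
        totals[app["business_unit"]] += app["storage_gb"] * cost_per_gb
    return dict(totals)


# Hypothetical application inventory with storage footprints.
apps = [
    {"name": "billing", "business_unit": "Finance", "storage_gb": 4000},
    {"name": "ledger-archive", "business_unit": "Finance", "storage_gb": 1000},
    {"name": "crm", "business_unit": "Sales", "storage_gb": 2500},
]

charges = chargeback_by_unit(apps, cost_per_gb=3.5)  # assumed fully burdened rate
for unit, amount in sorted(charges.items()):
    print(f"{unit}: ${amount:,.2f} per year")
```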

The Information Governance Lever

Information governance is plagued by hype and generalization, so it's still necessary to provide a working definition. We use a variation on Robert Smallwood's definition:

"Information governance is the control of information to meet your legal, regulatory and business risk requirements."

Data purging involves several tactical IG concerns and they all fall under the primary category of having a good content inventory. You probably have several of the following:

  • Application inventory
  • Legal hold inventory
  • Records retention schedule

Ideally you want to know what content the applications hold. Then you want to know if any of that content is under legal hold. Finally, you need to know if you can dispose of that content per your retention schedule.
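That sequence of checks can be sketched as a single decision function. This is a simplification under stated assumptions: retention is counted from a creation date (real schedules often count from a triggering event), and the record classes and periods are invented for the example:

```python
from datetime import date


def add_years(d: date, years: int) -> date:
    """Shift a date by whole years, mapping Feb 29 to Feb 28 when needed."""
    try:
        return d.replace(year=d.year + years)
    except ValueError:
        return d.replace(year=d.year + years, day=28)


def can_dispose(record_class, created, on_legal_hold, retention_years, today):
    """Legal hold wins; then the retention period must be known and expired."""
    if on_legal_hold:
        return False
    years = retention_years.get(record_class)
    if years is None:
        return False  # unclassified content is kept until someone classifies it
    return today >= add_years(created, years)


# Hypothetical retention schedule: record class -> years to retain.
schedule = {"invoice": 7, "marketing_draft": 2}
today = date(2015, 4, 16)

print(can_dispose("invoice", date(2005, 3, 1), False, schedule, today))  # expired, no hold
print(can_dispose("invoice", date(2005, 3, 1), True, schedule, today))   # under legal hold
print(can_dispose("invoice", date(2010, 3, 1), False, schedule, today))  # still in retention
```

Defaulting to "keep" for anything on hold or unclassified is a deliberately conservative choice in this sketch.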

Sounds simple, right? The problem is that your application inventory may or may not give sufficient information on what content it stores. It is likely more focused on other issues like the technical functions, the place in the IT roadmap, who is responsible for the care and feeding of the application, etc. You may have to work with someone in the line of business (LOB) to get a real understanding of the content the application contains. If you know the cost of storage for each application, you can prioritize this task by cost.

Likewise, your records retention schedule doesn't state where content is being stored. The retention schedule's purpose is to align the record class to a retention period based on the regulatory and business requirements of your firm.

A good content map will rationalize the information from these three sources and provide you with the necessary understanding of what content you have, what application houses it, which LOB that application supports, whether the content is on legal hold, what its records classification is, its confidentiality status, etc.
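One way to picture a content map is as a join of the three inventories, keyed by application and sorted by storage cost so the most expensive applications get attention first. The field names and sample entries below are assumptions for illustration, not a standard schema:

```python
def build_content_map(app_inventory, legal_holds, retention_schedule):
    """Combine the three inventories into one row per application, costliest first."""
    rows = []
    for app in app_inventory:
        rows.append({
            "application": app["name"],
            "lob": app["lob"],
            "record_class": app["record_class"],
            "annual_storage_cost": app["annual_storage_cost"],
            "on_legal_hold": app["name"] in legal_holds,
            # None means the record class is missing from the retention schedule.
            "retention_years": retention_schedule.get(app["record_class"]),
        })
    return sorted(rows, key=lambda r: r["annual_storage_cost"], reverse=True)


# Hypothetical inputs.
app_inventory = [
    {"name": "crm", "lob": "Sales", "record_class": "customer_contact", "annual_storage_cost": 8750},
    {"name": "billing", "lob": "Finance", "record_class": "invoice", "annual_storage_cost": 14000},
]
legal_holds = {"crm"}
retention_schedule = {"invoice": 7}

content_map = build_content_map(app_inventory, legal_holds, retention_schedule)
for row in content_map:
    print(row)
```

Rows with `retention_years` of `None` are the gaps worth chasing with the LOB: content that no one has tied to a record class yet.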

Knowing this, you can then show leadership the percentage of content you are over-retaining by application / LOB. Saying that 20 percent of your organization's billing data can be disposed of is a much more compelling argument than saying that on average organizations can dispose of X percent of data.
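Once content is flagged as disposable or not, the per-application percentage is a simple ratio. A sketch, with made-up sizes chosen so that billing comes out at the 20 percent used in the example above:

```python
from collections import defaultdict


def over_retention_by_app(content):
    """Percent of each application's stored GB that is eligible for disposal."""
    total = defaultdict(float)
    eligible = defaultdict(float)
    for item in content:
        total[item["application"]] += item["size_gb"]
        if item["disposable"]:
            eligible[item["application"]] += item["size_gb"]
    return {
        app: round(100 * eligible[app] / total[app], 1)
        for app in total
        if total[app] > 0
    }


# Hypothetical content groups; "disposable" would come from the content map.
content = [
    {"application": "billing", "size_gb": 800, "disposable": False},
    {"application": "billing", "size_gb": 200, "disposable": True},
    {"application": "hr", "size_gb": 300, "disposable": False},
    {"application": "hr", "size_gb": 100, "disposable": True},
]

percentages = over_retention_by_app(content)
for app, pct in sorted(percentages.items()):
    print(f"{app}: {pct}% over-retained")
```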

The Political Lever

My colleague James Watson illustrated Legal's and IT's approaches to purging content in this post. IT wants a simple "laminated card" which identifies if content can be deleted or not. Legal's decision-making process is more complex and often doesn't lead to a "delete" or "don't delete" decision.

This is a case of worlds colliding.

There are organizations that have created effective business cases and content inventories, yet still fail to press delete because of conflicting priorities and concerns of Legal, IT, Records Management and LOB. If your General Counsel is conservative and won't sign off on purging data, you aren't going to get very far with this lever. In fact you likely won't be able to press delete on anything. Instead you'll be stuck archiving content and waiting for a regime change.

Depending on where you sit in the organization you can make some progress without all of the above members signing off. Many IT organizations have found short-term success purging content under the guise of IT hygiene. Unauthorized file types can sometimes be disposed of without requesting permission from the business or legal.
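As an illustration of what an IT hygiene pass might start with, the sketch below only reports files whose extensions are on a blocklist and deletes nothing. The extension list and the share path are hypothetical; what counts as unauthorized is a policy decision for your organization:

```python
from pathlib import Path

# Hypothetical policy: file types not permitted on shared drives.
UNAUTHORIZED_EXTENSIONS = {".mp3", ".mp4", ".iso"}


def find_unauthorized_files(root, extensions=UNAUTHORIZED_EXTENSIONS):
    """Return (path, size in bytes) for every file under root with a listed extension."""
    matches = []
    for path in Path(root).rglob("*"):
        if path.is_file() and path.suffix.lower() in extensions:
            matches.append((path, path.stat().st_size))
    return sorted(matches)


def report(root):
    """Print the candidates and the total space they occupy."""
    found = find_unauthorized_files(root)
    for path, size in found:
        print(f"{size:>12,d}  {path}")
    print(f"{len(found)} files, {sum(size for _, size in found):,d} bytes")


# Example call, with a made-up path:
# report("/shares/departmental")
```

Reporting first and deleting later keeps the short-term win from turning into a dispute with the business or Legal.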

However, a long-term project to purge content will require some level of buy-in from many stakeholders. The sooner you begin aligning those interests, the sooner you'll be able to press delete. But keep the story of the laminated card in mind -- these groups often speak very different languages.

Two Parting Clichés

There are two clichés used in the content purging industry. The first is that there is no silver bullet for purging content. And even if you had one silver bullet, you'd still be a couple bullets short. Successful programs need to execute well in several different disciplines (and we haven't even discussed technology needs).

The second cliché: purging content can be like looking for a needle in a haystack. Again, a simplification. If we wanted to find the needle in the haystack we could light the haystack on fire. After the hay burnt up, the needle would be there for easy retrieval. A successful content purging program attempts to find and remove several specific pieces of hay in a haystack.

A small vanguard is making progress in its attempts to purge content, but this is very much an area for growth for most enterprise organizations.

I'm very interested in hearing your thoughts on this topic. Let's keep the discussion going in the comments.

Title image by lundgrenphotography (Creative Commons Attribution 2.0 Generic License)