If you're considering whether your organization should go with cloud-based archiving for some of your documents, take a two-step approach to the decision. First, understand what your archiving requirements are -- whether on premises or in the cloud. Then clarify the pros and cons of on-premises versus cloud-based archiving and decide which approach makes sense for your organization right now.
What is an Archive?
An “archive” is a system that at minimum:
- Securely stores documents; and here I use “documents” as shorthand for user content, including email, docs, social media and web pages
- Retains the documents as long as needed
- Purges documents when they are no longer needed for legal, compliance or business purposes
- Provides authorized users (internal and external) with access to the documents for various purposes (e.g. for business processes, customer service, customer or agent self-service, and discovery)
In service of the above requirements, archives typically include deduplication, indexing and some e-discovery capabilities.
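The four minimum functions above, plus deduplication and indexing, can be sketched in a few lines of Python. This is a toy in-memory illustration, not how any particular archive product works: the `Archive` class, its method names, the word-level index and the simple set of authorized users are all assumptions made for the example.

```python
import hashlib
from dataclasses import dataclass, field
from datetime import date, timedelta


@dataclass
class Archive:
    """Toy in-memory archive: store, retain, purge, and authorized access."""
    authorized_users: set = field(default_factory=set)
    # digest -> (content, retain_until); identical content is stored once.
    documents: dict = field(default_factory=dict)
    # Minimal inverted index: lowercase word -> set of digests.
    index: dict = field(default_factory=dict)

    def store(self, content: bytes, archived_on: date, retain_days: int) -> str:
        digest = hashlib.sha256(content).hexdigest()
        retain_until = archived_on + timedelta(days=retain_days)
        if digest in self.documents:
            # Deduplication: keep one copy, but honor the longer retention.
            _, existing_until = self.documents[digest]
            retain_until = max(existing_until, retain_until)
        else:
            for word in content.decode("utf-8", errors="ignore").lower().split():
                self.index.setdefault(word, set()).add(digest)
        self.documents[digest] = (content, retain_until)
        return digest

    def search(self, word: str) -> set:
        return set(self.index.get(word.lower(), set()))

    def retrieve(self, digest: str, user: str) -> bytes:
        if user not in self.authorized_users:
            raise PermissionError(f"{user} is not authorized")
        return self.documents[digest][0]

    def purge(self, today: date) -> list:
        """Delete documents whose retention period has ended."""
        expired = [d for d, (_, until) in self.documents.items() if until <= today]
        for digest in expired:
            del self.documents[digest]
            for digests in self.index.values():
                digests.discard(digest)
        return expired
```

A real archive would also need durable storage, audit trails and a way to suspend purging for documents under legal hold; the sketch only shows how the basic functions relate to one another.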
If you look at the way your company and your peers' companies have done archiving in the past, you can see how it has evolved. For unstructured data or content (“documents”) most archiving was historically focused on fixed content like system-generated output (such as statements, EOBs, correspondence). Images were also included.
When email became an enterprise concern due to its volume and risk, it was addressed as a content type requiring archiving. Email archiving has not been an unqualified success -- and I tell that story below. More recently -- again because of the volume and risk involved -- the chaotic swamp of dynamic documents (like Microsoft Office docs), web content and collaboration content started to be archived, along with other forms of e-communications like instant messages.
Which brings us to today, when most large companies are interested in archiving all of the above, plus transactional, structured data from business systems, plus rich media like audio and video, plus on occasion entire applications.
A Lesson from the History of Email Archiving
It’s important to understand what you want your archive to do, since there are lots of options out there and you need a good fit.
Let me tell you the story of email archiving to put things in perspective. In the early 2000s a lot of vendors from the ECM space tried to move into the email and related archiving space. How hard could email management be? So they tried to use their general ECM capabilities for archiving and add RM capabilities -- thus providing more features and functions than the less fancy pure-play archives were offering. But the ECM vendors couldn’t do the basic blocking and tackling for email archiving. They failed at all four points of the archive definition above (the numbers in parentheses refer to those four points, in order):
- They couldn't scale to handle the numbers of users and mailboxes (1)
- They failed to provide reliable, fast access to users who wanted to find and retrieve older emails and attachments (4)
- Some of them “lost” attachments (1, 2 and 4)
- They failed to provide reliable disposition -- because users defected and squirrelled away emails, not trusting the enterprise archive to do its advertised job (3)
So many organizations dumped their ECM-based archive approaches and went back to the archive specialists, who were able to scale, etc.
Archiving now offers many more options than it did 12 years ago. You can archive everything from social media chats to web pages to movies to old-fashioned email and mainframe print streams. You can use the archive for compliance, for active use in complex and demanding business processes, for beyond-the-firewall customer access and participation, and for rigorous e-discovery.
These are all very different scenarios with different requirements. And -- in a nod to this article’s focus -- you can do it in house or via the cloud. So you have to be clear about what you want the archive for.
What Should Your Archive Do?
Start with these key general requirements for archiving. You will weight these according to your situation, and will probably add more specialized requirements, such as compliance supervision (e.g. for financial services), advanced e-discovery, or a focus on particular content types (IM, GroupWise, video, web pages, salesforce.com). The most important high-level requirements for enterprise document archiving are:
- Scalability and Performance
- Accessibility and Availability
- Security and Protection
- Retention and Integrity
Let’s address each briefly in turn.
1. Scalability and performance
The archive should handle the volumes of ingestion within the time windows necessary to provide your business with access to relevant documents when you need them within your business processes. In addition, the archive should provide reasonable response times for document search and retrieval, and the solution should have the ability to perform ingestion and archive functions without negatively impacting overall system performance for users.
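To make "volumes of ingestion within the time windows" concrete, it helps to turn it into a required sustained rate that you can compare against what a candidate archive has demonstrated. The sketch below is back-of-the-envelope arithmetic only; the function names and the example volumes are hypothetical, and real sizing would also have to account for document size, peaks and indexing overhead.

```python
def required_ingest_rate(documents: int, window_hours: float) -> float:
    """Documents per second needed to ingest `documents` within the window."""
    if window_hours <= 0:
        raise ValueError("window_hours must be positive")
    return documents / (window_hours * 3600)


def fits_window(documents: int, sustained_rate_per_sec: float,
                window_hours: float) -> bool:
    """True if a measured sustained rate is enough to finish inside the window."""
    return sustained_rate_per_sec >= required_ingest_rate(documents, window_hours)


# Hypothetical example: 1,440,000 documents in a 4-hour nightly window
# works out to 100 documents per second.
needed = required_ingest_rate(1_440_000, 4)
print(f"needed: {needed:.0f} docs/sec")
print("120 docs/sec is enough:", fits_window(1_440_000, 120, 4))
print("80 docs/sec is enough:", fits_window(1_440_000, 80, 4))
```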
2. Accessibility and availability
The archive should provide a mechanism for authorized users to search for and retrieve documents. In addition, the archive should provide the ability for certain external users to retrieve documents, such as e-presentment for customers or agents.
This requirement is very important -- not just for the obvious reason that you want to get the right information to the right (authorized) persons at the right time -- but because messing this up will sink your hopes of using the archive for defensible disposal. If you don’t provide fast (enough), reliable access to documents, your users will defect and squirrel away their emails, social media objects and other items. And not only will you be unable to implement a defensible purge strategy -- you’ll also have the very difficult challenge of winning the defectors back once you have lost them.
3. Security and protection
The archive should be able to restrict access to documents, for example those that are private, confidential, privileged, secret or essential to business continuity. This may include requirements for encryption of stored content. Some vendors are getting sophisticated about this, providing double-blind key architectures, with keys held only by the customer for enhanced data privacy and security in the cloud.
4. Retention and integrity
It may be obvious, but the archive should be able to retain documents for defined periods of time, taking into account legal, regulatory, fiscal, operational and historical requirements. In addition, the archive should provide a suitable guarantee of authenticity. And finally, if this applies to you, the archive should provide the ability to retain information on an unalterable storage platform when needed (e.g., WORM storage for SEC Rule 17a-4 compliance).
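As a rough illustration of the retention and authenticity points, the sketch below assumes that when several requirements apply to a document the longest period wins, and that a fingerprint recorded at ingestion is later used to check the content has not changed. The function names and the retention periods in the example are made up for illustration; a hash check can detect alteration but is not a substitute for WORM storage or for legal advice on actual retention periods.

```python
import hashlib
from datetime import date


def retain_until(archived_on: date, requirements: dict) -> date:
    """Retention end date, assuming the longest applicable period (in years) wins."""
    if not requirements:
        raise ValueError("at least one retention requirement is needed")
    years = max(requirements.values())
    try:
        return archived_on.replace(year=archived_on.year + years)
    except ValueError:
        # Archived on Feb 29 and the target year is not a leap year.
        return archived_on.replace(year=archived_on.year + years, day=28)


def fingerprint(content: bytes) -> str:
    """SHA-256 digest recorded when the document enters the archive."""
    return hashlib.sha256(content).hexdigest()


def is_unaltered(content: bytes, recorded_fingerprint: str) -> bool:
    """True if the content still matches the fingerprint taken at ingestion."""
    return fingerprint(content) == recorded_fingerprint


# Hypothetical periods, in years, for a single document type.
periods = {"regulatory": 6, "fiscal": 7, "operational": 2}
print(retain_until(date(2015, 3, 1), periods))

original = b"Account statement, March 2015"
recorded = fingerprint(original)
print(is_unaltered(original, recorded))
print(is_unaltered(original + b" (edited)", recorded))
```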