The recent acquisition of document capture vendor DataCap by IBM and the filing of revenue figures by Kofax that were 15% higher than the previous year (in the middle of a recession) have put document and data capture in the spotlight.

We might say “in the light again” as document capture has always been a core component of enterprise content management even if it doesn’t get as much attention as the all-bells-and-whistles tools that are at the heart of any enterprise's information strategy.

There are many possible reasons for this, not least that changes to document capture software tend to come in very small increments and pass unnoticed by all except those working directly with it.

Enterprise CMS and Document Capture

However, the days of just scanning documents and shoving them into an Enterprise CMS are gone (or at least should be!), and transforming your paper content into usable content needs to be thought through.

It is just as important to ‘strategize’ your document capture deployments as your ECM deployments. After all, what good is a content management system if you can’t get content into it in the first place?

Why enterprises deploy document capture. From Distributed Capture Drives Next-Generation Imaging Processes, AIIM, March 2010

The reasons for this lie in the nature of document capture itself. Just to be clear about this: Document Capture Software refers to applications that can scan paper content, digitize it and, in enterprises, send it into a document management system or an Enterprise CMS.

At this stage of development, most document capture software can deal with any number of image formats including JPGs, PDFs, TIFFs and BMPs. There are also a number of features now that would be considered standard across the industry. These include:

  • Barcode Recognition
  • Patch Code Recognition
  • Document separation
  • Optical Character Recognition, which is the mechanical or electronic translation of scanned images of printed text into machine-encoded text
  • Optical Mark Recognition (capturing human-marked data from document forms such as surveys and tests)
  • Indexing
  • Document content migration
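
To make one of these features concrete, here is a minimal sketch of document separation: splitting a scanned batch into individual documents wherever a separator sheet appears. It is an illustration under stated assumptions, not how any particular product works. Pages are represented as plain strings, and the `PATCH` marker and the `separate_documents` function are hypothetical names standing in for whatever a real patch code or barcode reader would report.

```python
from typing import List

# Hypothetical marker: in this sketch a page is just a string, and a
# separator sheet (for example one carrying a patch code) is "PATCH".
SEPARATOR = "PATCH"


def separate_documents(pages: List[str], separator: str = SEPARATOR) -> List[List[str]]:
    """Split a scanned batch into documents at each separator page."""
    documents: List[List[str]] = []
    current: List[str] = []
    for page in pages:
        if page == separator:
            # A separator closes the current document; the sheet itself is dropped.
            if current:
                documents.append(current)
            current = []
        else:
            current.append(page)
    # Keep the trailing document if the batch does not end with a separator.
    if current:
        documents.append(current)
    return documents


batch = ["invoice-p1", "invoice-p2", "PATCH", "contract-p1", "PATCH", "form-p1", "form-p2"]
for number, document in enumerate(separate_documents(batch), start=1):
    print(number, document)
```

The same idea applies whether the separator is a patch code, a barcode or a blank page; only the test that recognizes the separator changes.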

Just a word of warning here. Make sure, if you are looking at document capture software, you do not buy data capture software instead. While they share the goal of taking content into the enterprise infrastructure, they are quite different.

Data capture software refers to the myriad of solutions that can crawl across an enterprise’s IT infrastructure, gather structured and unstructured digital content from documents, or information about the documents themselves (metadata), and place it in an Enterprise CMS.

Centralized v Distributed Capture

So now that you know what some of the features of document capture are, you need to decide what kind of scanning is going to suit your enterprise. There are two kinds of deployment that you need to consider.

Centralized Capture

This is where all the documents that need to be captured are brought to a central location, scanned and sent into the Enterprise CMS.

For large companies with many different locations, this can mean transporting the documents to a central site before scanning them, which adds the risk of loss in transit to the other threats to your data.

It can also mean long delays in getting the information into the enterprise: where there is a large number of locations, there will be a wait before the data can be captured.

However, some enterprises see the principal advantage of this approach as the added control and security that can be applied when document content enters from only a single point.

Distributed Capture

Distributed capture is just that. Instead of documents being captured in a single location, they are captured wherever there is an enterprise branch. Once digitized, the information is sent to a central location, where it can be applied to business processes or archived.

The early days of distributed scanning involved client/server technology with the documents entered at the client end and sent to a centralized server. For large enterprises this was generally expensive to set up and very difficult to manage given the number of locations involved.

However, with the continued development of web-based capture technologies, distributed capture has become much easier and we should see it being deployed on a wider scale in the future.

Enterprise concerns about Document Capture. From Distributed Capture Drives Next-Generation Imaging Processes, AIIM, March 2010

Document Capture Considerations

With this in mind there are a number of things that enterprises need to consider or do before deciding on how document capture software should be deployed.

1. Capture Strategy

If an enterprise is going to scan documents into their Enterprise CMS without deciding how the information should be organized, it is almost not worth bothering in the first place.

Without a clear capture strategy applied across the entire enterprise, whether it has a single location or many, crucial information will be lost. That strategy should cover taxonomies and all the different devices being used to capture documents.

2. Capture Automation

With most current software it is possible to automate capture tasks. If you can, you should automate as much as possible. Apart from avoiding the inevitable mistakes that come with manual entry, automation will also accelerate business processes, producing end results more quickly.

With automation, documents enter the system immediately and bypass traditional, clumsy manual processes.
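
As a rough illustration of what automating a capture task can look like, the sketch below routes a captured document to a destination queue based on the prefix of its barcode and falls back to manual review when no rule matches. Everything in it is an assumption made for the example: the prefixes, the queue names and the `route_document` function are hypothetical and do not come from any specific capture product.

```python
from typing import Dict, Optional

# Hypothetical routing rules: barcode prefix -> destination queue.
ROUTES: Dict[str, str] = {
    "INV": "accounts-payable",
    "CON": "legal",
    "HR": "human-resources",
}
MANUAL_QUEUE = "manual-review"


def route_document(barcode: Optional[str], routes: Dict[str, str] = ROUTES) -> str:
    """Return the queue for a document, based on its barcode prefix."""
    if not barcode:
        # No barcode was read, so a person has to look at the document.
        return MANUAL_QUEUE
    prefix = barcode.split("-", 1)[0].upper()
    # Unknown prefixes also fall back to manual review.
    return routes.get(prefix, MANUAL_QUEUE)


for code in ["INV-000123", "con-42", "XYZ-1", None]:
    print(code, "->", route_document(code))
```

The fallback queue matters as much as the rules: automation should take over the routine cases while leaving anything it cannot classify visible to a person.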

3. Legacy systems

Document capture software doesn’t work in a vacuum and needs to be compatible with the legacy systems already installed in an enterprise. Before choosing document capture software, make sure that it is compatible with the systems you have and, if it is not, find out how much money and disruption it will take to make them work together.

4. Storage conventions

Create a set of capture ‘rules’ that outline what documents can be taken into the system, how captured documents will be named and how the data is formatted. This will avoid the loss of information captured from documents, which can disappear into unsearchable sites if rigid controls aren’t set in place.
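
A set of capture rules like this can be written down precisely enough for software to enforce. The sketch below assumes one possible convention, `branch_doctype_YYYYMMDD_sequence.ext`, with a short list of permitted document types. The convention itself, the `ALLOWED_TYPES` list and the function names are all hypothetical and only meant to show the idea.

```python
import re
from datetime import date

# Hypothetical rules: which document types may enter the system,
# and the shape every captured file name must have.
ALLOWED_TYPES = {"invoice", "contract", "form"}
NAME_PATTERN = re.compile(
    r"^(?P<branch>[a-z]{3})_(?P<doctype>[a-z]+)_(?P<day>\d{8})_(?P<seq>\d{4})\.(?P<ext>pdf|tif)$"
)


def build_name(branch: str, doctype: str, day: date, seq: int, ext: str = "pdf") -> str:
    """Build a file name that follows the convention, or raise ValueError."""
    if doctype not in ALLOWED_TYPES:
        raise ValueError(f"document type not allowed: {doctype!r}")
    name = f"{branch.lower()}_{doctype}_{day:%Y%m%d}_{seq:04d}.{ext}"
    if not NAME_PATTERN.match(name):
        raise ValueError(f"name does not follow the convention: {name!r}")
    return name


def is_valid_name(name: str) -> bool:
    """Check an existing file name against the convention."""
    match = NAME_PATTERN.match(name)
    return bool(match) and match.group("doctype") in ALLOWED_TYPES


print(build_name("DUB", "invoice", date(2010, 3, 15), 7))
print(is_valid_name("scan001.pdf"))
```

Note that the pattern only checks the shape of the date, not whether it is a real calendar date. A file such as `scan001.pdf` is rejected at the point of capture, before it has a chance to disappear into the system unnamed and unsearchable.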

5. Ease-of-Use

Before deploying document capture software, enterprises should make sure that the system is easy to use. This matters especially with distributed scanning, where the people assigned to manage captured documents are not on site and the users doing the capturing do not have document capture as their main task but fit it in alongside the other business tasks required by the enterprise.

With the pressure on companies now to cut down on paper use and the need to keep documents -- both paper and digital -- for compliance reasons, the move to document capture software is just about inevitable for companies that haven’t already deployed it.

However, as with all other elements of content management planning, it is essential to develop a clear strategy before anything else. With document capture, unplanned and disorganized capture will add to one of the principal problems companies hope to tackle by deploying such software: information mismanagement.