If you have ever wondered how all the information in those tiny boxes on official forms eventually makes its way into a database, the answer is a combination of a high-quality scanner and a top-of-the-line optical character recognition (OCR) software package.
OmniExtract 6.1.4, from Delhi-based Newgen, tackles the problem of processing scanned images of both structured and unstructured forms and automatically extracting the data they contain. If you are looking for automatic metadata entry when a form is scanned, look no further.
The updated version of OmniExtract is built on an open architecture that makes it easier to integrate customized external applications for image pre-processing. The primary selling point of the new version, however, is its ability to process complex structured forms (think mortgage applications) without requiring the forms to be redesigned.
In other words, OmniExtract claims to adapt to your organization's business processes instead of forcing your organization to change its ways.
OmniExtract can automatically identify and create data zones, which simplifies form definition. These data zones can also be associated with multiple legends, improving the success rate when processing forms. Version 6.1.4 also adds automatic table extraction, automatic form definition, and static text removal.
Architecturally, OmniExtract is built on a modular, Microsoft Windows Server-based design consisting of the following components:
* Extraction Server
: An unattended process (no human intervention required) that, according to Newgen, can handle volumes in the millions of forms. The server is queue-based: it continually picks up forms and processes them according to their definitions. Multiple third-party OCR and Intelligent Character Recognition (ICR) engines can be integrated.
* Extraction Manager
: A server-side administration tool for monitoring and controlling batches submitted for extraction.
* Form Definition
: A complete form-template definition component that supports defining data zones along with testing and verification of templates.
* Scan Station
: Organizes forms into batches using neural-network-based form identification and segregation.
* Verification Station
: Extracted data is verified and corrected, with pre-configured intelligent notifications that flag doubtful characters for review.
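To make the queue-based flow described above more concrete, here is a minimal Python sketch of a worker that pulls scanned forms off a queue, extracts the data zones named in each form's definition, and flags low-confidence results for a verification step. This is an illustration of the general pattern, not Newgen's implementation: every name (`FormDefinition`, `ScannedForm`, `process_queue`) and the 0.9 confidence threshold are hypothetical, and the OCR/ICR output is simulated as precomputed text and confidence pairs.

```python
from collections import deque
from dataclasses import dataclass, field

# Hypothetical types for illustration only; not OmniExtract's actual API.
@dataclass
class FormDefinition:
    name: str
    zones: list[str]  # names of the data zones to extract

@dataclass
class ScannedForm:
    form_id: str
    form_type: str
    # Simulated OCR/ICR output: zone name -> (recognized text, confidence 0..1)
    ocr_output: dict[str, tuple[str, float]]

@dataclass
class ExtractionResult:
    form_id: str
    fields: dict[str, str] = field(default_factory=dict)
    doubtful: list[str] = field(default_factory=list)  # zones needing manual review

def process_queue(queue: deque, definitions: dict[str, FormDefinition],
                  threshold: float = 0.9) -> list[ExtractionResult]:
    """Drain the queue once; a real server would keep polling for new forms."""
    results = []
    while queue:
        form = queue.popleft()
        result = ExtractionResult(form.form_id)
        definition = definitions.get(form.form_type)
        if definition is None:
            # No template for this form type: send the whole form to verification.
            result.doubtful.append("<unknown form type>")
            results.append(result)
            continue
        for zone in definition.zones:
            # A zone the engine could not read counts as empty with zero confidence.
            text, confidence = form.ocr_output.get(zone, ("", 0.0))
            result.fields[zone] = text
            if confidence < threshold:
                result.doubtful.append(zone)  # flag for the verification step
        results.append(result)
    return results

if __name__ == "__main__":
    definitions = {"mortgage": FormDefinition("mortgage", ["applicant_name", "loan_amount"])}
    queue = deque([ScannedForm("F-001", "mortgage",
                               {"applicant_name": ("Jane Doe", 0.98),
                                "loan_amount": ("25O000", 0.62)})])
    for r in process_queue(queue, definitions):
        print(r.form_id, r.fields, r.doubtful)
```

Separating the "doubtful" list from the extracted fields mirrors the split the article describes between the unattended Extraction Server and the human-facing Verification Station: only flagged zones need an operator's attention.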
If your organization processes a large number of paper-based forms and you want to eliminate manual data entry on top of the effort already spent on scanning, take a look at Newgen's latest offering. According to Newgen Software, OmniExtract has been launched in Europe, the Middle East, Africa, and the Asia-Pacific region, with a United States release coming soon.