This is quite interesting. Xerox has just announced that they've come up with new classification software that is able to process electronic documents, automatically classify them, then intelligently route them for workflow, response, archival, etc. It is capable of learning new classifications on the fly and supports up to 20 languages.
Some excerpts from the Techweb.com article follow.
The Xerox tool, said Eric Gaussier, a researcher at the Grenoble facility, uses a hierarchical model able to understand the dependency between multiple categories, unlike so-called “flat” search and retrieval tools which treat each category separately.
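To make the flat-versus-hierarchical distinction a bit more concrete, here is a toy sketch of top-down hierarchical classification. To be clear, this is not Xerox's algorithm, which the article does not describe in any detail. The category tree, the training snippets, and the word-overlap scoring are all made up for illustration. The idea it shows is simply that a parent category is scored using the pooled evidence of everything beneath it, and the classifier walks down the tree one level at a time instead of comparing every leaf category independently.

```python
from collections import Counter

# Hypothetical category tree (parent -> children); names are illustrative only.
TREE = {
    "root": ["finance", "legal"],
    "finance": ["invoice", "expense_report"],
    "legal": ["contract", "patent"],
}

# Made-up, already-categorized training documents, keyed by leaf category.
TRAINING = {
    "invoice": ["invoice amount due payment total"],
    "expense_report": ["expense travel receipt reimbursement total"],
    "contract": ["contract agreement party terms signature"],
    "patent": ["patent claim invention filing"],
}


def leaves(node):
    """Return every leaf category at or below the given node."""
    children = TREE.get(node)
    if not children:
        return [node]
    return [leaf for child in children for leaf in leaves(child)]


def profile(node):
    """Word counts for a node, pooled from all training documents beneath it."""
    counts = Counter()
    for leaf in leaves(node):
        for doc in TRAINING[leaf]:
            counts.update(doc.split())
    return counts


def score(words, node):
    """Fraction of the node's pooled word mass matched by the document's words."""
    prof = profile(node)
    total = sum(prof.values())
    return sum(prof[w] for w in words) / total


def classify(text):
    """Walk the tree from the root, picking the best-scoring child at each level."""
    words = text.lower().split()
    node, path = "root", []
    while node in TREE:
        node = max(TREE[node], key=lambda child: score(words, child))
        path.append(node)
    return path


print(classify("payment due invoice total"))
print(classify("patent filing for a new invention"))
```

A flat classifier would score the four leaf categories directly and ignore the fact that "invoice" and "expense_report" are siblings. In this sketch, ties (including documents that match nothing) fall to the first child listed, which a real system would handle more carefully.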
Xerox's new software, written in Java, and suitable for deploying on Unix, Linux, and Windows, is the result of four years of steady work in linguistic modeling, semantics, and machine learning, said Gaussier.
It can be used “out of the box” by adding it to existing document management applications created by an enterprise, he added. In that approach, “with a set of categories already established, the software takes documents already categorized and, using our models, 'learns' how to automatically classify new documents.”
...the technology is bright enough to learn new categories on its own as it comes across additional documents.
...Able to handle documents written in up to 20 different languages, it also serves as an automatic router, shunting categorized documents to the right person -- via e-mail attachments, for instance...