Apache's Lucene search engine has just picked up multilingual natural language support with the recent integration of Teragram Linguistic Tools.
This integration gives Lucene the capabilities it needs to compete with some of the bigger search engines on the market today.
Who is Teragram?
Teragram is a vendor of mobile and multilingual natural language processing technologies. It offers a whole suite of tools designed to improve the speed, accuracy and global language support of search in an organization. Its customers include some big names: Ask.com, Associated Press, CNN, Factiva, eBay and Forbes.com, to name a few.
Teragram, which was recently acquired by BI vendor SAS, made KMWorld magazine's list of "100 Companies That Matter in Knowledge Management." The honor was based on its suite of linguistic services, which includes Automatic Categorization and Taxonomy Manager.
What's Lucene Getting from Teragram?
You likely already know a lot about Lucene, Apache's free, open source, full-featured text search engine. It powers sites like Wikipedia and CNET Reviews.
With the integration come a few nice enhancements to the open source search engine, including the ability to:
* Add taxonomies and faceted search
* Correct the spelling of queries
* Search in multiple languages
Faceted Search
Teragram's suite includes TK240 taxonomy management, automatic categorization and automatic metadata generation. Together, these tools make it possible to build a searchable, faceted index.
Multilingual Support
Lucene now has multilingual natural language processing. As a result, it can provide features like morphological stemming, spelling correction, part-of-speech tagging and related queries. There are also dictionaries for languages from Eastern and Western Europe, Asia and the Middle East.
All this new functionality is good news for Apache Lucene. These capabilities should go a long way toward making it competitive with proprietary enterprise search solutions. The enhancements should also please some open source CMS vendors, such as Alfresco and Movable Type, which support Lucene in their products and can now give their customers a better search engine.