CMS News, Reviews and Resources

Content Management Matters ™




Lucene Search Gets Some New Functionality

By Barb Mosher
May 22, 2008

Teragram Provides Linguistic Tools for Apache Lucene Open Source Search Engine

Apache’s Lucene search engine has just picked up some multilingual, natural language support with the recent integration of Teragram Linguistic Tools.

This integration gives Lucene the capabilities to compete with some of the bigger search engines on the market today.

Who is Teragram?

Teragram is a vendor of mobile and multilingual natural language processing technologies. It has a whole suite of tools designed to improve the speed, accuracy and global language support of search in an organization. And it counts some big names among its customers: Ask.com, Associated Press, CNN, Factiva, eBay and Forbes.com, to name a few.

Teragram, which was recently acquired by BI vendor SAS, made the list of “100 Companies That Matter in Knowledge Management” by KMWorld magazine. This honor was based on its suite of linguistic services, which includes Automatic Categorization and Taxonomy Manager.

What’s Lucene Getting from Teragram?

You likely already know a lot about Lucene, Apache’s free, open source, full-featured text search engine. It powers sites like Wikipedia and CNET Reviews.

With the integration come a few nice enhancements to the open source search engine, including the ability to:

  • Add taxonomies and faceted search
  • Correct the spelling of queries
  • Search in multiple languages

Faceted Search

Teragram’s suite includes TK240 taxonomy management, automatic categorization and automatic metadata generation. Together, these enable the creation of a searchable, faceted index.
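For readers new to the term, here is a rough sketch of what a faceted index lets a search interface do. It is plain Python, not Teragram or Lucene code; the documents and the `topic` and `language` fields are made up to stand in for the metadata an automatic categorizer might attach.

```python
from collections import Counter

# Hypothetical documents. "topic" and "language" stand in for metadata
# that automatic categorization might attach to each document.
DOCS = [
    {"id": 1, "text": "open source search engine", "topic": "search", "language": "en"},
    {"id": 2, "text": "moteur de recherche open source", "topic": "search", "language": "fr"},
    {"id": 3, "text": "open source content management", "topic": "cms", "language": "en"},
]

def search(docs, term, filters=None):
    """Return documents containing the term and matching every facet filter."""
    filters = filters or {}
    return [
        doc for doc in docs
        if term in doc["text"].split()
        and all(doc.get(field) == value for field, value in filters.items())
    ]

def facet_counts(hits, fields):
    """Count how many hits fall under each value of each facet field."""
    return {field: Counter(doc[field] for doc in hits) for field in fields}

hits = search(DOCS, "open")
counts = facet_counts(hits, ["topic", "language"])
english_only = search(DOCS, "open", {"language": "en"})
```

The counts are what a results page would show next to each category, and the filtered search is what happens when a user clicks one of them.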

Multilingual Support

Lucene now has multilingual natural language processing. As a result, it can provide things like morphological stemming, spelling correction, part-of-speech tagging and related queries. There are also dictionaries for languages from Eastern and Western Europe, Asia and the Middle East.
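To give a feel for dictionary-based spelling correction of queries, here is a minimal sketch using Python's standard `difflib` module. It is a stand-in for illustration only and does not reflect how Teragram's tools work; the tiny `VOCABULARY` list and the `correct_query` helper are assumptions.

```python
import difflib

# Hypothetical dictionary of known terms; a real system would use a
# much larger, language-specific word list.
VOCABULARY = ["lucene", "taxonomy", "search", "facet", "language"]

def correct_query(query, vocabulary=VOCABULARY, cutoff=0.7):
    """Replace each unknown word with its closest dictionary entry, if any."""
    corrected = []
    for word in query.lower().split():
        if word in vocabulary:
            corrected.append(word)
            continue
        # get_close_matches ranks candidates by string similarity and
        # drops any that score below the cutoff.
        matches = difflib.get_close_matches(word, vocabulary, n=1, cutoff=cutoff)
        corrected.append(matches[0] if matches else word)
    return " ".join(corrected)

suggestion = correct_query("lucnee serach")
```

Words with no close match are left alone, so an unusual but valid term is not silently rewritten.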

All this new functionality is good news for Apache Lucene. These new capabilities should propel it a long way towards competing with the proprietary enterprise search solutions. The enhancements should also make some open source CMS vendors happy (Alfresco, Movable Type) as they provide support for Lucene with their products and can now give their customers a better search engine.
