CMS News, Reviews and Resources
Content Management Matters ™
 
 

Lucene Finds its Way to the Top

By Brice Dunwoodie on Feb 17, 2005

The Apache Software Foundation (ASF) has recently reclassified the Lucene search engine project from a Jakarta sub-project to a top-level ASF effort.

Lucene is a full-text search engine that provides an API and a set of libraries for adding search functionality to all types of Java applications. Doug Cutting is the project's primary developer.

Lucene is offered as a developer toolkit, and requires a certain amount of Java development to implement or integrate a functional search solution.

As an example, for web search, a developer would need to write their own web site spider that populates the Lucene index with Lucene documents.

On the retrieval side, the developer would then need to provide a form handler and query parser that call into the Lucene API for search hits and format the results for web presentation.
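To make the two halves concrete, here is a minimal sketch of the indexing and retrieval sides. It is a toy in-memory stand-in written against the Java standard library only, not Lucene's API; the class name ToyIndex, its methods, and the example URLs are all hypothetical. A real integration would hand each fetched page to Lucene's index writer and ask Lucene's searcher for hits instead.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Toy stand-in for a search index. NOT Lucene's API; every name is hypothetical.
class ToyIndex {
    // term -> URLs of the pages containing that term, in insertion order
    private final Map<String, Set<String>> postings = new HashMap<>();

    // Indexing side: what a spider would call for each page it fetches.
    void addDocument(String url, String text) {
        for (String term : tokenize(text)) {
            postings.computeIfAbsent(term, t -> new LinkedHashSet<>()).add(url);
        }
    }

    // Retrieval side: return the URLs that contain every term in the query.
    List<String> search(String query) {
        Set<String> hits = null;
        for (String term : tokenize(query)) {
            Set<String> urls = postings.getOrDefault(term, Collections.emptySet());
            if (hits == null) {
                hits = new LinkedHashSet<>(urls);
            } else {
                hits.retainAll(urls);
            }
        }
        return hits == null ? new ArrayList<>() : new ArrayList<>(hits);
    }

    // Lowercase the text and split on anything that is not a letter or digit.
    private static List<String> tokenize(String text) {
        List<String> terms = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) {
                terms.add(t);
            }
        }
        return terms;
    }

    public static void main(String[] args) {
        ToyIndex index = new ToyIndex();
        index.addDocument("http://example.com/a", "Lucene is a full text search engine");
        index.addDocument("http://example.com/b", "Plucene is a Perl port of Lucene");
        // A form handler would format the hits for the web page, for example:
        for (String url : index.search("lucene perl")) {
            System.out.println("<li><a href=\"" + url + "\">" + url + "</a></li>");
        }
    }
}
```

The sketch leaves out everything that makes a toolkit like Lucene worth using (relevance ranking, analyzers, on-disk storage), which is the point: the spider and the form handler are the developer's job, and the index in between is the toolkit's.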


Given this, it's best to think of Lucene as a developer resource and not as a ready-to-run search engine.

There are several ports of Lucene to other languages. Of note are DotLucene (C#/.NET) and Plucene (Perl).

Lucene itself is currently used by Technorati, is embedded in the Eclipse IDE, and is part of www.furl.com's tools.



2 Reader Comments

1 | anon — April 6, 2005 4:14 PM

It's Lucene, not Plucene, that's used by Technorati. The developer does not have to provide a query parser; Lucene has a good default query parser.

2 | Brice Dunwoodie — April 23, 2005 1:48 AM

Thanks for the correction.

You are right that there is a query parser as part of Lucene and Plucene. However, there are some very common query syntax expressions that will cause problems with that parser. I strongly doubt one would put the "stock" parser into production.

In my experience, it's a much more common practice to implement an intermediary parser that handles more syntax cases and is often tuned to what the given audience needs or expects.

-Brice
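As an illustration of the intermediary step described in the comment above, a pre-processor might neutralize operator characters in user input before the text reaches a query parser. The sketch below is an assumption about what such a layer could look like, not code from the article or from Lucene: the class QueryPreprocessor and its sanitize method are hypothetical, and the character list follows the special characters listed in Lucene's classic query syntax documentation. Balanced double quotes are kept so phrase searches still work; an unterminated quote is escaped.

```java
// Hypothetical pre-processing layer that sits in front of a query parser.
class QueryPreprocessor {
    // Operator characters in Lucene's classic query syntax
    // (the double quote is handled separately below).
    private static final String SPECIAL = "\\+-!(){}[]^~*?:|&/";

    static String sanitize(String raw) {
        String input = raw == null ? "" : raw.trim();
        // An odd number of double quotes means an unterminated phrase.
        long quotes = input.chars().filter(c -> c == '"').count();
        boolean keepQuotes = quotes % 2 == 0;

        StringBuilder out = new StringBuilder(input.length() + 8);
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            // Prefix operator characters (and unbalanced quotes) with a backslash.
            if (SPECIAL.indexOf(c) >= 0 || (c == '"' && !keepQuotes)) {
                out.append('\\');
            }
            out.append(c);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(sanitize("C++ (beginner)"));        // C\+\+ \(beginner\)
        System.out.println(sanitize("\"full text\" search"));  // "full text" search
        System.out.println(sanitize("title: \"unfinished"));   // title\: \"unfinished
    }
}
```

This treats everything outside quoted phrases as literal text, which suits an audience that types plain words rather than query operators. A site whose users expect field searches or wildcards would selectively let those characters through instead, and would also need to decide how to treat the uppercase words AND, OR and NOT, which this sketch does not touch.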
