Lucene Finds its Way to the Top
The Apache Software Foundation (ASF) has recently reclassified the Lucene search engine project from a Jakarta sub-project to a top-level ASF effort.
Lucene is a full-text search engine that provides an API and a set of libraries enabling powerful search functionality to be included in all types of Java applications. Doug Cutting is the project's primary developer.
Lucene is offered as a developer toolkit, and requires a certain amount of Java development to implement or integrate a functional search solution.
As an example, for web search, a developer would need to write their own web site spider that populates the Lucene index with Lucene documents.
On the retrieval side, the developer would then need to provide a form handler and query parser that call into the Lucene API for search hits and format the results for web presentation.
Given this, it's best to think of Lucene as a developer resource and not as a ready-to-run search engine.
There are several ports of Lucene to other languages. Of note are DotLucene (C# .NET) and Plucene (Perl).
Lucene itself is currently used by Technorati, is embedded in the Eclipse IDE, and is part of www.furl.com's tools.
2 Reader Comments
It's Lucene, not Plucene, that's used by Technorati. The developer does not have to provide a query parser; Lucene has a good default query parser.
Thanks for the correction.
You are right that there is a query parser as part of Lucene and Plucene. However, there are some very common query syntax expressions that will cause problems with that parser. I strongly doubt one would put the "stock" parser into production.
In my experience, it's a much more common practice to implement an intermediary parser that handles more syntax cases and that is often tuned to what the given audience needs and expects.
-Brice
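As an illustration of the intermediary step described in the comment above, here is a minimal Java sketch of a pre-processor that cleans up raw form input before it is handed to a query parser. The class name QuerySanitizer, the method sanitize, and the policy itself (escape every special character, demote upper-case boolean keywords to plain terms) are assumptions made for this example. They are not part of Lucene, and a real site would likely keep some operators enabled for its audience.

```java
import java.util.Locale;

/**
 * Hypothetical pre-processor that sits between a search form and a query parser.
 * Illustrative only: not part of Lucene, and the policy is an assumption.
 */
final class QuerySanitizer {

    // Characters that carry special meaning in Lucene's classic query syntax.
    private static final String SPECIAL = "\\+-!():^[]\"{}~*?|&/";

    private QuerySanitizer() {}

    /**
     * Returns the input with whitespace collapsed, special characters
     * backslash-escaped, and the keywords AND, OR, NOT lower-cased.
     */
    static String sanitize(String userInput) {
        if (userInput == null) {
            return "";
        }
        StringBuilder out = new StringBuilder();
        for (String token : userInput.trim().split("\\s+")) {
            if (token.isEmpty()) {
                continue; // happens only when the input was blank
            }
            if (out.length() > 0) {
                out.append(' ');
            }
            // Boolean operators are recognized only in upper case,
            // so lower-casing them turns them into ordinary terms.
            if (token.equals("AND") || token.equals("OR") || token.equals("NOT")) {
                out.append(token.toLowerCase(Locale.ROOT));
                continue;
            }
            for (int i = 0; i < token.length(); i++) {
                char c = token.charAt(i);
                if (SPECIAL.indexOf(c) >= 0) {
                    out.append('\\'); // escape so the parser treats it literally
                }
                out.append(c);
            }
        }
        return out.toString();
    }
}
```

A form handler would call QuerySanitizer.sanitize on the submitted text and pass the result on to whatever parser it uses. The blanket escaping shown here trades away power-user syntax for predictability, which is one possible tuning among many.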