The emergence of Google and other web search engines has made search pervasive. It has created a generation of users who expect to locate information at work as easily as they can at home online. Often, though, that is not the reality. Implement a tool to find stuff: how difficult could that possibly be? Ask anyone who has ever implemented anything but the simplest search solution, and they’ll likely agree -- pretty freaking hard.

Here are a few things to avoid so that you don’t make your enterprise search implementation even harder.

Growing Importance of Enterprise Search and Open Source

Although the scale of web search is larger, enterprise search can be more complex. Compared to web search, enterprise search has more diverse data sources, must take security into account and is frequently required to accommodate advanced search features. In addition to this functional complexity, reduced corporate budgets have made holistic enterprise search solutions harder to implement, driving many organizations to look beyond commercial vendors to open source.

One open source solution, Apache Lucene/Solr, seems to be standing out from the others and competing strongly with commercial products from giants like IBM and Autonomy. Lucene/Solr has installations at over 4,000 companies worldwide, ranks among the top 15 open source projects and is one of the top 5 at Apache.

CMSWire spoke with Lucene/Solr expert Jay Hill of Lucid Imagination for a few tips on things to avoid when implementing Lucene/Solr to reduce the risk of your search project biting the dust. Hill calls them the “Seven Deadly Sins of Solr” – sloth, greed, pride, lust, envy, gluttony and wrath.

1. Sloth

According to Hill, sloth is one of the worst things you can do -- or rather not do -- when implementing Lucene/Solr. Sloth, or laziness, is about failing to make the effort necessary to ensure an implementation is successful. Sloth can come in many forms:

  • Failing to understand the features, strengths and weaknesses of Lucene/Solr
  • Not tuning the application, Java Virtual Machine or server
  • Not taking time to understand user needs
  • Relying exclusively on consultants to implement the solution

Adopting enterprise search is complicated, and sloth will cause your project to fail.

2. Greed

Greed is another common mistake organizations make when implementing Lucene/Solr. Many assume that because Lucene/Solr is free and open source, it will cost nothing to implement. The solution costs less than comparable commercial enterprise search options, but implementation is not free. Hill said he has seen many companies try to implement Lucene/Solr “on the cheap” at the expense of their entire project. If you want to implement enterprise search, you should be prepared to invest in items like an adequate number of servers, professional services for implementation guidance, and training.

3. Pride

Pride can be a good thing, but take it too far and it becomes arrogance, which is rarely a positive trait. That is definitely true when it comes to implementing Lucene/Solr. Excessive pride shows up in actions like failing to get help when you encounter a problem, assuming you know the business requirements instead of soliciting them from the users, or custom coding features and solutions to problems that others have already addressed. Pride can result in excess expense, time and effort to implement Lucene/Solr.

To avoid your project being negatively impacted by pride:

  • Ask for help or use resources like the user and developer mailing lists to find solutions to implementation, management or design issues
  • Let the users define project needs instead of building in too few or too many features based on your assumptions
  • Participate in the Lucene/Solr community and respect the experience of others
  • Admit to yourself that although your project has unique features, other projects have similarities. Don’t be afraid to leverage the knowledge that others have gained from their implementations.

4. Lust

Most technologists have experienced techno lust at some point in their careers. However, you have to keep in mind that the bleeding edge is called the “bleeding” edge for a reason. You will eventually experience pain if you lust after a new feature and implement it prematurely or without adequate justification. Other examples of lust in Lucene/Solr implementations include:

  • Jumping into an overly complex infrastructure before understanding your actual query volume
  • Trying to do too much at once. It is often better to pursue an iterative approach to implementation so that you can adjust as you learn lessons
  • Obsessing over minute details before dealing with fundamental project issues
  • Trying to push the envelope just because it’s cool

Lust introduces unnecessary risk into your project, which could easily extend budgets and timelines, or worse, cause the project to fail. When implementing Lucene/Solr, or any other enterprise search solution, it's better to use an agile approach, do adequate due diligence before attempting cutting-edge, unorthodox approaches, and focus on core requirements, design and implementation.
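One way to ground infrastructure decisions is to measure the query volume you actually have before designing for it. The following is a minimal Python sketch, not part of Lucene/Solr. It assumes you can export one ISO 8601 timestamp per query from your own logs; the function name `query_volume` and the input format are hypothetical.

```python
from collections import Counter
from datetime import datetime


def query_volume(timestamps):
    """Return (average_qps, peak_qps) for an iterable of ISO 8601 timestamp strings.

    Assumption: each string is one query's arrival time, e.g. "2010-01-01T10:00:00".
    """
    per_second = Counter()
    for ts in timestamps:
        # Truncate to whole seconds and count the queries that fall in that second.
        second = datetime.fromisoformat(ts).replace(microsecond=0)
        per_second[second] += 1
    if not per_second:
        return 0.0, 0
    # Length of the observed window in seconds, counting both end seconds.
    span = (max(per_second) - min(per_second)).total_seconds() + 1
    average = sum(per_second.values()) / span
    return average, max(per_second.values())
```

If the peak is a handful of queries per second, a single well-tuned server may be a reasonable starting point, and more complex setups can wait until the numbers call for them.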

5. Envy

Envy is similar to lust and has many of the same ramifications. Envy occurs when you implement “cool” features just because you read about them or saw them implemented elsewhere. Even if the feature is stable, if it is not required, you are unnecessarily increasing project complexity. Every extra feature or line of code has a multiplicative impact since it must ultimately be implemented, maintained and tested. Avoid envy, or the green-eyed monster may turn your project ugly.

6. Gluttony

Gluttony occurs when you allow elements like configuration files or queries to balloon out of control. It is important to consistently review your implementation and look for opportunities to keep it “lean and mean”. Refactoring should be a regular part of your development and maintenance cycle. Although these things seem like low-level, developer-centric details, the impact can be large and apparent all the way to the end user. Bloated implementations increase the risk of errors and the likelihood that your searches aren’t optimized for the best performance.

7. Wrath

The final Lucene/Solr sin is wrath. Although most people associate wrath with anger, it is also defined as:
“A vehement denial of the truth, both to others and in the form of self-denial and impatience.”

We all know people who believe that they can do no wrong; if you don’t know anybody like that, then unfortunately it may be you. Wrath occurs when you do things like:

  • Ignore feedback and complaints
  • Neglect to look at log files or performance results
  • Disregard what the users want in favor of how you think the project should be implemented

Wrath is a clear recipe for disaster and unhappy stakeholders.
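Looking at log files does not have to be elaborate. As a rough illustration, Solr request log lines commonly include a `QTime=<milliseconds>` field, although the exact line layout depends on your version and logging configuration. The Python sketch below assumes that field is present; the function name `summarize_qtimes` and the 1,000 ms "slow" threshold are hypothetical choices, not Solr settings.

```python
import re
from statistics import median

# Assumption: each request line contains "QTime=<integer milliseconds>".
QTIME = re.compile(r"\bQTime=(\d+)")


def summarize_qtimes(lines, slow_ms=1000):
    """Summarize query times found in log lines; return None if none are found."""
    times = []
    for line in lines:
        match = QTIME.search(line)
        if match:
            times.append(int(match.group(1)))
    if not times:
        return None
    return {
        "count": len(times),
        "median_ms": median(times),
        "max_ms": max(times),
        # Number of requests at or above the chosen "slow" threshold.
        "slow": sum(t >= slow_ms for t in times),
    }
```

Running something like this against a day of logs, for example `summarize_qtimes(open("solr.log"))` with your own log path, gives a quick picture of whether user complaints about slow searches line up with what the server recorded.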

There are many additional mistakes that you could make when implementing enterprise search with Lucene/Solr, but these seven are a good primer on things to avoid. Have you implemented enterprise search? What lessons did you learn? We would love to hear from you.

Editor's Note: You might also be interested in Stephanie Lemieux's Open-Source Options for Faceted Search, for the Budget Conscious User.