
Metrics to Watch When Evaluating Search Performance

People are dissatisfied with the search applications they are using. Surveys of search performance report this result without exception. Search managers charged with improving satisfaction levels face the problem of defining what "satisfaction" actually means. Search performance has to be evaluated on three criteria: technical performance, retrieval performance and impact performance, but it is impossible to bring all three together in some mathematical formula for "satisfaction."

Technical Performance

Google and Bing have set the expectation that results will be returned in less than a second, but both companies have invested billions of dollars to achieve this. Many corporate websites use a hosted search application that can take much longer to respond. The Shell corporate website currently takes around 10 seconds to respond to a query.

The challenges are much greater internally. Security trimming can impact search response times, as can federated search implementations. There is also more to technical performance than minimizing query response times: attention needs to be paid to crawl and indexing speeds and to how quickly the main search index is updated.
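As a rough illustration of how query response times might be tracked over time, the sketch below times a handful of sample queries against a search endpoint and reports the median and slowest responses. The endpoint URL, query parameter and sample queries are placeholders, not the API of any particular search product.

```python
import statistics
import time
import urllib.parse
import urllib.request

# Placeholder endpoint and sample queries -- substitute your own search API and terms.
SEARCH_URL = "https://intranet.example.com/search?q="
SAMPLE_QUERIES = ["drill", "annual report", "expenses policy"]

def time_query(query, timeout=30):
    """Return the elapsed seconds for a single search request, including transfer time."""
    url = SEARCH_URL + urllib.parse.quote(query)
    start = time.perf_counter()
    with urllib.request.urlopen(url, timeout=timeout) as response:
        response.read()  # read the body so slow result pages are counted, not just the first byte
    return time.perf_counter() - start

timings = [time_query(q) for q in SAMPLE_QUERIES]
print("median response time: %.2f s" % statistics.median(timings))
print("slowest response time: %.2f s" % max(timings))
```

Run regularly, even a simple script like this will show whether response times are drifting as the index grows or as security trimming rules change.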

Retrieval Performance

Now we are in the area of relevance, recall and precision. Relevance is very subjective. Two users with similar skills, roles and even office locations may take very different views on the relevance of a set of results. This may increasingly be the case with mobile search, where consumer applications set the expectation that the search will be refined based on location.

Recall is the percentage of all the relevant results in the collection that are actually returned. In most cases there is no way of knowing how many relevant results are in the collection, so this measure has significant limitations, especially with very large collections of documents. At least one major law firm has over 1 billion documents. A user trying to achieve high recall will use very high-level search terms, such as "drill," and then either use filters and facets to narrow down the search scope or take an entirely different query route based on a review of the initial set of results.
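To make the definition concrete, here is a minimal sketch of the recall calculation for a single test query, assuming a small collection where the relevant documents are already known. The document identifiers and relevance judgments are hypothetical.

```python
def recall(retrieved, relevant):
    """Fraction of all relevant documents that the query actually returned."""
    if not relevant:
        return 0.0
    return len(set(retrieved) & set(relevant)) / len(set(relevant))

# Hypothetical relevance judgments for one test query on a small collection.
relevant_docs = {"doc1", "doc4", "doc7", "doc9"}
retrieved_docs = ["doc1", "doc2", "doc4", "doc5"]

print(recall(retrieved_docs, relevant_docs))  # 0.5 -- only half the relevant documents were found
```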

Precision is defined as the percentage of relevant documents in a set of results. This is easier to assess and can be extended to the "early precision" measure: the number of relevant results in (say) the first 20 listed. Where there is a requirement for a high degree of precision, the query will often be more complex or make use of high-quality metadata, such as Speed 32 ASD as the product name for the drill. The different query approaches also mean that achieving high recall and high precision at the same time is not possible.
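Precision and the early precision measure only need judgments on the results actually presented, so they are more practical to compute. The sketch below, reusing the same hypothetical judgments as above, calculates precision over a full result list and within the first k results.

```python
def precision(retrieved, relevant):
    """Fraction of the retrieved documents that are relevant."""
    if not retrieved:
        return 0.0
    return len(set(retrieved) & set(relevant)) / len(retrieved)

def precision_at_k(retrieved, relevant, k=20):
    """Early precision: the relevant fraction within the first k results listed."""
    return precision(retrieved[:k], relevant)

relevant_docs = {"doc1", "doc4", "doc7", "doc9"}
retrieved_docs = ["doc1", "doc2", "doc4", "doc5", "doc9"]

print(precision(retrieved_docs, relevant_docs))          # 0.6 -- 3 of the 5 results are relevant
print(precision_at_k(retrieved_docs, relevant_docs, 3))  # ~0.67 -- 2 of the first 3 are relevant
```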

Some surveys use the term "accurate" as a search performance adjective without defining what the term means. In a recent survey released by Varonis, "accuracy" is defined as getting the right results, but what does "right" mean in this context?

The most useless metric is the time taken to complete a search. All too often businesses decide to invest in new technology to reduce the time spent on search and so improve productivity. Defining the start and end times of a search is impossible, and getting a list of results is of no value if the bandwidth to the document servers is so poor that working through the first page of results takes forever.

Impact Performance

In the final assessment, the key metric is whether the search application helps the user make a better decision by enabling them to find the information they need. This cannot be assessed through any analytics measure, because search may have played only a small but important role in the decision-making process.

A project document found through a search may help a user realize that a team working in another country has found a solution to a problem. Search supports a huge number of different workflows and processes. Searching for people is a very important use of search in larger organizations but is rarely considered when assessing performance.

Assessing impact performance involves conducting user surveys, holding regular meetings with representative groups of users and having good feedback channels to capture both poor and good search performance.

Search Takes Time

Evaluating all three categories of search performance takes time. I have not included search analytics, as this is just a way of improving recall and precision. Staff resources need to be appropriate to the scale of the work, not only to conduct these evaluations but also to take action on the findings. You can find more about search evaluation in my book on Enterprise Search, and the definitive book on search analytics is available from Rosenfeld Media.

Title image by Wasu Watcharadachaphong / Shutterstock.com

About the Author

Martin White is managing director of Intranet Focus, Ltd. and is based in Horsham, UK. An information scientist by profession, he has been involved in information retrieval and search for nearly four decades as a consultant, author and columnist. He is the author of “Enterprise Search” published by O’Reilly Media. He is a Visiting Professor at the Information School, University of Sheffield.

 
 
 