People are dissatisfied with the search applications they are using. This is the result -- without exception -- from surveys of search performance. Search managers charged with improving satisfaction levels face the problem of defining what "satisfaction" actually means. Search performance has to be evaluated on three criteria: technical performance, retrieval performance and impact performance, but it's impossible to bring all three together in some mathematical formula for "satisfaction."
Google and Bing have set the expectation that results will be returned in less than a second, but both companies have invested billions of dollars to achieve this. Many corporate websites use a hosted search application which can take much longer. The Shell corporate website currently takes around 10 seconds to respond to a query.
The challenges are much greater internally. Security trimming can impact search response times, as can federated search implementations. There is more to technical performance than minimizing query response times -- attention needs to be paid to crawl and indexing speeds and the speed with which the main search index is updated.
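As a rough illustration of tracking query response times, the sketch below summarizes a sample of timings by median and 95th percentile rather than by a single average, so that a few very slow queries are not hidden. The function name and the timings are invented for the example; real figures would come from your own search logs.

```python
import math
import statistics

def summarize_latencies(latencies_ms):
    """Return (median, 95th percentile) for a list of response times in ms."""
    if not latencies_ms:
        raise ValueError("no latency samples supplied")
    ordered = sorted(latencies_ms)
    median = statistics.median(ordered)
    # Nearest-rank percentile: the smallest sample with at least 95%
    # of all samples at or below it.
    rank = math.ceil(0.95 * len(ordered))
    p95 = ordered[rank - 1]
    return median, p95

# Hypothetical response times for ten queries, in milliseconds
samples = [120, 340, 95, 2100, 410, 180, 9800, 260, 150, 300]
median_ms, p95_ms = summarize_latencies(samples)
print(f"median: {median_ms} ms, 95th percentile: {p95_ms} ms")
```

In this made-up sample the typical query is fast, but the 95th percentile exposes the occasional very slow response that users are likely to remember.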
Retrieval performance takes us into the area of relevance, recall and precision. Relevance is very subjective. Two users with similar skills, roles and even office locations may take very different views on the relevance of a set of results. This may increasingly be the case with mobile search, where consumer applications set the expectation that the search will be refined based on location.
Recall is the number of relevant results returned, expressed as a percentage of all the relevant results in the collection. In most cases there is no way of knowing how many relevant results are in the collection, so this measure has significant limitations, especially with very large collections of documents. At least one major law firm has over 1 billion documents. When a user is trying to achieve high recall they will use very high-level search terms, such as "drill," and then either use filters and facets to narrow down the search scope or take an entirely different query route based on a review of the initial set of results.
Precision is defined as the percentage of the documents presented in a set of results that are relevant. This is easier to assess and can be extended to the "early precision measure" -- the number of relevant results in (say) the first 20 listed. Where there is a requirement for a high degree of precision, the query will often be more complex or make use of high-quality metadata, such as "Speed 32 ASD" as the product name for the drill. The different query approaches for precision and recall also mean that achieving both high recall and high precision with the same query is not possible.
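As a minimal sketch of these definitions, the Python below computes precision, recall and early precision for a single query. The document identifiers and relevance judgments are invented for illustration. The recall calculation assumes the full set of relevant documents is known, which is rarely the case in practice, and early precision is expressed here as a fraction of the first k results rather than as a raw count.

```python
def precision(results, relevant):
    """Fraction of the returned results that are relevant."""
    if not results:
        return 0.0
    hits = sum(1 for doc in results if doc in relevant)
    return hits / len(results)

def recall(results, relevant):
    """Fraction of all relevant documents that were returned."""
    if not relevant:
        return 0.0
    hits = len(set(results) & relevant)
    return hits / len(relevant)

def precision_at_k(results, relevant, k=20):
    """Early precision: precision over the first k results only
    (or over all results if fewer than k were returned)."""
    return precision(results[:k], relevant)

# Hypothetical ranked result list and relevance judgments
results = ["d1", "d7", "d3", "d9", "d4"]
relevant = {"d1", "d3", "d4", "d8"}

print(precision(results, relevant))             # 3 of 5 returned are relevant
print(recall(results, relevant))                # 3 of 4 relevant were returned
print(precision_at_k(results, relevant, k=2))   # 1 of the first 2 is relevant
```

Even in this toy case the two measures pull apart: returning more results could only raise recall, but each extra non-relevant result would lower precision.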
Some surveys use the term "accurate" as a search performance adjective without defining what the term means. In a recent survey released by Varonis, "accuracy" is defined as getting the right results, but what does "right" mean in this context?
The most useless metric is the time taken to complete a search. All too often businesses decide to invest in new technology to reduce the time spent on search and so improve productivity. Defining the start and end times of a search is impossible, and getting a list of results is of no value if the bandwidth to the document servers is so poor that working through the first page of results takes forever.
In the final assessment the key metric is whether the search application helps the user make a better decision by enabling them to find the needed information. This cannot be assessed through any analytics measure because search may have played only a small but important role in the decision-making process.
A project document found through a search may help a user realize that a team working in another country has found a solution to a problem. Search supports a huge number of different workflows and processes. Searching for people is a very important use of search in larger organizations but is rarely considered when assessing performance.
Assessing impact performance involves conducting user surveys, holding regular meetings with representative groups of users and having good feedback channels to capture both poor and good search performance.
Search Takes Time
Evaluating all three categories of search performance takes time. I have not included search analytics as this is just a way of improving recall and precision. Staff resources need to be appropriate to the scale of the work -- not only to conduct these evaluations but also to take action. You can find more about search evaluation in my book on Enterprise Search, and the definitive book on search analytics is available from Rosenfeld Media.