Anyone who has released an enterprise software application knows how important the final step before release is — User Acceptance Testing (UAT).

A huge amount of guidance is available on how to undertake these tests. Usually they require users to work through test scripts for each of the tasks or functions supported by the application. Software tools, such as HP’s Quality Center, can also help manage large-scale UAT. And from the outset, the technical team, the business teams and the stakeholders need to agree on the threshold for acceptance before the application is released for use.

None of this applies to enterprise search applications.

A number of important differences separate enterprise search applications — which are a classic example of a "wicked problem" — from other enterprise applications:

  1. No workflow process exists from which a set of tasks can be developed for test purposes. 
  2. Everyone in the organization is a user, each with their own views of what constitutes ‘good search.’ 
  3. There is no ‘correct’ route to a defined piece of information, because people will use filters and facets in different combinations and still end up in much the same place. That place may be reasonably easy to define when the search is for a specific document, but not when the search application is being used in exploratory search mode, perhaps to meet a learning requirement.

Content Quality

Enterprise applications usually offer a level of assurance around the validity of data. The data was either entered manually or loaded through a connector, in both cases subject to a set of validation rules, especially around dates, people and organizations. Data validity checks are commonly undertaken early on in a project, so test routines can usually treat this variable as settled.

In the case of search content, quality can range from terrible to acceptable. The extent of acceptability depends on the experience of the user and how they plan to use the content. That assumes that the search application is able to find enough elements to determine that the content notionally meets the requirements of the query term. 

Another issue: users may not notice the quality (or lack thereof) until some time after the search was performed, when they've already shared the information with others who are skeptical of the quality. 

Is that a Pass or a Fail for the test?

Test Scripts

Using test scripts for self-testing has substantial limitations, especially when the search application fails to meet user expectations. Those expectations never turn out to be the same as the user requirements, and the mismatch will prompt the user to make notes.

Note-taking can fail in two ways. If the user makes notes as they go along, they may lose track of where they are and use a sub-optimal approach.

Alternatively, if they wait to write their notes until after a failed search attempt, they will almost certainly get some of the details wrong. Remembering exactly what they did, in what sequence and what reactions queries and commands provoked is challenging. 

There is no substitute for one-on-one usability testing for assessing the quality of the search dialogue.

Starting on the Right Foot

Have enterprise search application test procedures in place and validated at the very outset of the installation, assess and refine them during the implementation, and then continually reassess them throughout the operational life of the application. 

This will involve a wide range of tests, including heuristic and expert walk-throughs, the use of collections of test documents, the review of search logs, usability tests and qualitative feedback from a wide range of users. Users will also need to be defined in terms of personas, use cases, search expertise and much more.
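As one illustration of how a collection of test documents might be used, the sketch below scores a handful of queries with precision at k, a common relevance measure. It is a minimal example under stated assumptions: the queries, document IDs and relevance judgments are all hypothetical, and in practice the judgments would come from users or subject experts, and the result lists from the search application itself.

```python
from typing import Dict, List, Set


def precision_at_k(results: List[str], relevant: Set[str], k: int) -> float:
    """Return the fraction of the top-k results that were judged relevant."""
    if k <= 0:
        raise ValueError("k must be positive")
    top_k = results[:k]
    hits = sum(1 for doc_id in top_k if doc_id in relevant)
    return hits / k


# Hypothetical relevance judgments: query -> IDs of documents judged relevant.
judgments: Dict[str, Set[str]] = {
    "travel expenses policy": {"doc-12", "doc-40"},
    "pension scheme": {"doc-7"},
}

# Hypothetical ranked results returned by the search application.
search_results: Dict[str, List[str]] = {
    "travel expenses policy": ["doc-12", "doc-3", "doc-40", "doc-9"],
    "pension scheme": ["doc-5", "doc-7", "doc-8", "doc-1"],
}

# Score every judged query, then average across queries.
scores = {
    query: precision_at_k(search_results[query], judgments[query], k=4)
    for query in judgments
}
mean_precision = sum(scores.values()) / len(scores)

for query, score in scores.items():
    print(f"{query}: P@4 = {score:.2f}")
print(f"mean P@4 = {mean_precision:.3f}")
```

A score like this covers only known-item style queries with agreed judgments. It says nothing about exploratory search, which is why it can only be one input alongside usability tests and qualitative feedback.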

With so much testing needed, there is a good justification for having someone on the project team who does nothing but design and manage the testing, and who takes over the operational role after the application launch. 

Applications such as Quepid, which support A/B testing, can make a substantial impact on search relevance management. 

Federated search makes all of the above much more complicated: not only does the federation have to work, but the search experience also has to be noticeably better.

Define ‘User Acceptance’

Unfortunately, there's no one metric of user acceptance to help decide Go or No Go for implementation.

This decision can only be reached by regularly collating all of the evidence from the tests mentioned above and constructing trend lines for both qualitative and quantitative testing. Stakeholders will reach a point when the incremental improvements (say, month on month) have levelled off enough that they can sign off and transfer management to the business search team.
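The trend-line idea can be sketched for a single quantitative measure as follows. This is only an illustration: the monthly scores are invented, and the threshold and the three-month window are assumptions that the stakeholders would have to agree on for themselves. A real decision would also weigh the qualitative trends, which this sketch ignores.

```python
from typing import List


def monthly_deltas(scores: List[float]) -> List[float]:
    """Return the change in score from each month to the next."""
    return [later - earlier for earlier, later in zip(scores, scores[1:])]


def has_levelled_off(scores: List[float], threshold: float, window: int = 3) -> bool:
    """Return True if each of the last `window` monthly changes is below `threshold`.

    Both `threshold` and `window` are hypothetical sign-off parameters.
    """
    deltas = monthly_deltas(scores)
    if len(deltas) < window:
        return False  # not enough history to judge a trend
    return all(abs(delta) < threshold for delta in deltas[-window:])


# Hypothetical monthly values of one metric, e.g. a mean relevance score.
monthly_scores = [0.40, 0.52, 0.61, 0.66, 0.67, 0.68, 0.68]

print([round(delta, 2) for delta in monthly_deltas(monthly_scores)])
print(has_levelled_off(monthly_scores, threshold=0.02))
```

Running the same check on only the first four months returns False, since the score is still improving by several points a month at that stage.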

Forecasting when they'll reach that point is impossible at the outset. But if IT won’t let it go without "a proper UAT," ask them to define the acceptance criteria!