Federated Search: Great in Theory, Complicated in Practice

A common request clients make during the early stages of an enterprise search project is for a single search box to search across all of their applications. These requests, delivered more in hope than anything else, envision a "silo-breaker search," one that provides a single point of access to all of the information resources in the organization.

In principle, this is an admirable vision, but it is very difficult to deliver in practice. The technology is often referred to as federated search and many vendors claim they can deliver on this requirement. The aim of this column is to provide some insights into what is involved so you can have an informed discussion with your vendor.

Aggregated vs. Federated Search

On the way to federated search, aggregated search is worth a mention. In aggregated search, a query returns a list of links to content items (usually referred to as information nuggets) that the software assembles into a page for display. Information nuggets can come in multiple formats (text, image, video, etc.) and levels of granularity (document, passage, word, etc.). Each of these formats is defined as a vertical. This is the approach behind the search cards you see in Google and Bing. Selecting the nuggets and deciding the order of presentation takes into account which pages users clicked after their initial search, and is not as easy as it might seem.

The objective of federated search is to query the search applications in multiple collections simultaneously and then present a list of results from each collection, ranked in order of relevance to the user. Ecommerce comparison sites regularly make use of this approach, but translating it to enterprise search poses a significant challenge.


2 Approaches to Federated Search

Query Time Federation

Federated search is available in two distinct architectures. In query time search, the search API issues a query to all search applications, collects the responses and assembles them into a list of results. Issuing the query is fairly easy. The query-to-client interface is usually OpenSearch. The harder decision is how long the API should wait for the individual applications to respond: wait too long and the user sees a substantial display latency, cut the wait too short and results from slower applications are lost.
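The waiting-time trade-off can be illustrated with a minimal Python sketch. It assumes each search application is wrapped in a simple callable; `make_backend`, `federated_query` and the simulated delays are hypothetical names and values invented for this illustration, not part of any vendor's product or of the OpenSearch interface.

```python
import concurrent.futures
import time


def make_backend(name, delay_seconds):
    """Build a hypothetical search client that answers after a fixed delay."""
    def search(query):
        time.sleep(delay_seconds)  # stands in for network and search time
        return [f"{name}: hit {rank} for {query!r}" for rank in (1, 2)]
    return search


def federated_query(query, backends, timeout_seconds):
    """Send the query to every backend and keep only what arrives in time."""
    pool = concurrent.futures.ThreadPoolExecutor(max_workers=len(backends))
    futures = {pool.submit(search, query): name
               for name, search in backends.items()}
    done, not_done = concurrent.futures.wait(futures, timeout=timeout_seconds)

    responses = {}
    missing = [futures[f] for f in not_done]  # too slow for the deadline
    for future in done:
        try:
            responses[futures[future]] = future.result()
        except Exception:
            missing.append(futures[future])  # the application returned an error

    # Stop waiting for stragglers; whatever they return later is discarded.
    pool.shutdown(wait=False, cancel_futures=True)
    return responses, sorted(missing)
```

Calling `federated_query("budget", backends, timeout_seconds=0.3)` with one fast and one slow backend returns the fast application's hits plus a list naming the slow one. A real interface would probably want to tell the user which collections are missing from the results rather than drop them silently.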

Assuming, for the purposes of this column, that results come back from all of the applications, the next challenge is how to present them. Although the results may come back from each application as a ranked list, each ranking is calculated against that application's own collection, so the relevance scores are not comparable from one application to another. That makes presenting an integrated list of results in relative rank order a nightmare. At best the results could be presented in multiple windows or interleaved in sequence, but even so the snippets from each search application may well be different.
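Interleaving in sequence can be as simple as a round-robin merge: take the top result from each application, then the second from each, and so on. The Python sketch below is one possible reading of that idea, using invented collection names and result identifiers. It deliberately ignores relevance scores, since those are not comparable across applications.

```python
from itertools import zip_longest

_MISSING = object()  # marks the end of a shorter result list


def interleave(ranked_lists):
    """Round-robin merge: rank 1 from each source, then rank 2, and so on."""
    merged = []
    for tier in zip_longest(*ranked_lists.values(), fillvalue=_MISSING):
        for source, item in zip(ranked_lists.keys(), tier):
            if item is not _MISSING:
                merged.append((source, item))
    return merged


# Hypothetical result lists, each already ranked by its own application.
results = {
    "wiki": ["w1", "w2", "w3"],
    "crm": ["c1"],
    "files": ["f1", "f2"],
}
merged = interleave(results)
```

Each merged entry keeps its source label so the interface can show where a result came from. The merge gives every application equal standing, which is fair but says nothing about whether the third wiki result is actually more useful than the first CRM result.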

Index Time Federation

This architecture has a single master index of all of the content in all of the collections. A search is run against the master index, which produces a single ranked list of results across all the documents in the collections. The number of matching documents is usually substantial, and given the common limit of 10 links per search page, results from some of the applications may not appear anywhere near the top of the ranking. This can cause problems for searchers: when they use the search engine of a specific collection, they may find highly relevant results that didn’t make it into the first 100 returned from the master index.
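A quick way to check for this crowding-out effect is to count how many first-page results come from each collection and to find where each collection's best document sits in the master ranking. The Python sketch below uses an invented ranking of (collection, document) pairs. The function names and data are assumptions for illustration only.

```python
from collections import Counter


def first_page_coverage(master_ranking, page_size=10):
    """Count how many first-page results come from each collection."""
    first_page = master_ranking[:page_size]
    return Counter(collection for collection, _doc in first_page)


def best_rank_per_collection(master_ranking):
    """Return the 1-based position of each collection's top document."""
    best = {}
    for position, (collection, _doc) in enumerate(master_ranking, start=1):
        best.setdefault(collection, position)  # keep the first position seen
    return best


# Hypothetical master ranking: a large intranet collection fills the first
# page, and the only HR document sits at position 13.
master_ranking = [("intranet", f"page-{n}") for n in range(1, 13)]
master_ranking.append(("hr", "leave-policy"))

coverage = first_page_coverage(master_ranking)
best = best_rank_per_collection(master_ranking)
```

Here `coverage["hr"]` is 0 and `best["hr"]` is 13, so a user who never goes past the first page will not see the HR document, however relevant it is within its own collection.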

A variation on this architecture compiles the master index from the search indexes of each application rather than from an index of the original content. The index schemas do need to be pretty close for this to be an option. Searching across a mix of classic inverted indexes and knowledge graphs, or on-premises and cloud, is not a good idea.

Start With the User Experience

When considering federated search it is essential to start with the user interface, as both architectures inhibit the quality of the interaction between the search application and the search user. A number of search vendors offer solutions (in my view partial solutions, but make up your own mind) to the challenges of federated search with the inclusion of some clever tweaks and shortcuts. Vendors often talk about the number of connectors they have, but federated search isn't just about connector integrity. It is about presenting a user interface that achieves an acceptably high level of search satisfaction.

When you start your discussions with vendors, ask them to demo the user interfaces they have developed for other clients and take a very hard look at how well these might meet the needs of your users across the range of collections you have at present. If they don’t have any, be clear about the development costs. Also ask what happens when you add a new application, either internally or through the acquisition of a business, and what the disaster recovery options are when one of the individual applications goes down. You really want to avoid a complete re-index! Finally, talk about early and late binding security management. This could, and should, be a long meeting.
