New enterprise search vendors, unencumbered by legacy challenges, are releasing new search techniques at an unprecedented rate PHOTO: Chase Elliott Clark

Search doesn't change overnight. 

The core elements of search technology were developed between 1955 and 1985, but some have only seen the light of day in the last few years. It takes a long time for research to translate to practice. 

For example, the BM25 ranking model that built on tf-idf was developed in the 1980s and yet only appeared in Lucene and Microsoft SharePoint in the last few years. 
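To make the relationship to tf-idf concrete, here is a minimal sketch of BM25 scoring in Python. It uses one common smoothed idf variant and the conventional default parameters `k1=1.2` and `b=0.75`; the function name, the toy documents and the whitespace tokenization are assumptions for illustration, not the implementation used by Lucene or SharePoint.

```python
import math
from collections import Counter

def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Score each tokenized document against the query terms with BM25."""
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    # Document frequency: how many documents contain each term.
    df = Counter(term for d in docs for term in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query:
            if term not in tf:
                continue
            # Smoothed idf; the "1 +" inside the log keeps it non-negative.
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # k1 caps the benefit of repeated terms; b normalizes by length.
            norm = tf[term] + k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores

# Toy collection (illustrative only).
docs = [
    "enterprise search ranking".split(),
    "search search search search spam".split(),
    "database security model".split(),
]
scores = bm25_scores(["enterprise", "search"], docs)
```

Unlike plain tf-idf, repeating a term does not raise the score in proportion: the second document mentions "search" four times yet still ranks below the first, which matches both query terms in a shorter text.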

A number of reasons explain the slow adoption of new search techniques. 

One is the lack of test collections of enterprise scale and diversity, which makes it hard to test any new technique properly. Vendors also need to ensure legacy clients aren't forced to rebuild indexes of many millions of documents when a new approach is rolled out, which is why Microsoft often errs on the side of caution in applying its own research to SharePoint. 

However, the current pace of development is faster than ever before, in part because new entrants do not have the legacy index management problem. In the last few years we've started seeing signs of the next generation of search.

3 Signs of the Next Generation of Search

Joining Text and Database Content

Substantial research effort and many product releases have focused on how to run a single search across both unstructured and structured content. Attivio has gone deep into this topic with its composite join approach (read about it in US Patent 9275155). 

Google is also looking at this issue, with dozens of patents filed. Because text retrieval is based on probabilities, probabilistic relational database models may be required — that's one challenge. Security management is another challenge, because the model used for database security may not be a one-to-one match with document access. 
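The security mismatch is easier to see in a small sketch. The Python below joins hypothetical text-search hits to rows in an in-memory SQLite table and applies two separate checks: a document-level access list and a row-level ownership rule. The table, the hit list, the group names and the `joined_search` function are all invented for illustration; this is not Attivio's or Google's method.

```python
import sqlite3

# Hypothetical text-search hits:
# (doc_id, relevance score, customer key, groups allowed to read the document).
text_hits = [
    ("doc-1", 2.4, "C001", {"sales", "legal"}),
    ("doc-2", 1.9, "C002", {"legal"}),
    ("doc-3", 1.1, "C001", {"sales"}),
]

# Hypothetical structured source with its own security attribute.
db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE customers "
    "(id TEXT PRIMARY KEY, name TEXT, region TEXT, owner_group TEXT)"
)
db.executemany(
    "INSERT INTO customers VALUES (?, ?, ?, ?)",
    [("C001", "Acme", "EMEA", "sales"), ("C002", "Globex", "APAC", "finance")],
)

def joined_search(hits, user_groups, region):
    """Return (doc_id, customer name, score) for hits the user may see."""
    results = []
    for doc_id, score, customer_id, doc_acl in hits:
        # Document-level check: user must share a group with the document ACL.
        if not (doc_acl & user_groups):
            continue
        row = db.execute(
            "SELECT name, owner_group FROM customers WHERE id = ? AND region = ?",
            (customer_id, region),
        ).fetchone()
        # Row-level check: a separate rule that need not agree with the ACL.
        if row is None or row[1] not in user_groups:
            continue
        results.append((doc_id, row[0], score))
    return sorted(results, key=lambda r: r[2], reverse=True)

sales_view = joined_search(text_hits, {"sales"}, "EMEA")
legal_view = joined_search(text_hits, {"legal"}, "EMEA")
```

In this toy data a user in the "legal" group can read doc-1 as a document but gets no joined result, because the customer row belongs to "sales". That is the kind of gap between the two security models described above.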

With search having the potential to provide the integration platform for a digital workplace, vendors see this as an opportunity to sell far more than text retrieval.

Search Cards

Search cards respond to queries with an integrated view of information on the topic. A search for a company on Google will result in a concise display of brands, employee numbers and financial information. Google owns a number of patents on entity detection and extraction for cards (start with US 20160034471 A1 if you're curious). 

The initial challenge is how to recognize an entity and place it in a context. The concept of a ‘named entity’ dates back to the 1990s and was initially addressed with rules-based approaches and matching to dictionaries. Now machine learning is playing an important role in recognizing the context of an entity. If you look at your query logs you will find a significant proportion of the queries are entity-related.
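As a rough illustration of the older dictionary and rules-based approach, the following Python sketch tags entities in a handful of queries and reports what share of the log contains at least one. The dictionary entries, the invoice pattern and the sample queries are all made up for the example; a production system would use far larger dictionaries or a trained model.

```python
import re

# Hypothetical dictionary (gazetteer): surface form -> entity type.
ENTITY_DICTIONARY = {
    "acme corp": "COMPANY",
    "project atlas": "PROJECT",
    "jane smith": "PERSON",
}

# Hypothetical rule: identifiers such as "INV-20431" are invoices.
INVOICE_PATTERN = re.compile(r"\binv-\d+\b")

def tag_entities(query):
    """Return (surface form, entity type) pairs found in a query."""
    text = query.lower()
    found = []
    # Dictionary matching on whole words only.
    for surface, entity_type in ENTITY_DICTIONARY.items():
        if re.search(r"\b" + re.escape(surface) + r"\b", text):
            found.append((surface, entity_type))
    # Rule-based matching for structured identifiers.
    for match in INVOICE_PATTERN.finditer(text):
        found.append((match.group(), "INVOICE"))
    return found

# Hypothetical query log.
query_log = [
    "Acme Corp contract renewal",
    "expenses policy",
    "INV-20431 status",
    "project atlas budget",
]
tagged = [tag_entities(q) for q in query_log]
entity_share = sum(1 for t in tagged if t) / len(query_log)
```

Running something like this over a real query log gives a quick first estimate of how many queries are entity-related, although dictionary matching alone says nothing about the context in which the entity appears.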

With new solutions to entity identification available, vendors are turning their attention to creating dynamic search cards that present information in a summarized form. This would enable you not only to look at the key elements of a project or a client relationship but also to see machine-generated summaries of project summaries and customer call records. 

To an extent, this is an example of a well-established requirement supported by existing algorithms. Only recently has the computing power arrived to deliver results within the 500 milliseconds or so that users expect.

Linked Enterprise Data

Linked enterprise data (LED) attempts to bring the concepts of the semantic web into the enterprise environment. In short, LED pulls together data from across an enterprise into a common information pool from which users can access it, regardless of the source application. LED also brings external data related to any inquiry into the common information pool. 
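One way to picture the common information pool is as a set of subject-predicate-object statements, the basic building block of the semantic web. The Python sketch below pools records from two hypothetical internal systems and one external source, then looks up everything known about a single customer. All record layouts, identifiers and predicate names are assumptions for illustration and do not reflect any vendor's data model.

```python
from collections import defaultdict

# Hypothetical records from two internal systems and one external source.
crm_rows = [{"id": "C001", "name": "Acme", "account_manager": "jane.smith"}]
project_rows = [{"id": "P42", "title": "Atlas", "client_id": "C001"}]
external_facts = [("customer:C001", "headquarteredIn", "Berlin")]

def build_pool():
    """Convert every source into (subject, predicate, object) triples."""
    triples = set()
    for row in crm_rows:
        subject = "customer:" + row["id"]
        triples.add((subject, "name", row["name"]))
        triples.add((subject, "managedBy", "person:" + row["account_manager"]))
    for row in project_rows:
        subject = "project:" + row["id"]
        triples.add((subject, "title", row["title"]))
        # Shared identifiers are what link one source to another.
        triples.add((subject, "forClient", "customer:" + row["client_id"]))
    triples.update(external_facts)
    return triples

def describe(pool, entity):
    """Collect what the pool says about an entity, as subject or as object."""
    facts = defaultdict(set)
    for s, p, o in pool:
        if s == entity:
            facts[p].add(o)
        elif o == entity:
            facts["is " + p + " of"].add(s)
    return dict(facts)

pool = build_pool()
acme = describe(pool, "customer:C001")
```

The lookup returns the CRM name and account manager, the project that references the customer, and the external fact, without the caller needing to know which application each statement came from.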

This is only a brief overview of LED. Antidot offers a very good introduction in a paper without too much vendor spin. (Note: the paper dates from 2012, yet another example of how long it takes to bring these technologies into the market.) 

Antidot's Tarqua software is an excellent example of how new entrants are taking new technologies and providing potentially very powerful applications. Anzo from Cambridge Semantics, Semaphore from SmartLogic and iManage Insight (from its acquisition of RAVN) are also all worth a look. Many other solutions are available, with some acting as modules that can be integrated with search applications, especially open source applications.

Search's Future Is Very Bright

SharePoint dominates the market for search — no one would dispute that. But the sheer scale of its customer base inhibits innovation. And yes, Microsoft is offering Delve and other tools, but their integration into SharePoint and Office 365 is far from elegant. 

Search managers need to look at what is happening in the three topics above and consider how these developments could transform the way employees gain access to information, expertise and knowledge. They represent just a few of the elements of next generation enterprise search.