Search is Getting Smarter All the Time

Critics like to point out that the core technology of search dates back to the work of Gerard Salton in the mid-1960s, and to conclude that search is therefore broken and going nowhere. The same could be said of the internal combustion engine, so long as you ignore the level of sophistication in Formula 1 and Indy cars.

The reality is that search has never been in a better place in terms of development, and a lot of exciting work is sitting well below the horizon because of the disconnect between academic research and enterprise search development.

SIGIR

Probably the best way to get a sense of the rate of progress is to join the Special Interest Group on Information Retrieval (SIGIR) of the Association for Computing Machinery (what a lovely name!). Each year the Annual Conference is the place to be to hear about the work going on worldwide on making search smarter. The 36th Conference took place in Dublin, Ireland, in early August and, at the time of writing, non-members of the ACM can download the papers from the ACM Digital Library.

In total there were 73 main papers and 85 short papers on all aspects of search enhancement. Grant Ingersoll (Lucidworks) and Daniel Tunkelang (LinkedIn) have both published very good reviews of the main themes of the conference and give a real feel for the diversity and creativity of the research being presented.

The award for best paper at the conference went to Ryen White (no relation) from Microsoft for Beliefs and Biases in Web Search, and if you do nothing else today, find and download the paper. His research indicates that people seek to confirm their beliefs with their searches and that search engines provide positively skewed search results, irrespective of the truth. The 2014 Conference will be held on the Gold Coast, Australia.

Google Research Awards

Also in August, Google announced the recipients of its Research Awards. Google has a biannual open call for proposals on computer science-related topics including machine learning and structured data, policy, human-computer interaction and geo/maps. The grants cover tuition for a graduate student and give both faculty and students the opportunity to work directly with Google scientists and engineers. In this round of awards 105 projects were funded.

The effort that Google puts into publishing the outcomes of the awards process is minimal. A PDF listing the researchers and their institutions is all that is provided, which really is not good enough. Links to the institutions would at least have been a good start. The awards are very broad in terms of subject area, but a substantial number are in information retrieval, natural language processing, mobile and human-computer interaction.

Books and monographs

Another indicator of the scale of research into search can be found in the wide range of books and monographs that are now being published, notably by Morgan and Claypool, Now Publishers and Springer. There are perhaps 20 new titles published each year between these three publishers alone. Take a look at the latest monograph from Morgan and Claypool, which is an overview of information retrieval models by Thomas Roelleke. If you want to understand how search works, then this monograph is a very good place to start.

Search is just applied mathematics

The power of search is not in the technology but in the way in which it can be used to implement mathematical algorithms that offer new ways of querying search indexes. This is perhaps an overstatement, but not by much.

If you read any of the standard texts on information retrieval you will quickly realize that search is at the interface of applied mathematics and computational linguistics, with a good dose of Markov chains and related models. A search manager does not need to understand how the mathematics works, but does need to be aware that search is on the move, with novel approaches to solving the problems created by the quite amazing growth we are seeing in both structured and unstructured repositories.
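
For readers who would like a feel for what that mathematics looks like, here is a minimal sketch in Python of term weighting and cosine ranking in the style of the classic vector space model associated with Salton's work. It is an illustration only and not how any particular search product works: the three documents, the function names and the smoothed form of the inverse document frequency are all assumptions made for this example.

```python
import math
from collections import Counter

# Toy corpus: hypothetical documents invented purely for illustration.
DOCS = {
    "d1": "enterprise search depends on information retrieval research",
    "d2": "formula one cars use internal combustion engines",
    "d3": "open source search engines implement retrieval models",
}


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately naive tokenizer)."""
    return text.lower().split()


def build_idf(docs):
    """Inverse document frequency per term; the smoothing is an assumption."""
    n = len(docs)
    df = Counter()
    for text in docs.values():
        df.update(set(tokenize(text)))  # count each term once per document
    return {term: math.log((1 + n) / (1 + count)) + 1.0
            for term, count in df.items()}


def tfidf_vector(text, idf):
    """Sparse tf-idf vector; terms unknown to the corpus are ignored."""
    counts = Counter(tokenize(text))
    return {term: tf * idf[term] for term, tf in counts.items() if term in idf}


def cosine(a, b):
    """Cosine similarity between two sparse vectors held as dicts."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank(query, docs):
    """Return (doc_id, score) pairs sorted from best to worst match."""
    idf = build_idf(docs)
    query_vec = tfidf_vector(query, idf)
    scores = {doc_id: cosine(query_vec, tfidf_vector(text, idf))
              for doc_id, text in docs.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    for doc_id, score in rank("search retrieval", DOCS):
        print(doc_id, round(score, 3))
```

Running the script ranks the two documents that mention both query terms above the one about racing cars, which shares no terms with the query and so scores zero. Much of the research described above can be thought of as replacing one or more of these simple functions with something smarter.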

One of the attractions of open-source search is that it may reduce the time between the conclusion of a research project and the arrival of the code in an open-source application. That is not to say that commercial search applications will be standing still. IBM and Microsoft are making a considerable commitment to research into information retrieval. The challenge for search managers will be tracking what is going on and taking a view on which of these developments is going to make search better for their organization.

Editor's Note: For more of Martin's thoughts on the state of Enterprise Search, read Is There a Future for Enterprise Search?