People collect all sorts of things: paperweights, Chinese porcelain, paintings of cats. I am a thematic information collector.
The bookshelves in my house hold collections of books on specific topics, including the music of J.S. Bach, the decryption of the Enigma, Tunny and Purple codes, and American political history. The bookshelves in my office tell a different story, dominated by collaboration, virtual teams and a hoard of books on information retrieval.
My first encounter with enterprise search was working with Unilever Computer Services from 1980 to 1981 to develop the DECO search application. Since that time I have assiduously collected everything I could find about the technology of search and its applications, a story that dates back to the 1950s.
Needles In Haystacks
Ironically though, information on search is very dispersed. Readers of this column will know that I frequently highlight the lack of transfer of knowledge between the information science and enterprise search communities.
If you walk along the computer section of a large bookshop you will find books on many different subjects, but not on search. Amazon promises over 20,000 books on information retrieval but by page six is offering me Higher Nature Advance Brain Nutrients — Pack of 90 Capsules. Perhaps Amazon knows me better than I thought!
The reality is that a substantial amount of information is available on the Internet about all aspects of search and information retrieval, but it is not easy to find. Searching for [search] in Google is not a solution. Information on search applied to enterprise collections is even more dispersed. For example, David Hawking contributed a superb chapter on Enterprise Search in the second edition of Modern Information Retrieval, but it is just 40 pages in a book of almost 900. I suspect few are aware of it (and be careful if you search for this book, as many of the citations on Google and Bing are to the original 1999 version).
The problem is especially acute when you search for web resources. It may come as a surprise to learn that there are at least 40 blogs on enterprise search. Tracking down search vendors is not easy either. I have maintained a list on the Intranet Focus website for some time but it has had very few hits, probably because it is not very visible amongst all the intranet content. There are some good collections of links to search resources, notably from ACM SIGIR and Microsoft.
Enterprise Search Information Portal
In the process of writing the second edition of Enterprise Search, I accumulated a substantial amount of information that did not make it into the book. The list of documents in my research file now runs to more than 800. Fewer than 10 percent of these appear at the end of the chapters in the book. So to help me, and to help me help you, there is now a website for the book called (not surprisingly) EnterpriseSearchBook.
Please note that this is not an official O’Reilly website, and you will not find sections of the book on the site.
The site brings together information that was already on the Intranet Focus site, along with a significant amount of additional information. The chances of every link being correct are small, but I plan to update the site at the end of each month, so if you see a link that is incorrect or a resource that is missing, let me know. Links to briefing papers offered by search vendors are intentionally missing. It is easier to exclude them all than to have to explain to certain vendors why I am not highlighting their sales literature.
Searching and Finding
As a long-standing member of the Association for Computing Machinery, I had found the search feature on the ACM Digital Library to be frustratingly and indeed embarrassingly poor. But with the makeover of the ACM website, the search application on the Digital Library is much improved, especially in how it highlights query terms in the search results. There is still room for improvement, but it is much easier to find published research.
Membership in the ACM and a fee are required to access the Digital Library. The information problem I cannot solve is the wealth of research published in journals and conference proceedings that are only available on subscription. Of course it is possible to buy individual papers, but this means that search team members have to put the cost on their credit card and claim it as an expense.