Clinton Gormley and Zachary Tong published an excellent new book on Elasticsearch. It weighs in at over 700 pages -- a commitment for even the most dedicated reader -- but worth the effort for those interested in the topic.
In it, the authors describe the information retrieval functionality of Elasticsearch. They describe several hundred functional elements in the book. The skill lies in knowing which to implement given the nature of the content and the type of query that will be used. This requires information science/information retrieval skills, not developer skills. There's a shortage of these skills, but they are essential in four areas of open source search implementation.
Start off by defining the functional specification for the search application. This requires considerable knowledge of how the business creates and uses information. A thorough analysis of user requirements will provide this insight.
Understand the requirements for exploratory search -- where a user is not quite sure of what they are looking for. Providing filters and facets is the lazy way of addressing this requirement. Translate the requirements from business speak into information retrieval speak.
This is a role for an information scientist, not a computer scientist. Aim to reduce the 300 or more options in Elasticsearch to the roughly 80 that are absolutely essential to meeting user expectations. And remember, expectations and requirements are not synonyms.
Put in place a careful test plan before the development proceeds. This is much more difficult to frame than with a SQL database. Take great care that the test collections are representative and that you are tracking issues of scalability (moving to the complete repository) and extensibility (additional functionality requirements).
The information scientist acts as the Senior User (in PRINCE2 terms) to assess the performance of functional features, both individually and in combination.
Escalate the test plan to all repositories and features. The book stops at what might be regarded as technical performance management. It says nothing about the query performance of the implementation in terms of recall and precision.
This is where the initial tuning takes place. To get a sense of what might be involved, take a look at this TechNet paper on tuning SharePoint 2013 search. Note the number of related documents this paper points to. Welcome to the world of information retrieval!
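As a reminder of what recall and precision measure, here is a minimal sketch for a single test query. It assumes you already have a set of human relevance judgments for that query; the function name and the document IDs are made up for illustration and are not part of Elasticsearch.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one query.

    retrieved: IDs the search engine returned (hypothetical sample data)
    relevant:  IDs a human assessor judged relevant for the query
    """
    retrieved_set = set(retrieved)
    relevant_set = set(relevant)
    hits = retrieved_set & relevant_set

    # Precision: what fraction of the returned documents are relevant?
    precision = len(hits) / len(retrieved_set) if retrieved_set else 0.0
    # Recall: what fraction of the relevant documents were returned?
    recall = len(hits) / len(relevant_set) if relevant_set else 0.0
    return precision, recall


retrieved = ["doc1", "doc2", "doc3", "doc4"]
relevant = ["doc2", "doc4", "doc7"]
p, r = precision_recall(retrieved, relevant)
print(f"precision={p:.2f} recall={r:.2f}")  # precision=0.50 recall=0.67
```

The arithmetic is trivial. The hard part is building representative relevance judgments, which is exactly where information retrieval skills come in.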
Don’t forget the disaster recovery testing, and remember to run the test in the middle of re-indexing a collection. That’s when disaster recovery becomes really interesting.
Monitoring and Managing
Once the application is up and running, it will require a substantial amount of regular work: reviewing crawl schedules, checking stop words, assessing search logs, and training and supporting the user community.
Ideally, all these tasks call for a solid background in information science, and in information retrieval in particular. I make that distinction because many information science courses offer only a limited amount of information retrieval training and practical experience.
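To give a flavor of one of these routine tasks, assessing search logs, here is a rough sketch that lists the most frequent queries returning no results. The log format (pairs of query text and hit count) and the sample entries are assumptions for illustration; a real search log will look different and need its own parsing.

```python
from collections import Counter


def zero_result_queries(log_entries, top_n=5):
    """Return the most common queries that produced no hits.

    log_entries: iterable of (query_text, hit_count) pairs; this shape is
    a hypothetical simplification of a real search log.
    """
    counts = Counter(
        query.strip().lower()  # fold case so variants are counted together
        for query, hit_count in log_entries
        if hit_count == 0
    )
    return counts.most_common(top_n)


log = [
    ("Expenses Policy", 0),
    ("holiday form", 12),
    ("expenses policy", 0),
    ("travel insurence", 0),
    ("org chart", 3),
]
print(zero_result_queries(log))
# [('expenses policy', 2), ('travel insurence', 1)]
```

Producing the list is the easy part. Deciding whether each failed query points to missing content, a vocabulary mismatch or a spelling problem is the information science work.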
Finding the Right People
Organizations face a shortage of information science skills. The growth of open source search compounds this problem. Too much information science training overlooks the data science focus.
The major difference is the need for computational linguistics expertise in search. If you are unsure where computational linguistics fits into search, then you have an urgent need for an information scientist, or a computational linguist. Both are scarce, and the topics rarely rate as more than an option in computer science courses.
In general, the 59 Information Schools around the world put a focus on information retrieval. Graduates from these schools are very much in demand. As far as I am aware (and please prove me wrong), there are no undergraduate courses on information retrieval anywhere in the world.
There also appear to be no commercial training courses in information retrieval where developers or managers can learn more about search. There is a substantial business opportunity here. People can acquire almost every other IT skill through training and a substantial number of books on the topics.
There are three books on enterprise search. I’ve written all of them.
The Impending Skills Shortage
By now I hope you will have a better understanding of the role of information scientists in the development and support of search applications. A study I carried out for the European Commission a few years ago identified a lack of skills in information retrieval as a major barrier to the growth of innovative search application development in Europe. I have seen nothing to change my mind about this conclusion since 2011.
Ask yourself if you have information science skills on your search team. If you have none, where will you find them?