Artificial intelligence (AI) and its little brother machine learning (ML) are receiving tremendous hype — and deservedly so. From smart cars to computer-assisted medical diagnoses, machine learning is an incredibly powerful technology that has only scratched the surface of the impact it will eventually have on the world. 

One obvious use case is for online search, where Google continually uses ML to refine results based on user behavior. For large corporations, cognitive search capabilities can provide employees with valuable insights from massive amounts of structured and unstructured data.

Machine learning can play a key role here as well, but it’s not appropriate in every situation.

Machine Learning for Value-Added Tasks 

A major part of the ‘cognitive’ in cognitive search is the ability to leverage machine learning to supplement or replace certain configuration and curation tasks that have traditionally been done manually. Beyond supplementing and replacing, machine learning provides opportunities to perform value-added tasks that simply cannot be done by humans in any meaningful capacity. 

However, the decision whether to pursue machine learning for cognitive search is not always straightforward. 

Coding Rules and Achieving Scale

While machine learning is not appropriate in every scenario, below are two general areas where ML for cognitive search can make a major impact on organizational efficiency: 

1. Where the Rules Cannot be Coded 

Machine learning for cognitive search can be useful in situations where the rules cannot be coded. Many human tasks, such as grouping subsets of documents by similarity, cannot be adequately solved using simple, deterministic, rule-based solutions. 

That’s because a great number of factors could influence the outcome. When rules depend on too many factors and many of these rules overlap or need to be fine-tuned, it soon becomes difficult for humans to code them accurately. ML solutions are extremely effective at handling such complex problems.

What’s more, neither approach comes for free. Hand-crafting a few hundred rules takes work, and putting together a training set of tens or hundreds of thousands of pre-classified documents takes even more work. 

A rules-based approach yields massive generalizations from a small set of rules. However, such a generalization does not automatically learn from its mistakes. If feedback is expected to arrive continually — even if it’s at a low rate — automated learning from such feedback to improve classification accuracy becomes very attractive. 

The alternative of manually adjusting the rules from such feedback is more laborious, and injects intelligent-but-not-scalable humans into the mix. 

So a sensible combination would be to use the rule-based approach to get a decent classifier off the ground quickly and then use machine learning to adjust the rules from feedback automatically. 

For instance, machine learning can be used to automatically adjust the strengths of the various rules from feedback to improve the solution over time.
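As a rough illustration of adjusting rule strengths from feedback, here is a minimal sketch that assumes each rule is a simple keyword test and that feedback arrives as a corrected label. The rule names, starting weights, threshold and learning rate are all hypothetical, and the perceptron-style update is one simple choice among many rather than a prescribed method.

```python
# Hypothetical hand-written rules: each returns True when it fires on a document.
RULES = {
    "mentions_invoice": lambda text: "invoice" in text,
    "mentions_payment": lambda text: "payment" in text,
    "mentions_meeting": lambda text: "meeting" in text,
}

def score(text, weights):
    """Sum the weights (strengths) of all rules that fire on the text."""
    return sum(w for name, w in weights.items() if RULES[name](text))

def classify(text, weights, threshold=0.5):
    """Label a document as 'finance' (True) when its score clears the threshold."""
    return score(text, weights) >= threshold

def learn_from_feedback(text, correct_label, weights, rate=0.2):
    """Perceptron-style update: when the prediction was wrong, nudge the
    weights of the rules that fired toward the correct answer."""
    if classify(text, weights) == correct_label:
        return weights  # nothing to fix
    direction = 1 if correct_label else -1
    return {
        name: (w + direction * rate) if RULES[name](text) else w
        for name, w in weights.items()
    }

# Hand-set starting strengths; the "meeting" rule is too strong on purpose.
weights = {"mentions_invoice": 1.0, "mentions_payment": 0.3, "mentions_meeting": 0.6}
before = classify("team meeting agenda", weights)
updated = learn_from_feedback("team meeting agenda", False, weights)
after = classify("team meeting agenda", updated)
```

In this sketch a single piece of feedback weakens the misfiring rule enough to flip the prediction, while rules that did not fire keep their hand-set strengths.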

2. When You Need Scale

Machine learning for cognitive search is also appropriate in situations where scale is needed. For example, a person might be able to manually group a few hundred documents by deciding whether they are similar or not. However, this task becomes impractical for millions of documents. ML solutions are effective at handling such large-scale problems.
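To make the grouping task concrete, the sketch below automates the "are these two documents similar?" decision using bag-of-words vectors and cosine similarity. This is a simple statistical stand-in and not a trained model; the function names, the 0.5 threshold and the greedy grouping strategy are assumptions chosen for brevity.

```python
import math
from collections import Counter

def vectorize(text):
    """Turn a document into a bag-of-words count vector."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0

def group_by_similarity(docs, threshold=0.5):
    """Greedy grouping: put each document into the first group whose first
    member is similar enough, otherwise start a new group."""
    vectors = [vectorize(d) for d in docs]
    groups = []  # each group is a list of document indices
    for i, vec in enumerate(vectors):
        for group in groups:
            if cosine(vec, vectors[group[0]]) >= threshold:
                group.append(i)
                break
        else:
            groups.append([i])
    return groups

docs = [
    "quarterly sales report for europe",
    "sales report for europe second quarter",
    "employee onboarding checklist",
    "onboarding checklist for new employee",
]
groups = group_by_similarity(docs)
```

The same loop runs unchanged on far larger collections, although a production system would likely replace the greedy pass with a proper clustering algorithm and an index for fast similarity lookups.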

Machine learning for cognitive search is ultimately another valuable tool to help knowledge workers amplify their expertise. Given its growing appeal, it is increasingly important to know how and when to use ML to maximize its impact without maximizing its costs. 

There are scenarios for which ML can help realize a solution that would otherwise not make sense to pursue manually. 

There are also times when ML does not make sense and would be overkill given the simplicity and/or scale of the problem. 

Finally, there are scenarios for which ML can be used in conjunction with manual effort to achieve a balance between human expertise and the power of modern computing. 

Key Criteria for Machine Learning Success

ML algorithms typically operate in two phases: the learning phase and the model application phase. Success in these two phases comes down to the quality of the data.

Diversity of Training Set

When researching ML solutions, the training set is key. In the learning phase, the data or “training set” is analyzed iteratively to extract a model of what users want from a cognitive search solution. In the model application phase, the extracted model is applied to subsequent input to predict a result. 

A training set consists of examples, each pairing an input vector with its expected answer, and is used in conjunction with a supervised learning method to train the system. 

If a training set is too small or not diverse enough, it leads to a condition called overfitting: the model latches onto quirks of the training examples rather than general patterns, and the problem surfaces in the model application phase. In the context of search, overfitting basically means the “noise” within the target content overtakes the “signal” that would otherwise provide valuable insights to the end user.
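One common way to spot overfitting is to compare accuracy on the training set with accuracy on examples the model has never seen. The sketch below uses a deliberately extreme, hypothetical "model" that only memorizes its training texts, so the gap is easy to see; the data and function names are illustrative assumptions.

```python
def accuracy(predict_fn, examples):
    """Fraction of (text, label) examples the model gets right."""
    return sum(predict_fn(text) == label for text, label in examples) / len(examples)

def fit_memorizer(train):
    """A deliberately overfit 'model': it memorizes each training text and
    falls back to the first training label for anything unseen."""
    table = dict(train)
    fallback = train[0][1]
    return lambda text: table.get(text, fallback)

# A tiny, non-diverse training set and a separate held-out set.
train = [("invoice overdue", "finance"), ("server outage", "it")]
held_out = [
    ("invoice reminder", "finance"),
    ("server login", "it"),
    ("outage report", "it"),
]

model = fit_memorizer(train)
train_accuracy = accuracy(model, train)        # perfect on memorized data
held_out_accuracy = accuracy(model, held_out)  # much lower on unseen data
```

A large gap between the two numbers suggests the model has learned the training examples rather than the underlying patterns, which is the usual cue to gather a larger or more diverse training set.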

For example, let’s say you want to leverage ML to classify a large and/or sophisticated data set and no classification rules exist. You might start by having experts classify a sample set of the data to provide an example from which a machine learning algorithm could create a model. This activity represents the learning phase. When the algorithm applies the model to further classify data, you enter the model application phase.
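The two phases in this example can be sketched in a few lines. The code below assumes a toy word-count classifier, with `train` standing in for the learning phase and `predict` for the model application phase; the labels, sample documents and scoring method are hypothetical and far simpler than what a real platform would use.

```python
from collections import Counter, defaultdict

def train(labeled_docs):
    """Learning phase: build one word-count profile per label from
    expert-classified examples given as (text, label) pairs."""
    model = defaultdict(Counter)
    for text, label in labeled_docs:
        model[label].update(text.lower().split())
    return dict(model)

def predict(model, text):
    """Model application phase: pick the label whose profile shares the
    most word occurrences with the new document."""
    words = text.lower().split()

    def overlap(label):
        return sum(model[label][w] for w in words)

    return max(model, key=overlap)

# Sample set classified by experts (hypothetical data).
training_set = [
    ("invoice payment overdue", "finance"),
    ("payment received for invoice", "finance"),
    ("server outage incident report", "it"),
    ("password reset for server login", "it"),
]
model = train(training_set)
label = predict(model, "overdue invoice reminder")
```

The profiles here are not normalized for the number of training documents per label, which a real classifier would need to account for.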

Quality of Target Data Set

The quality of the target data set is another key ingredient for a successful cognitive search solution. Quality in this context strongly correlates with the quality of insights and information presented to the end user. 

The target data set should be sufficiently large, contain patterns (even if unclear to a human analyst) and belong to a sufficiently narrow domain so the model can be applied effectively. Even if the target data set meets these criteria, its quality may still be insufficient, especially if it mixes structured and unstructured data from a variety of different sources, a common scenario in cognitive search and analytics.

In these cases, a sufficiently advanced cognitive search and analytics platform can leverage natural language processing (NLP) and other techniques such as entity extraction or relationship detection to enrich and improve the quality of the data set. Pre-processing enables the machine learning algorithms to work with an enriched version of the data set, which can drastically improve the quality of insights and information ultimately presented to the end user. 
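As a simplified picture of such pre-processing, the sketch below enriches a document with extracted entities before any learning takes place. It assumes regular expressions as a stand-in for real NLP models, and the patterns, field names and sample record are hypothetical.

```python
import re

# Deliberately simple patterns standing in for trained entity extractors.
PATTERNS = {
    "date": re.compile(r"\b\d{4}-\d{2}-\d{2}\b"),
    "email": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"),
    "amount": re.compile(r"\$\d+(?:,\d{3})*(?:\.\d{2})?"),
}

def enrich(doc):
    """Return a copy of the document with an 'entities' field added,
    leaving the original fields untouched."""
    text = doc["text"]
    entities = {kind: pattern.findall(text) for kind, pattern in PATTERNS.items()}
    return {**doc, "entities": entities}

record = enrich(
    {"id": 1, "text": "Invoice sent to ana@example.com on 2024-03-15 for $1,250.00."}
)
```

The downstream learning algorithm can then treat the extracted dates, addresses and amounts as structured features instead of having to rediscover them in raw text.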

Over time, this enriched output feeds back into the solution and continues to improve its accuracy for end users.

Impactful and Cost-Effective Solutions

The decision whether to pursue machine learning for cognitive search is not always straightforward. Understanding the various scenarios in which it makes sense to leverage machine learning — or to avoid it — is critical for arriving at impactful and cost-effective solutions in the realm of cognitive search.