Machine learning and artificial intelligence hold great potential for improving search, but only if based on a solid data foundation. PHOTO: Markus Spiske

With all the hype around artificial intelligence (AI), cognitive search and intelligent intranets, it’s hard to understand how to actually apply those new technologies to improve the workplace. But it may not be that difficult: If you get a few of the basics right, you can put an effective cognitive search system in place and set it up for ongoing growth.

Machine learning (ML) is behind all this, and it is advancing quickly. It is sometimes touted as almost magic, but in fact the underlying algorithms in machine learning have not really changed in a decade.

Machine Learning Relies on Good Data

The advances are the result of fast, cheap computing, the availability of huge sets of data, open-source software and models, and easier administration and development environments. Under the hood, machine learning is a statistical process, dependent on availability of data and sensitive to the characteristics and quality of that data. “Garbage in, garbage out” is still true.

Applying machine learning to workplace data without the right approach or the right conditions can result in big problems, including systems that learn the wrong things. Witness the very public disaster that occurred when Tay, a chatbot developed by Microsoft, learned to make inappropriate and racist comments and had to be suspended and then shut down.

Cognitive Search Starts With the Basics

If you have the basics correctly in place, you can not only enjoy a more effective cognitive search solution from the outset but also use machine learning to improve it on an ongoing basis. To make it work, focus on three key areas: connectivity, metadata and personalization.


Connectivity

In most work environments, the information that employees need to get their jobs done is spread out across multiple systems. Connecting to the repositories where essential content resides is important for successful search, whether cognitive or not. If you aren’t connected to the right systems, you can’t apply any intelligence to that content at all.

If you start by assessing which systems contain important information and deploy connectors to index content from those systems, then you’ll have a better search solution and a better digital workplace right away. From there, you can apply machine learning to improve performance in many ways.
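
To make the connector idea concrete, here is a minimal Python sketch. It assumes a "connector" is simply a function that yields documents from one repository, and that all connectors feed a single keyword index. The `Document` type, the two stand-in connectors and their contents are hypothetical; real connectors would also handle authentication, permissions and incremental updates.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List, Set


@dataclass(frozen=True)
class Document:
    doc_id: str
    source: str  # name of the repository the document came from
    text: str


# Assumption: a connector is a function that yields documents from one system.
Connector = Callable[[], Iterable[Document]]


def build_index(connectors: List[Connector]) -> Dict[str, Set[str]]:
    """Build one inverted index (term -> document ids) across all connectors."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for fetch in connectors:
        for doc in fetch():
            for term in doc.text.lower().split():
                index[term].add(doc.doc_id)
    return index


def search(index: Dict[str, Set[str]], query: str) -> Set[str]:
    """Return the ids of documents that contain every query term."""
    terms = query.lower().split()
    if not terms:
        return set()
    matches = [index.get(term, set()) for term in terms]
    return set.intersection(*matches)


# Hypothetical stand-ins for real repositories.
def wiki_connector() -> Iterable[Document]:
    yield Document("wiki-1", "wiki", "Travel expense policy")


def crm_connector() -> Iterable[Document]:
    yield Document("crm-7", "crm", "Expense report for client visit")


index = build_index([wiki_connector, crm_connector])
print(sorted(search(index, "expense")))
```

A query for "expense" only finds the CRM document because the CRM connector was deployed. Content in a system with no connector never reaches the index, so no amount of intelligence downstream can surface it.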


Metadata

Information about the content, structure and quality of your content is essential to providing good search relevance. It also feeds faceted navigation, one of the biggest breakthroughs in search. But too often there is poor metadata (or none) because even basic autoclassification isn’t in place.
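
A short Python sketch shows why faceted navigation depends on metadata. It assumes search results arrive as dictionaries with optional metadata fields. The field names (`department`, `type`) and the sample results are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, List


def facet_counts(results: Iterable[dict], fields: List[str]) -> Dict[str, Counter]:
    """Count how many results carry each value of each metadata field."""
    counts = {field: Counter() for field in fields}
    for doc in results:
        for field in fields:
            value = doc.get(field)
            if value is not None:  # untagged documents cannot appear in a facet
                counts[field][value] += 1
    return counts


# Hypothetical search results; the third one has no department metadata.
results = [
    {"id": "d1", "department": "HR", "type": "policy"},
    {"id": "d2", "department": "HR", "type": "form"},
    {"id": "d3", "type": "policy"},
]

facets = facet_counts(results, ["department", "type"])
print(facets["department"])
print(facets["type"])
```

The untagged document is counted under `type` but is invisible in the `department` facet. A user who filters by department will never see it, which is the practical cost of missing metadata.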

This is an area that is particularly overhyped today. Natural language processing (NLP), used at some level in every search system, is being touted as a new AI technology. Some vendors imply that their systems learn the structure from the data, with no human intervention. But the only contexts where we’ve seen this work are in academic research (on clean, well-structured repositories) and in demos (on curated data). In reality, you need to spend some effort to achieve good results with any system. To quote research firm Gartner, “Regardless of what vendors may tell you ‘fully automatic metadata’ is not possible.”

Machine learning needs good examples to learn from, which can be difficult and time-consuming to provide. A pragmatic approach is to start out with some straightforward taxonomies and rules, then use ML to learn from those and improve performance or expand into other domains.
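
One way to picture the rules-first approach is the Python sketch below. It assumes a tiny keyword taxonomy labels the documents it can, and a simple naive Bayes classifier (chosen here only for illustration) then learns from those rule-labeled documents so it can categorize text the rules miss. The categories, keywords and sample documents are all hypothetical.

```python
import math
from collections import Counter, defaultdict

# Hypothetical starter taxonomy: category -> trigger keywords.
RULES = {
    "hr": {"vacation", "payroll", "benefits"},
    "it": {"password", "laptop", "vpn"},
}


def tokenize(text):
    return text.lower().split()


def label_by_rules(text):
    """Return the first category whose keywords appear in the text, else None."""
    words = set(tokenize(text))
    for category, keywords in RULES.items():
        if words & keywords:
            return category
    return None


def train(docs):
    """Count words per category, using only documents the rules could label."""
    word_counts = defaultdict(Counter)
    doc_counts = Counter()
    for text in docs:
        category = label_by_rules(text)
        if category is not None:
            doc_counts[category] += 1
            word_counts[category].update(tokenize(text))
    return word_counts, doc_counts


def classify(text, word_counts, doc_counts):
    """Naive Bayes with add-one smoothing over the rule-labeled examples."""
    vocab = {word for counts in word_counts.values() for word in counts}
    total_docs = sum(doc_counts.values())
    best, best_score = None, -math.inf
    for category, counts in word_counts.items():
        score = math.log(doc_counts[category] / total_docs)
        denom = sum(counts.values()) + len(vocab)
        for word in tokenize(text):
            score += math.log((counts[word] + 1) / denom)
        if score > best_score:
            best, best_score = category, score
    return best


docs = [
    "vacation request form and approval",
    "payroll calendar and approval deadlines",
    "reset your password on the laptop",
    "vpn setup guide for remote laptop",
]
word_counts, doc_counts = train(docs)

query = "approval form"
print(label_by_rules(query), classify(query, word_counts, doc_counts))
```

The text "approval form" contains no rule keyword, so the rules alone return nothing, but the classifier has learned that those words travel with HR content. The quality of what it learns is bounded by the quality of the rules that labeled its training examples.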


Personalization

Tailoring results to different roles or locations is a key technique for effective enterprise search. Recommendations and suggestions are often the first things to personalize, but the same applies to relevance and to many other parts of the digital workplace user experience. Users are identified when they sign in, so many aspects of their profiles and behavior can be inputs for machine learning.
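
As a rough illustration of tailoring relevance by role and location, the Python sketch below re-ranks results by adding a fixed boost when a document's metadata matches the signed-in user's profile. The field names (`audience`, `location`, `role`), the boost value and the sample data are assumptions made for the example, not a recommended scoring formula.

```python
def matches(doc_value, user_value):
    """True only when both values are present and equal."""
    return doc_value is not None and doc_value == user_value


def personalize(results, user, boost=0.5):
    """Re-rank results, boosting documents aimed at the user's role or location."""
    def score(doc):
        total = doc["score"]
        if matches(doc.get("audience"), user.get("role")):
            total += boost
        if matches(doc.get("location"), user.get("location")):
            total += boost
        return total

    return sorted(results, key=score, reverse=True)


# Hypothetical results with a base relevance score from the search engine.
results = [
    {"id": "a", "score": 1.0, "audience": "sales", "location": "Berlin"},
    {"id": "b", "score": 0.8, "audience": "engineer", "location": "Boston"},
]
user = {"role": "engineer", "location": "Boston"}

ranked = personalize(results, user)
print([doc["id"] for doc in ranked])

# With an empty profile there is nothing to match, so base relevance decides.
anonymous = personalize(results, {})
print([doc["id"] for doc in anonymous])
```

The second call shows the dependency on profile data: with no role or location to match against, the ranking falls back to the unpersonalized order.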

In most organizations, user profiles are fragmented and incomplete, so it is very tempting to try to have machines “learn” about users and construct their profiles “automagically.” And years of research in recommender systems have in fact developed some effective ML approaches for inferring user profiles and preferences. But these all depend on data quality and on some amount of explicit information about users. Without a good foundation, this kind of ML can go off the rails quickly and turn your digital workplace into a land of spam.

You already have some solid information about your users, even if it is spread across HR systems, learning management systems, practice management systems and social collaboration platforms. If you connect to these systems, aggregate information across them, and use basic NLP to normalize it, you’ll find that the resulting “virtual user profile” gives you a solid basis from which to grow.
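
The aggregation step might look something like the Python sketch below. It assumes each system exports a simple record of fields for the same user, and it stands in for "basic NLP" with plain string cleanup plus a small synonym table. The record contents, field names and synonyms are hypothetical.

```python
import re

# Hypothetical synonym table standing in for real NLP normalization.
SYNONYMS = {
    "ml": "machine learning",
    "machine-learning": "machine learning",
}


def normalize(value):
    """Lowercase, trim and collapse whitespace, then map known synonyms."""
    cleaned = re.sub(r"\s+", " ", value.strip().lower())
    return SYNONYMS.get(cleaned, cleaned)


def merge_profiles(records):
    """Merge per-system records into one profile of field -> set of values."""
    profile = {}
    for record in records:
        for field, values in record.items():
            if isinstance(values, str):
                values = [values]
            profile.setdefault(field, set()).update(normalize(v) for v in values)
    return profile


# Hypothetical records for one user from three systems.
hr_record = {"title": "Data  Scientist", "location": "Boston"}
lms_record = {"skills": ["ML", "Python"]}
social_record = {"skills": ["machine-learning", "search "], "location": "boston"}

profile = merge_profiles([hr_record, lms_record, social_record])
print(sorted(profile["skills"]))
print(profile["location"])
```

Without normalization, "ML" and "machine-learning" would count as two different skills and "Boston" and "boston" as two locations. Even this crude cleanup yields a single profile that later ML can build on.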

Learn From a Solid Foundation

Once you have the basics of connectivity, metadata and personalization in place and maintained on an ongoing basis, they will make your search better and your digital workplace more intelligent no matter where you go from there. You can extend these areas to new sources and new domains as you go. You can apply ML to improve findability and decrease the cost and effort involved in maintaining and using search. You can also provide new types of search and discovery, such as multimedia and multilingual search, spoken language search or question-answering chatbots.

There’s no free lunch: Cognitive search still needs the fundamentals in place to work well. But if you attend to these basics, you will have a better solution to start with, and you will also have a good basis on which to train ML-based software.