Organizations are on an ongoing journey to improve findability and increase search performance. The task is difficult, but artificial intelligence approaches, including advanced machine learning and text analytics, are helping these businesses improve performance and increase user satisfaction.
Why is getting search right so hard? The primary obstacles are:
- Poor content architecture: Inconsistent taxonomies, metadata that does not reflect the needs of a user or support a process.
- Insufficient tagging: Even when content has the right structure, it is frequently missing tags or tagged incorrectly.
- Volume of information: Overwhelming amount of unstructured information produced across departments throughout the enterprise.
- Ambiguous nature of language: The same terms are used to describe different things and different terms are used to describe the same things.
- Inability to understand the searcher’s context: Their background, level of expertise, the nature of their problem or objective.
- Disconnected systems: Information needed by a function or department is frequently locked up in multiple systems, sometimes with limited access.
- Large monolithic documents: Many times, a needed piece of information is in a large document where the answer can be found only by scanning or reading the entire document.
- Insufficient or non-existent resources devoted to improving search: Too many organizations still fail to devote the necessary people and time to the task.
While many of the most intractable problems in developing an effective search function can be traced back to past sins in knowledge, content, and data management and curation, organizations have not had the appetite to make significant investments to fix these issues. It was just too costly in terms of time and money. Developing a better way of managing and valuing content required changes in enterprise processes as well as corporate culture.
In many cases, poor content leads directly to higher costs or lower revenue. The most tangible and measurable areas are in support operations and ecommerce capabilities. Organizations learned many years ago that search engine optimization (SEO), the mechanism for optimizing content for Google, Bing or other internet search engines, was a critical element of ecommerce and therefore warranted significant direct investment. In fact, the SEO services market is estimated to be between $50 billion and $80 billion, whereas the enterprise search market was estimated to be $3.8 billion in 2020.
While these figures do not reveal how the estimates were developed or what they include, the difference in spend is striking. Enterprise search lacks the resourcing and perceived business value that SEO has established in the eyes of business leaders.
The Gap Between SEO and Enterprise Search Investment
What is the SEO money actually spent on? SEO is about making content more visible to search engines. Organizations cannot control Google’s search engine configuration; they can control only their own content (much as we cannot control others’ actions, only our responses to them). Google uses more than 200 signals to rank content, some of which are not known outside of Google, but producing quality content on the topics of likely searches is a significant factor.
Internet search uses limited metadata (title, description, image alt tags, plus terms that are formatted as headings and subheadings within the content) and some structured data to describe people, events, places, articles, products and more. These objects can be described using metadata schemas from schema.org to enable search engines to create rich snippets as part of search results. When a robust schema is used and content is properly tagged, Google can surface many details about the content, including organizations, events, people, products, offers, actions and much more.
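To make the idea of schema.org tagging concrete, here is a minimal sketch in Python that builds a product description as JSON-LD, one of the formats search engines read for structured data. The function name, the sample product and the price are illustrative assumptions, not taken from any real site.

```python
import json

def product_jsonld(name, description, price, currency):
    """Build a schema.org Product description as a JSON-LD dictionary."""
    return {
        "@context": "https://schema.org",
        "@type": "Product",
        "name": name,
        "description": description,
        "offers": {
            "@type": "Offer",
            "price": f"{price:.2f}",      # price formatted with two decimals
            "priceCurrency": currency,    # ISO 4217 currency code
        },
    }

# Hypothetical product used only for illustration.
markup = product_jsonld(
    "Trail Running Shoe",
    "Lightweight shoe for rough terrain.",
    89.5,
    "USD",
)

# On a web page this JSON would sit inside a
# <script type="application/ld+json"> element.
print(json.dumps(markup, indent=2))
```

The same pattern applies to events, people, organizations and articles: pick the schema.org type, then fill in the properties the content actually supports.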
Why is there such a disparity in resource allocation between SEO and enterprise search? The explanations cover a broad range of issues:
- ROI for enterprise search is difficult to measure, while SEO can be directly linked to revenue. Payback in the latter is clear and obvious: more visibility equals more traffic. More traffic correlates to increased click throughs, conversions and ultimately revenue.
- Enterprise content is created and managed by many different departments within an organization, while SEO is more centralized and managed by marketing or ecommerce teams.
- Enterprise search deals with more diverse audiences, and with content that is more detailed and nuanced than typical ecommerce content, which is related to a well-defined set of products or services.
The typical refrain from users of enterprise search is “Why can’t it be like Google?” My response is “If organizations spent the same resources on enterprise search that they spend on optimizing content for Google, enterprise search would be more like Google.” Google technology leverages the best machine learning algorithms in the industry, has deep expertise in tuning those algorithms, and invests more in research and development than even the largest of enterprises.
All of that said, enterprise search is benefiting from advances in machine learning and AI, some of which are a direct result of research and development by the large technology vendors.
How Do AI and ML Impact Search?
Search was one of the early applications of machine learning. Clustering and relevance ranking are based on algorithms that represent each content item as a vector in a “vector space” and then measure how far that vector is from the vectors representing other content items. The algorithm then iteratively clusters the content based on proximity. It begins with an estimate (you could call it a guess) of how the vectors group together and refines that guess on each iteration, reducing the distance between each vector and the center of its cluster. The “learning” takes place during these iterations, and the resulting clusters amount to a categorization of the content. In this iterative way, documents that contain similar concepts based on word occurrences can be tagged and classified.
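The iterative clustering described above can be sketched with k-means, one common algorithm of this kind (the choice of k-means, the word-count vectors and the four sample documents are assumptions made for illustration; production engines use richer representations). Each document becomes a vector of word counts, the initial cluster centers are a guess, and every pass reassigns documents to the nearest center and then moves each center to the mean of its members.

```python
import math
from collections import Counter

def vectorize(text, vocabulary):
    """Turn a document into a vector of word counts over a fixed vocabulary."""
    counts = Counter(text.lower().split())
    return [counts[word] for word in vocabulary]

def distance(a, b):
    """Euclidean distance between two vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def mean(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]

def kmeans(vectors, centroids, iterations=10):
    """Assign each vector to its nearest centroid, then re-center, repeatedly."""
    assignments = []
    for _ in range(iterations):
        assignments = [
            min(range(len(centroids)), key=lambda c: distance(v, centroids[c]))
            for v in vectors
        ]
        for c in range(len(centroids)):
            members = [v for v, a in zip(vectors, assignments) if a == c]
            if members:  # leave an empty cluster's centroid where it is
                centroids[c] = mean(members)
    return assignments

# Hypothetical mini-corpus: two support documents, two finance documents.
docs = [
    "reset password login account",
    "password login locked account",
    "invoice payment billing",
    "billing invoice refund payment",
]
vocabulary = sorted({word for doc in docs for word in doc.split()})
vectors = [vectorize(doc, vocabulary) for doc in docs]

# The initial guess: use the first and third documents as starting centers.
labels = kmeans(vectors, centroids=[list(vectors[0]), list(vectors[2])])
print(labels)  # [0, 0, 1, 1]
```

The two password documents end up in one cluster and the two billing documents in the other, purely because of shared word occurrences.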
AI is a broad and ambiguous term that can be interpreted in many different ways and applied as a label to many types of applications. Any search vendor can legitimately say they use AI. At their core, AI and machine learning are about classification. Machine learning can power text analytics, which allows for identification of entities contained in documents (names of people, places and other objects). Machine learning can also classify content across multiple dimensions, including content type, topics, intellectual property, confidential information, personally identifiable information (PII), and other attributes. Extracting entities allows for filtering based on those attributes.
Semantic Search and Intent Identification
One approach for dealing with inconsistencies in terminology, metadata standards and naming conventions is the use of semantic search. Semantic search refers to the ability to retrieve content based on the concepts it contains, even if a particular search term cannot be found in that content. Semantic search can be enabled through synonym rings and thesaurus structures that map other terms to the concept being searched for.
For example, imagine a person is looking for “proposals,” but the term is not used within a document. Instead, the term Statement of Work or SOW is used. Mapping those terms to a preferred term will tell the search engine that when someone searches for a specific term, the query should automatically expand to include those related concepts.
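A synonym ring can be sketched in a few lines of Python. The ring contents, function names and sample documents below are hypothetical; a real deployment would load the rings from a managed thesaurus and apply them inside the search engine rather than in application code.

```python
import re

# Hypothetical synonym ring: every term in a ring stands for the same concept.
SYNONYM_RINGS = [
    {"proposal", "proposals", "statement of work", "sow"},
]

def expand_query(query):
    """Return the query term plus every term that shares a ring with it."""
    term = query.strip().lower()
    expanded = {term}
    for ring in SYNONYM_RINGS:
        if term in ring:
            expanded |= ring
    return expanded

def matches(term, text):
    """Whole-word, case-insensitive match, so 'sow' does not hit 'disowned'."""
    pattern = r"\b" + re.escape(term) + r"\b"
    return re.search(pattern, text, flags=re.IGNORECASE) is not None

def search(query, documents):
    """Return documents that contain any term from the expanded query."""
    terms = expand_query(query)
    return [doc for doc in documents if any(matches(t, doc) for t in terms)]

documents = [
    "Signed SOW for the data migration project",
    "Quarterly travel policy update",
]
print(search("proposals", documents))
```

A search for “proposals” returns the SOW document even though the word “proposals” never appears in it.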
Machine learning can also achieve this goal using intent classification. Intent classification is a common approach used to account for variations in how people ask questions about a particular topic. To request help in resetting a password, for example, a person might say “I’m locked out of the system,” “I forgot my username,” “I can’t recall my credentials,” and so on. Using utterance variations and machine learning will allow these variations to map to the same answer: “Reset password.” Semantic search can operate in the same way, mapping the different ways that users search for information to a consistent response.
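As a rough illustration of intent classification, the sketch below trains a tiny naive Bayes classifier on a handful of utterances. Naive Bayes is just one simple option chosen here for brevity, and the intent labels and training utterances are made up for the example; real systems train on far more utterance variations.

```python
import math
from collections import Counter, defaultdict

# Hypothetical training utterances, each labeled with an intent.
TRAINING_UTTERANCES = [
    ("i am locked out of the system", "reset_password"),
    ("i forgot my username", "reset_password"),
    ("i cannot recall my credentials", "reset_password"),
    ("where is my expense report", "expense_status"),
    ("has my expense claim been paid", "expense_status"),
]

def tokenize(text):
    return text.lower().split()

def train(examples):
    """Count how often each word appears under each intent."""
    word_counts = defaultdict(Counter)
    intent_counts = Counter()
    for text, intent in examples:
        intent_counts[intent] += 1
        word_counts[intent].update(tokenize(text))
    vocabulary = {word for counts in word_counts.values() for word in counts}
    return word_counts, intent_counts, vocabulary

def classify(text, model):
    """Return the intent with the highest naive Bayes log-probability."""
    word_counts, intent_counts, vocabulary = model
    total_examples = sum(intent_counts.values())
    best_intent, best_score = None, -math.inf
    for intent, count in intent_counts.items():
        score = math.log(count / total_examples)
        total_words = sum(word_counts[intent].values())
        for word in tokenize(text):
            # Add-one smoothing so unseen words do not zero out the score.
            likelihood = (word_counts[intent][word] + 1) / (total_words + len(vocabulary))
            score += math.log(likelihood)
        if score > best_score:
            best_intent, best_score = intent, score
    return best_intent

model = train(TRAINING_UTTERANCES)
print(classify("forgot my credentials", model))  # reset_password
```

The query “forgot my credentials” matches none of the training utterances exactly, yet it still maps to the password reset intent because of the words it shares with them.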
Putting Search Queries Into Context
Finding answers to specific questions is one of the ultimate purposes of the increasingly sophisticated search engines that are being developed with the help of machine learning. Question answering systems require refactoring and componentizing knowledge and content so users don’t have to wade through hundreds of search results only to come to a 300-page document they need to dive into to find what they need.
The key is contextualization of information and a deeper understanding of the user and their perspective. Important factors include the user’s level of background knowledge, experience, the detail the user needs in an answer, and the mental model that determines how they go about solving their information-related problems. Search is a recommendation engine, and for optimal performance, it must conduct complex evaluation and analysis of multiple factors and user signals.
AI is increasingly baked into many technology platforms. AI-powered search can improve relevance by factoring in more signals about user behaviors and background from the many systems they interact with. Some dimensions of those signals are explicit, such as role or department. Others are more subtle, such as how they engage with content (downloading, spending more time, forwarding, or subsequent wayfinding actions such as filtering, navigation or executing another search). Machine learning can prioritize content that correlates with engagement and wayfinding signals and can apply the same prioritization for other users with similar attributes.
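One way to picture signal-based relevance is a re-ranking step that blends the engine’s text score with an explicit signal (department) and an engagement signal (downloads). In the sketch below the field names, the sample results and the weights are all assumptions; the weights are fixed by hand here, whereas a machine learning approach would learn them from behavior data.

```python
from dataclasses import dataclass

@dataclass
class Result:
    title: str
    text_score: float   # relevance score from the search engine, between 0 and 1
    department: str     # department that owns the content
    downloads: int      # engagement signal gathered from users

# Hypothetical hand-set weights for each signal.
SIGNAL_WEIGHTS = {"text": 0.6, "role": 0.25, "engagement": 0.15}

def rerank(results, user_department):
    """Order results by a weighted blend of text, role and engagement signals."""
    max_downloads = max((r.downloads for r in results), default=0) or 1

    def score(result):
        role_match = 1.0 if result.department == user_department else 0.0
        engagement = result.downloads / max_downloads
        return (
            SIGNAL_WEIGHTS["text"] * result.text_score
            + SIGNAL_WEIGHTS["role"] * role_match
            + SIGNAL_WEIGHTS["engagement"] * engagement
        )

    return sorted(results, key=score, reverse=True)

results = [
    Result("Travel policy (HR)", 0.80, "hr", 10),
    Result("Travel booking guide (Sales)", 0.70, "sales", 40),
]
for result in rerank(results, user_department="sales"):
    print(result.title)
```

For a sales user the booking guide moves to the top despite its lower text score; for an HR user the policy document stays first.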
Improved content will allow search to perform more effectively. But in the absence of quality content with the correct tagging and metadata, search algorithms apply metadata to an architecture that is derived through data joins across sources. AI can help throw away the redundant, outdated and trivial content (ROT) using a combination of rules and training data. Text analytics essentially improves content quality. Indexing content adds value by increasing its findability so users can access what they need more quickly and easily. Other algorithms provide auto-summarization and main concept identification. AI-powered search will also make query recommendations based on searchers with similar profiles and learn from each searcher’s behavior.
Supplementing Metadata and Improving Indexes
Much of the value in today’s AI-powered platforms lies in the assemblage and integration of multiple types of machine learning algorithms to handle various aspects of language and text processing. Search creates an index of word and phrase occurrences and multi-faceted concept relationships in documents. Words and phrases represent concepts that machine learning can expand or narrow or relate to other concepts. Concept relationships can be derived using machine learning, which recognizes those concepts as metadata. That metadata can be defined by extracting entities from the text to enrich the index using multiple facets. Facets become conceptual handles on the information, and allow for personalization based on job role, location, interests, department and prior searches.
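The sketch below shows how extracted entities can become facets in an index. To stay short it finds entities by matching a small controlled vocabulary rather than with a trained model, and the facet names, terms and documents are hypothetical.

```python
import re
from collections import defaultdict

# Hypothetical controlled vocabulary; a real system would draw on a taxonomy
# or a trained entity extractor instead of a hand-written list.
FACET_TERMS = {
    "region": ["emea", "apac", "north america"],
    "product": ["router", "firewall"],
}

def extract_facets(text):
    """Find known terms in the text and group them by facet."""
    found = defaultdict(set)
    for facet, terms in FACET_TERMS.items():
        for term in terms:
            pattern = r"\b" + re.escape(term) + r"\b"
            if re.search(pattern, text, flags=re.IGNORECASE):
                found[facet].add(term)
    return found

def build_index(documents):
    """Map each (facet, value) pair to the ids of documents that mention it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for facet, values in extract_facets(text).items():
            for value in values:
                index[(facet, value)].add(doc_id)
    return index

def filter_by(index, **facets):
    """Return ids of documents that match every requested facet value."""
    matches = [index.get((facet, value), set()) for facet, value in facets.items()]
    return set.intersection(*matches) if matches else set()

documents = {
    1: "Firewall sales in EMEA grew",
    2: "Router returns in APAC",
    3: "EMEA router support tickets",
}
index = build_index(documents)
print(filter_by(index, region="emea", product="router"))  # {3}
```

Each facet acts as a handle on the content: combining region and product narrows three documents to the one that matters.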
Another aspect of modern search architecture is that connectors are based on well-understood integration mechanisms via APIs and web services. Metadata schemas are derived and cross-mapped using machine learning, allowing for richer indexes based on different data sources. Imagine looking up customer data according to region and cross referencing to specific product categories and financial information. These algorithms create graph data relationships across structured and unstructured information.
This all means we spend less time and money on the plumbing. A colleague in the search space remarked that “we could do a lot of this 5 and 10 years ago. Ten years ago it would have cost $20 million. Five years ago $5 million. Today it’s more like $500,000.”
A Glimpse of the Future of AI-Powered Search
AI is not a magic bullet. Businesses still need to put effort into creating reference architectures: lexicons, taxonomies, graph structures, metadata standards, and thesaurus structures, especially in areas with unique terminology or deeply technical content. Search is retrieval, and retrieval powers high-functionality virtual assistants and chatbots. Text processing can break content into chunks. Tools such as Amazon Kendra create question answering systems from content to power bots. While not perfect, bots can be a great starting point that saves human effort.
One day we will all be dealing with virtual assistants. Behind those assistants will be a series of text processors that will precisely understand the query and its context, and provide access to the information or to other specialized bots that can address the problem or provide the answer. The future is AI-powered search and retrieval.