In the early days of search, users were more than happy to get results that had some semblance of relevance to the (likely) single term query. And in those days, any result that made sense was a win.

Thanks in part to Google, users have come to expect much more from search. I can’t tell you how many times I've heard people say they want search "just like Google." Even Google, by retiring the popular Google Search Appliance, is admitting that solving the enterprise search problem is a difficult task.

But expectations have changed once again. Users now want "psychic search": search that, like Google, just seems to know what they mean. Yet search by itself is not a great answer. Many people don’t realize that over the last dozen years, Google and other public web search and ecommerce sites have morphed from "web search sites" into "machine-learning matching systems." Let me explain.

You Can Train Old Search New Tricks

When you spend time on virtually any of the large web properties — Google, Yahoo, MSN, Amazon and such — the site platform focuses on what you search for and what you do on the site and adds that information to what it knows about you from your previous visits. If you browse products on a site like Amazon, you'll be seeing advertisements for those and related products for weeks to come (which can be especially frustrating when your search was for a gift for a friend).

Your searches provide clues about your interests. In search, those clues are known as "signals." When processed by machine learning (ML) or artificial intelligence (AI) tools, and integrated with search, they are used to automatically influence the relevance and results you’ll see. This is a far cry from early enterprise search like Verity, Fulcrum and others.

The problem is that even ML and other advanced technologies are not "fire and forget" tools: they still need to be set up properly. They provide the ability to "train" a search instance, and that training can then be extended over time to incorporate user behavior, which in turn becomes even more "signals." ML/AI tools essentially build a profile for each user and for each piece of content. This enables search to display results that "people like you" have found helpful. Note that the profile draws on many signals, but they are generally behavior-based and do not incorporate personally identifiable information. In the enterprise, for example, your work profile (job title, documents you’ve authored or viewed, sites and content you visit) provides the signals that, along with the query you enter, are used at search time to produce results. Think "employees like you."
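To make the "people like you" idea concrete, here is a minimal Python sketch of re-ranking with behavioral signals. It is an illustration under stated assumptions, not how any particular search platform does it: the click log, the group names, the document IDs, the `weight` value and both function names are hypothetical, and a real system would learn the boost from data instead of using a fixed number.

```python
from collections import Counter


def build_group_signals(click_log):
    """Count how often each (group, doc_id) pair was clicked.

    click_log is a list of (group, doc_id) tuples; both fields are
    hypothetical stand-ins for whatever behavior a platform records.
    """
    return Counter((group, doc_id) for group, doc_id in click_log)


def rerank(results, group, signals, weight=0.1):
    """Re-rank (doc_id, base_score) pairs for a user in the given group.

    Each click by someone in the same group adds `weight` to the score
    the search engine already assigned. The fixed weight is an assumption.
    """
    boosted = [
        (doc_id, base_score + weight * signals[(group, doc_id)])
        for doc_id, base_score in results
    ]
    return sorted(boosted, key=lambda pair: pair[1], reverse=True)


# Hypothetical behavior collected from earlier sessions.
click_log = [
    ("engineering", "vpn-setup"),
    ("engineering", "vpn-setup"),
    ("engineering", "expense-policy"),
    ("sales", "expense-policy"),
]
signals = build_group_signals(click_log)

# Scores as they might come back from the engine for one query.
results = [("expense-policy", 1.0), ("vpn-setup", 0.95)]

for doc_id, score in rerank(results, "engineering", signals):
    print(doc_id, round(score, 2))
```

With this sample data, the same query puts "vpn-setup" first for an engineering user and "expense-policy" first for a sales user, because the two groups clicked on different documents. Note that the signals here are keyed by group, not by an individual's identity.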

The Enterprise Advantage

An enterprise doesn't have the breadth of content or number of users the large web search and shopping sites have. But we have something they don't: detailed information about each enterprise user, including job title, department, location, co-workers, and potentially all of the documents, emails, and other correspondence authored. And remember, most enterprises have a much smaller set of documents than a large internet site. Chances are the range of content is also more focused.

Together, the detailed user information and the smaller, more focused content set give us relevant profiles we can trust. So although we have fewer signals, the ones we do have are more likely to be accurate. This means the data available to us is more precise, and the tools that work so well on large internet sites will likely deliver great results even on our smaller sets of content. But training (and tuning) is still important, and any solution that doesn't provide a way to do so should not be taken seriously.

There Is No Magic, Just Hard Work

We are currently in the early, enthusiastic stages of machine learning and AI in the enterprise, a.k.a. the "Wave of Insight," in which some analysts and vendors have renamed enterprise search as "Insight Engines." This new name seems to involve shipping pre-existing enterprise search platforms with open source tools like Spark or Mahout, or simply with MLlib tie-in and support. As I've written here and elsewhere, that doesn't seem to justify inventing an entirely new category.

Many of us who work with search platforms have long lamented how few enterprise search customers make the required effort to properly set up search with appropriate synonyms, stop words and "query cooking" to enable search to work well. Given the recent trend for vendors to bundle their search product with ML or AI tools like Apache's Mahout or the popular Spark MLlib, I fear customers will see this as an invitation to do even less to prepare for search. Both tools provide the capability to train and utilize machine learning, which can enhance well-tuned search — but only if someone takes the time to train and tune.
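As a rough illustration of the setup work described above, the Python sketch below removes stop words and expands synonyms before a query reaches the engine. It assumes that "query cooking" means rewriting the user's query ahead of the search call, which is one plausible reading and not a definition from any vendor. The word lists and the `cook_query` function are hypothetical; real platforms usually handle this through their own analyzer configuration.

```python
# Hypothetical, hand-maintained lists; in practice these come from
# studying your own content and query logs.
STOP_WORDS = {"the", "a", "an", "of", "for", "to", "in"}
SYNONYMS = {
    "pto": ["vacation", "leave"],
    "401k": ["retirement"],
}


def cook_query(raw_query, stop_words=STOP_WORDS, synonyms=SYNONYMS):
    """Return the list of terms to send to the search engine.

    Lowercases the query, drops stop words, and appends any configured
    synonyms right after the term they expand, skipping duplicates.
    """
    terms = [t for t in raw_query.lower().split() if t not in stop_words]
    cooked = []
    for term in terms:
        if term not in cooked:
            cooked.append(term)
        for alternative in synonyms.get(term, []):
            if alternative not in cooked:
                cooked.append(alternative)
    return cooked


print(cook_query("The PTO policy for contractors"))
```

Even a toy version like this shows why the work cannot be skipped: someone has to know that employees type "PTO" while the policy documents say "vacation," and no bundled ML library supplies that knowledge on its own.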

Machine learning isn’t destined to fail, nor is it a panacea for fixing enterprise search. Enter any new search project with the objective to "improve search," not to "make search like Google." Plan on staff and time to train ML appropriately, verify it's well integrated with your search platform, and remember that even AI-driven search is not magic: it takes time to set up and manage. Without that training and management, you're going to be unhappy with ML-driven search.
