Search has long been dominated by commercial vendors, although in the last dozen or so years open source projects and open source-based tools have come of age, especially in the enterprise search field.
Both open source and commercial enterprise search products now come with enhancements that provide capabilities we could only dream of just a few years ago. The enhancements I want to address here are described by terms that are often used somewhat interchangeably, even though they are quite different and their value to applications like search varies widely.
Let’s take a quick look at the various buzzwords and technologies.
Deep learning starts with raw data and passes it through a series of layers, with each layer building progressively more complex representations on top of the output of the layer before it. Because it gets better as it is trained on more data, deep learning is often compared with how humans learn.
It may seem like magic today, but over time it will improve.
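For readers who want to see what "layered" means in practice, here is a minimal Python sketch of a forward pass through a tiny three-layer network. The weights are hypothetical values picked by hand for illustration; a real deep learning system learns them from large amounts of training data, and uses far bigger layers and a dedicated library rather than plain lists.

```python
import math

def relu(x: float) -> float:
    # Pass positive values through; clamp negatives to zero.
    return max(0.0, x)

def sigmoid(x: float) -> float:
    # Squash any number into the range (0, 1) so it can serve as a score.
    return 1.0 / (1.0 + math.exp(-x))

def dense(inputs, weights, biases, activation):
    """One fully connected layer: each output is a weighted sum of all
    inputs plus a bias, passed through an activation function."""
    return [
        activation(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]

# Hypothetical hand-picked weights; training would normally learn these.
LAYERS = [
    ([[0.5, -0.2, 0.1], [0.3, 0.8, -0.5]], [0.0, 0.1], relu),  # 3 raw inputs -> 2 features
    ([[1.0, -1.0], [0.5, 0.5]], [0.0, 0.0], relu),             # 2 features -> 2 derived features
    ([[1.2, 0.7]], [-0.3], sigmoid),                           # 2 derived features -> 1 score
]

def forward(raw_input):
    # Each layer consumes the previous layer's output.
    activations = raw_input
    for weights, biases, activation in LAYERS:
        activations = dense(activations, weights, biases, activation)
    return activations

score = forward([1.0, 0.5, 2.0])
print(score)
```

Each layer only ever sees the output of the layer before it, which is what lets deeper layers work with increasingly abstract features rather than the raw data.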
Artificial intelligence is the broad discipline of creating intelligent machines. This umbrella term covers a number of approaches, but generally refers to machines that emulate, or imitate, human intelligence. HAL 9000 in “2001: A Space Odyssey” is a familiar fictional example of AI.
Machine learning, a subset of AI, refers to systems that learn from experience: when trained with a large quantity of examples, they continually improve their understanding.
ML can use supervised or unsupervised learning. Supervised learning trains on a large number of labeled examples, such as queries paired with results that people have judged relevant, while unsupervised learning finds patterns in unlabeled data, such as observed user behavior, and improves over time.
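As an illustration of supervised learning applied to relevance, the sketch below trains a logistic regression classifier with gradient descent on a handful of labeled (query, document) examples. The two features, the labels, and the training settings (`epochs`, `lr`) are hypothetical; real systems train on far more examples and features, usually with an ML library rather than hand-written code.

```python
import math

# Hypothetical labeled training data: each (query, document) pair is described
# by two features, plus a human judgment of relevant (1) or not relevant (0).
#   feature 0: fraction of query terms found in the document title
#   feature 1: fraction of query terms found in the document body
TRAINING = [
    ([1.0, 0.9], 1),
    ([0.8, 1.0], 1),
    ([0.9, 0.4], 1),
    ([0.1, 0.3], 0),
    ([0.0, 0.6], 0),
    ([0.2, 0.1], 0),
]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_logistic_regression(examples, epochs=2000, lr=0.5):
    """Fit weights and a bias by batch gradient descent on log loss."""
    n_features = len(examples[0][0])
    weights = [0.0] * n_features
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, label in examples:
            # Prediction error for this labeled example.
            err = sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias) - label
            for i in range(n_features):
                grad_w[i] += err * x[i]
            grad_b += err
        # Step against the average gradient.
        weights = [w - lr * g / len(examples) for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / len(examples)
    return weights, bias

def predict(model, x):
    """Return the model's estimated probability that x is relevant."""
    weights, bias = model
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)

model = train_logistic_regression(TRAINING)
print(predict(model, [0.95, 0.8]))  # resembles the relevant examples
print(predict(model, [0.05, 0.2]))  # resembles the non-relevant examples
```

The labels are what make this "supervised": the model adjusts its weights until its predictions agree with the human judgments, then applies what it learned to pairs it has never seen.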
One problem with machine learning is how to "unteach" what it has learned. Consider an ecommerce pharmaceutical site that sells products for specific ailments. ML can be primed and can enhance the results over time, but when related or new products come on the market, it may take some time for ML to "unlearn" the older product.
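A deliberately simplified sketch shows why the lag happens. Instead of a trained model, it ranks two hypothetical products by their all-time purchase counts, one kind of signal a model might learn from. All numbers and names are invented for illustration: the older product starts with a large head start, so even though the newer product sells three times as fast, it takes weeks to reach the top.

```python
# Hypothetical numbers for illustration only.
OLD_PRODUCT_HISTORY = 1000  # purchases the old product accumulated before the new one launched
OLD_DAILY = 10              # purchases per day the old product still gets
NEW_DAILY = 30              # purchases per day the new product gets

def days_until_new_product_ranks_first(history, old_daily, new_daily):
    """Rank by all-time purchase count and return the first day on which the
    new product outranks the old one, or None if it never catches up."""
    if new_daily <= old_daily:
        return None  # the new product never closes the gap
    old_total, new_total = history, 0
    day = 0
    while new_total <= old_total:
        day += 1
        old_total += old_daily
        new_total += new_daily
    return day

print(days_until_new_product_ranks_first(OLD_PRODUCT_HISTORY, OLD_DAILY, NEW_DAILY))
```

With these numbers the new product gains only 20 purchases a day on a 1,000-purchase deficit, so it does not rank first until day 51; the accumulated history is exactly what has to be "unlearned".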
When it comes to search, a signal is data that search-connected AI tools use for everything from determining relevance to identifying similar documents or products. Signals can even be used to determine which facets to display, as well as which documents are of interest.
In a corporate environment, the signals we often see are related to the user such as job title, department and seniority. In an ecommerce environment, signals may include queries, purchased items, geographical location and other metrics like user history. In these cases, search attempts to display what “people like you” found helpful.
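Here is a minimal sketch of how signals might adjust results for "people like you" in a corporate setting. The document names, base scores, click log, and the `boost` value are all hypothetical; the sketch simply adds a fixed boost for each click a document received from users in the searcher's own department, then re-sorts.

```python
from collections import Counter

# Hypothetical base relevance scores from text matching for one query.
base_scores = {"doc_a": 2.0, "doc_b": 1.8, "doc_c": 1.5}

# Hypothetical click signals: (department of the user who clicked, document clicked).
click_log = [
    ("legal", "doc_c"), ("legal", "doc_c"), ("legal", "doc_c"), ("legal", "doc_b"),
    ("engineering", "doc_a"), ("engineering", "doc_a"),
]

def rerank(base_scores, click_log, user_department, boost=0.25):
    """Boost each document by the clicks it received from users in the
    same department as the searcher, then sort by the adjusted score."""
    peer_clicks = Counter(doc for dept, doc in click_log if dept == user_department)
    adjusted = {doc: score + boost * peer_clicks[doc] for doc, score in base_scores.items()}
    return sorted(adjusted, key=adjusted.get, reverse=True)

print(rerank(base_scores, click_log, "legal"))
print(rerank(base_scores, click_log, "engineering"))
```

The same query returns a different order for each department: legal users see `doc_c` first because their peers clicked it most, while engineering users keep `doc_a` on top.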
Relevance determines the order in which search results are displayed. Historically, relevance in search was determined by metrics such as “TF-IDF” (Term Frequency/Inverse Document Frequency). Roughly, this compares the number of times the search term occurs in a given document with how many documents in the whole index contain the term. A term that is rare in the index, but which occurs often in a given document, is assigned a higher relevance than terms that occur frequently across the index. And while this relevance calculation is usually fixed, signals can enhance the order of results.
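The calculation is easier to see in code. This minimal sketch uses the textbook form of TF-IDF (raw term count multiplied by the log of the total number of documents divided by the number of documents containing the term) on a made-up three-document index; production engines typically use smoothed or tuned variants.

```python
import math
from collections import Counter

def tokenize(text):
    # Naive tokenizer: lowercase and split on whitespace.
    return text.lower().split()

def tf_idf_scores(query, documents):
    """Score each document as the sum over query terms of tf(t, d) * idf(t),
    where tf is the raw count of t in d and idf = log(N / df), with df the
    number of documents that contain t."""
    docs = [Counter(tokenize(d)) for d in documents]
    n = len(docs)
    scores = []
    for counts in docs:
        score = 0.0
        for term in tokenize(query):
            df = sum(1 for c in docs if term in c)
            if df == 0:
                continue  # term appears nowhere in the index
            score += counts[term] * math.log(n / df)
        scores.append(score)
    return scores

documents = [
    "the search engine ranks the results",
    "the allergy remedy relieves allergy symptoms",
    "the engine of the car",
]
print(tf_idf_scores("the allergy", documents))
```

Because "the" appears in every document, its IDF is log(3/3) = 0 and it adds nothing to any score, even where it occurs twice. The rarer "allergy" carries all the weight, so only the second document gets a non-zero score: 2 × log 3, or about 2.2.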
How All This Relates to Enterprise Search
As I mentioned, many search technologies today use the well-established TF-IDF relevance model. Increasingly, search products and projects are moving to the open source Apache Spark tool, with its MLlib machine learning library. Spark is perhaps the best-known ML tool available today, and it helps deliver highly relevant results in a majority of the open source and commercial search technologies we use.
What this means is that search, both enterprise and ecommerce, can deliver high-quality, relevant results based on machine learning and the signals that drive it. Whether you are looking for a commercial platform or an open source platform, more and more search platforms have integrated Apache Spark to deliver more relevant results that meet user expectations and help keep search users happy.
The good news is that most search technologies in use today integrate Spark or proprietary technology to deliver ML. If you’re using search without these powerful ML tools, it’s time to join the 21st century!