Google is no longer a search engine, if it ever really was. It's a machine learning application disguised as a search platform. (Photo: Stellan Johansson)

Signals are signs. They are indications. 

In the context of search and machine learning, a signal indicates that a user action has occurred. Sometimes the signal is opening a page. Sometimes it is clicking a result in a result list. Other times it’s adding a product to a shopping cart.

As commercial and open source search technologies begin to integrate machine learning tools such as Apache Mahout (originally built on Hadoop MapReduce) and Spark’s MLlib, signals are often among the data points used to decide which results to display and in what order. Signals play a key part in relevance ranking in enterprise search.
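To make the idea concrete, here is a minimal sketch, assuming a simple model in which each signal type carries a hand-picked weight. The event shape, the weights, and the `blend` factor are hypothetical illustrations, not taken from any search product. The sketch sums weighted signals per document and blends that boost with a base text-relevance score to reorder the result list.

```python
from collections import defaultdict
from dataclasses import dataclass

# ASSUMPTION: hypothetical weights chosen for illustration; stronger intent
# (adding to a cart) counts more than a page view. No product uses these values.
SIGNAL_WEIGHTS = {"view": 1.0, "click": 2.0, "add_to_cart": 5.0}


@dataclass
class Signal:
    """One recorded user action (a hypothetical event shape)."""
    user_id: str
    doc_id: str
    kind: str  # "view", "click", or "add_to_cart"


def signal_boosts(signals):
    """Sum the weighted signals recorded for each document."""
    boosts = defaultdict(float)
    for s in signals:
        boosts[s.doc_id] += SIGNAL_WEIGHTS.get(s.kind, 0.0)  # unknown kinds count 0
    return dict(boosts)


def rerank(base_scores, boosts, blend=0.1):
    """Order doc IDs by text-relevance score plus a scaled signal boost.

    `blend` (hypothetical) controls how much behavior can outweigh text relevance.
    """
    combined = {doc: score + blend * boosts.get(doc, 0.0)
                for doc, score in base_scores.items()}
    return sorted(combined, key=combined.get, reverse=True)


if __name__ == "__main__":
    events = [
        Signal("u1", "router-a", "view"),
        Signal("u2", "router-b", "click"),
        Signal("u2", "router-b", "add_to_cart"),
    ]
    base = {"router-a": 1.2, "router-b": 1.0, "router-c": 1.1}
    print(rerank(base, signal_boosts(events)))  # ['router-b', 'router-a', 'router-c']
```

Here router-b starts with the lowest text score but climbs to the top because users clicked it and added it to a cart. Keeping `blend` small lets signals reorder close calls without completely drowning out text relevance.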

A Search Engine in Disguise

Google, with the search capabilities everyone wishes their own search platform delivered, isn’t really a search engine anymore, if it ever really was. Google is a machine learning application disguised as a search platform.

Google now tracks user behavior to determine which results people find most useful, and so does Amazon. When Amazon or Google shows you results based on “people like you,” the people in question are other users who have searched for similar terms and looked at the same types of content (or products) that you might be interested in.

To find those users, Amazon and Google gather the same kinds of activities (signals) from your web and shopping behavior: the actions that resulted in a page or product view.
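The “people like you” idea is commonly implemented as user-based collaborative filtering. The sketch below is a minimal illustration of that general technique, not Google’s or Amazon’s actual method. It assumes a hypothetical interaction history (user to set of items viewed) and uses cosine similarity over those sets to find similar users, then suggests items those neighbors saw that you haven’t.

```python
import math

# ASSUMPTION: hypothetical interaction data, user -> set of items viewed or bought.
HISTORY = {
    "you":   {"wifi-router", "mesh-node", "ethernet-cable"},
    "alice": {"wifi-router", "mesh-node", "khaki-slacks"},
    "bob":   {"wifi-router", "ethernet-cable", "usb-hub"},
    "carol": {"garden-hose", "sun-hat"},
}


def cosine(a, b):
    """Cosine similarity between two item sets treated as binary vectors."""
    if not a or not b:
        return 0.0
    return len(a & b) / math.sqrt(len(a) * len(b))


def people_like_you(user, history, k=2):
    """Return up to k (user, similarity) pairs most similar to `user`."""
    others = [(cosine(history[user], items), other)
              for other, items in history.items() if other != user]
    others.sort(reverse=True)  # highest similarity first
    return [(other, sim) for sim, other in others[:k] if sim > 0]


def recommend(user, history, k=2):
    """Score unseen items by summing the similarity of neighbors who have them."""
    seen = history[user]
    scores = {}
    for other, sim in people_like_you(user, history, k):
        for item in history[other] - seen:
            scores[item] = scores.get(item, 0.0) + sim
    # Highest score first; ties broken alphabetically for stable output
    return sorted(scores, key=lambda i: (-scores[i], i))


if __name__ == "__main__":
    print(recommend("you", HISTORY))  # ['khaki-slacks', 'usb-hub']
```

Because alice shares your interest in routers, her unrelated purchase surfaces as a suggestion, which is roughly how a search for wi-fi routers can end in khaki slacks.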

Years ago, a colleague of mine had one of these “aha” experiences on the web. He’s a big, bearded guy and a committer to the FreeBSD project. He’s responsible for his company’s web servers, which generate a big part of its revenue and need to be up 24/7 and safe from attacks and intrusion attempts.

He was searching Google for “wi-fi routers” when a product suggestion popped up: khaki slacks for portly gentlemen. His first reaction was to curse Google for an arbitrary suggestion. But seconds later, he told me, he took a closer look and decided the khakis were pretty nice.

Bringing Machine Learning to Enterprise Search

This is new information to many people, but it’s nothing new to machine learning specialists. And the technology has now improved to the point where “ordinary” enterprise search platforms ship with similar capabilities.

Machine learning is now an integral part of two of the more popular commercial search platforms.

Lucidworks Fusion now includes Apache Spark (and MLlib) in its product download. And Elastic, the company behind Elasticsearch, recently acquired and has integrated Prelert, which positioned itself as a “behavioral analysis” tool.

Both boil down to technologies that capture and evaluate signal events as site visitors browse. Over time, the software begins to notice trends, and once it reaches some level of confidence, it can suggest related content: a solution to a tech support issue, a company policy or form, or a product. The results and recommendations improve the more people use the site.
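One simple way software can notice trends and suggest related content only above a confidence level is association-rule mining over browsing sessions. This is a swapped-in illustration of the general idea, not how Fusion or Prelert actually work. For items A and B, confidence(A → B) is the number of sessions containing both divided by the number containing A. The session data and the `min_support` and `min_confidence` thresholds below are hypothetical.

```python
from collections import Counter
from itertools import permutations


def related_items(sessions, item, min_support=2, min_confidence=0.5):
    """Suggest items that co-occur with `item` often enough to be trusted.

    confidence(item -> other) = sessions with both / sessions with item.
    ASSUMPTION: the default thresholds are illustrative, not recommended values.
    """
    item_counts = Counter()
    pair_counts = Counter()
    for session in sessions:
        unique = set(session)
        item_counts.update(unique)
        # Count each ordered pair of distinct items once per session
        pair_counts.update(permutations(unique, 2))

    base = item_counts[item]
    if base == 0:
        return []
    suggestions = []
    for (a, b), both in pair_counts.items():
        if a != item or both < min_support:
            continue  # wrong antecedent, or too little evidence
        confidence = both / base
        if confidence >= min_confidence:
            suggestions.append((b, confidence))
    # Highest confidence first; ties broken alphabetically for stable output
    return sorted(suggestions, key=lambda s: (-s[1], s[0]))


if __name__ == "__main__":
    # Hypothetical intranet sessions
    sessions = [
        ["vpn-faq", "reset-password"],
        ["vpn-faq", "reset-password", "expense-form"],
        ["vpn-faq", "wifi-setup"],
        ["vpn-faq", "reset-password"],
        ["expense-form", "travel-policy"],
    ]
    print(related_items(sessions, "vpn-faq"))  # [('reset-password', 0.75)]
```

Raising `min_confidence` or `min_support` yields fewer but better-supported suggestions. As more sessions accumulate, rarer pairs gather enough evidence to clear the thresholds, which loosely mirrors the point that recommendations improve as more people use the site.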

Taking Machine Learning at Face Value 

With traditional relevance ranking, you could often reverse-engineer the results to understand why a given product, stock or document was displayed. Even Verity, introduced in the late 1980s, included an “explain” function that described why one result was ranked above another.
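The sketch below shows why classic scoring is explainable: every point of the score traces back to a specific term. It uses plain TF-IDF as a stand-in (it is not Verity’s actual formula), in the spirit of the explain output that Lucene-based engines such as Elasticsearch can return. The corpus and the function name are hypothetical.

```python
import math
from collections import Counter


def tfidf_explain(query, doc, corpus):
    """Score `doc` for `query` with plain TF-IDF and return a per-term breakdown.

    ASSUMPTION: simplified scoring for illustration only.
    tf  = raw count of the term in the document
    idf = log(N / df), where df = number of corpus documents containing the term
    """
    n_docs = len(corpus)
    counts = Counter(doc.lower().split())
    explanation = []
    total = 0.0
    for term in query.lower().split():
        df = sum(1 for d in corpus if term in d.lower().split())
        if df == 0 or counts[term] == 0:
            explanation.append(f"{term}: 0.000 (no match)")
            continue
        idf = math.log(n_docs / df)
        contribution = counts[term] * idf
        total += contribution
        explanation.append(
            f"{term}: {contribution:.3f} = tf {counts[term]} x idf {idf:.3f} "
            f"(df={df}, N={n_docs})"
        )
    return total, explanation


if __name__ == "__main__":
    corpus = ["wifi router setup guide", "router firmware update",
              "khaki slacks size chart"]
    score, why = tfidf_explain("wifi router", corpus[0], corpus)
    print(f"{score:.3f}")  # 1.504
    for line in why:
        print(" ", line)
```

The breakdown shows that “wifi” contributes more than “router” because it appears in fewer documents, which is exactly the kind of reasoning a learned model typically cannot hand back.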

Many people see a potential flaw in machine learning: while the algorithms may work, there is no way for the technology to explain why its recommendations are valid. Machine learning applied to financial markets can’t explain why a given stock is a “buy” or a “sell,” so we have to trust that the ML algorithms are valid and working.

The vulnerability is that a malicious party could potentially tamper with an application so it delivers specific results, then profit from knowing in advance what the compromised algorithms will report. When results can’t be explained, “trust but verify” is no longer an option.

Brush Up on Machine Learning

If you work in or around search technology, it’s time to start learning about machine learning.

Fortunately, we’re still at the beginning of the wave. A number of good products are free, and tutorials are easy to find.

As a search guy, my interest in machine learning may differ from yours. I’ve been exploring machine learning through the two search products mentioned above: Lucidworks Fusion with Spark, and Elasticsearch with its Prelert Behavioral Analytics integration.

Other tools worth experimenting with include MapReduce/Mahout, Spark/MLlib, Cloudera Oryx, and Amazon Machine Learning on AWS. Lynda, Coursera, Udacity and others offer training as well.