Optimism by Mike Wilson
There are good reasons to feel good about AI, machine learning, Natural Language Processing (NLP) and search. PHOTO: Mike Wilson

(Second article in a two-part series)

AI, machine learning, Natural Language Processing (NLP) and search all date to the dawn of computing.  

Huge amounts of research and multiple waves of hype have happened in all of them over the past 40 years.  But there are good reasons that this wave of technology has particular promise now. 

Big Training Sets, Cheap Computing: The large amount of available data, combined with the low cost of extremely powerful cloud computing infrastructure, makes a great environment for AI and machine learning.  Look to see what areas of your business are mainstream enough to leverage other people’s training sets.

Advances in Natural Language Processing (NLP): Though not quite as hyped as AI, NLP has advanced from basic applications such as speech recognition, speech synthesis and machine translation to a range of real-world applications such as creating spoken dialogue systems and speech-to-speech translation engines, mining social media for information about health or finance and identifying sentiment and emotion toward products and services. You don’t need to keep up on all this, but I do recommend reading at least a primer on NLP, such as Matt Kiser’s “Introduction to NLP 2016” blog.

Mainstream Vendors and APIs: Google, Microsoft and IBM are all very actively working in this area, advancing the field, driving the cost down, and providing frameworks that accelerate development and training time.  Along with this, the practice of incorporating cloud services via API calls makes it much easier to construct powerful systems.

Rise of Knowledge Graphs and Linked Data: There are now some amazing high-quality sources of knowledge that are easy to leverage, as a result of crowdsourcing efforts (such as Wikipedia and DBPedia), commercial efforts (such as Wolfram Alpha and the Google Knowledge Graph), and the Linked Data initiative.

It’s a good bet that in the medium term (say three to five years) there will be some obvious applications in nearly all organizations. Long term, I believe Accenture’s research, which predicts that AI could double economic growth rates by 2035 by changing the nature of work and creating a new relationship between man and machine.

A lot of this is about timing and finding the right applications at the right time. 

Pick the Right Problem

The best way to take advantage of the capabilities of this new breed of systems is to pick your problem carefully.   

Some of this is common sense, and some requires a bit more understanding of the nature of the technology.  In a short article, I won’t even try to describe all the aspects of this, but I can give you a flavor of it.

In the common sense arena,  look for problems with:

  • The right level of business importance — there will be an investment of time and money involved, so you want to pick a problem that matters to your business.  But don’t bet your whole business on a new technology (at least not until it’s proven in some area).
  • A reasonably well-defined scope — the better defined the domain is, the easier it will be to get good results.  It will also be easier to tell if it’s working or not, and easier to understand the business value.   But in many situations you may not fully understand the problem until you get into it, so don’t try to tie it down too tightly.
  • Some tolerance for errors — since this technology will never be perfect, you want people in the loop somewhere.  If you build a nuclear missile system based on intelligent self-learning search, be sure the machine isn’t pushing the launch button automatically.

With respect to fitting the nature of the technology, let’s pick an example.   We’ll get a tiny bit deeper into machine learning and see how it applies to question answering systems.

At a high level, there are five basic questions that can be answered by using machine learning and there are five basic families of algorithms that help with these questions:

  1. Is this A or B — Classification Algorithms (two-class or multiclass)
  2. Is this weird — Anomaly Detection Algorithms
  3. How much or How many — Regression Algorithms
  4. How is this organized — Clustering Algorithms
  5. What should I do next — Reinforcement Learning Algorithms
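
To make the pairing concrete, here is a minimal Python sketch that records the five question-to-algorithm pairings from the list above and works through one of them, "Is this weird." The z-score rule, the threshold of 2.0, the function name find_anomalies and the sample data are all assumptions chosen for illustration. Real anomaly detection products use more robust methods.

```python
from statistics import mean, stdev

# The five pairings from the list above, kept as a simple lookup table.
QUESTION_TO_FAMILY = {
    "Is this A or B?": "classification",
    "Is this weird?": "anomaly detection",
    "How much or how many?": "regression",
    "How is this organized?": "clustering",
    "What should I do next?": "reinforcement learning",
}


def find_anomalies(values, threshold=2.0):
    """Return the values lying more than `threshold` standard deviations from the mean."""
    mu = mean(values)
    sigma = stdev(values)  # sample standard deviation; needs at least two values
    if sigma == 0:
        return []  # all values are identical, so nothing stands out
    return [v for v in values if abs(v - mu) / sigma > threshold]


# Hypothetical daily search-query counts; the last day is an obvious spike.
daily_queries = [100, 102, 98, 101, 99, 103, 97, 100, 500]
print(find_anomalies(daily_queries))  # prints [500]
```

The point is less the arithmetic than the shape of the question: the code never asks "is this A or B," only "how far is this from normal."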

When a system uses machine learning, you can usually get a sense for which of these is being used.   Although all of these algorithms are complicated and heuristic, they are not too hard to understand at a high level.  (Daniel Tunkelang has a nice blog explaining binary classification to a child.)

Question answering systems (QASs) that generate answers to questions asked in natural language have been a research subject since the 1960s, and they have gone through several cycles of hype. Lately, there’s been some remarkable progress in these systems, with several commercial vendors and some nice real-world success stories. Inside a QAS there are many elements, such as document analysis, retrieval and ranking, question and representation…and a question analysis and classification piece that typically uses classification algorithms. 

Classification algorithms get less accurate with more classes (choices) and work better if the classes are well separated (different from each other).  They also work better with more training data.   So if you have a customer support page or bot you want to improve, it can be a great fit if you have:

  • Well-structured/curated answers
  • Hundreds of different answers (rather than hundreds of thousands)
  • Lots of examples of customers asking these questions and judging whether the answers are good (thousands of examples, rather than just a handful)
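
As a rough illustration of the question classification piece, the sketch below routes a customer question to one of a small set of curated answers. It uses a bag-of-words naive Bayes classifier, which is my choice for a short example and not necessarily what any commercial QAS uses. The class name TinyQuestionClassifier, the answer labels and the four training questions are hypothetical, and a real system would need the thousands of examples described above.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase and split on whitespace; a real system would do much more."""
    return text.lower().split()


class TinyQuestionClassifier:
    """Multinomial naive Bayes over bag-of-words, with add-one smoothing."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # answer id -> word frequencies
        self.class_counts = Counter()            # answer id -> number of examples
        self.vocab = set()

    def train(self, examples):
        for question, answer_id in examples:
            words = tokenize(question)
            self.word_counts[answer_id].update(words)
            self.class_counts[answer_id] += 1
            self.vocab.update(words)

    def predict(self, question):
        total = sum(self.class_counts.values())
        best_id, best_score = None, -math.inf
        for answer_id, n in self.class_counts.items():
            counts = self.word_counts[answer_id]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n / total)  # prior: how common is this answer?
            for word in tokenize(question):
                score += math.log((counts[word] + 1) / denom)
            if score > best_score:
                best_id, best_score = answer_id, score
        return best_id


# Hypothetical training data: (customer question, id of the curated answer).
examples = [
    ("how do i reset my password", "reset_password"),
    ("i forgot my password", "reset_password"),
    ("where is my order", "track_order"),
    ("track my order status", "track_order"),
]

clf = TinyQuestionClassifier()
clf.train(examples)
print(clf.predict("forgot password"))  # prints reset_password
print(clf.predict("track order"))      # prints track_order
```

Each curated answer is one class. Every class you add is one more score competing for the top spot, and every class needs its own supply of example questions, which is why hundreds of answers is a far more comfortable target than hundreds of thousands.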

This gives you a taste of choosing the right problem.  You’ll find that technologists love helping you determine what are good problems to address in your environment.   (They should also tell you that there are some problems you shouldn’t choose, and explain why.)

A New Breed of Intelligent Search Systems

However overhyped, it’s clear that there have been real advances in a variety of technologies that are emerging as “intelligent search” in one form or another.  

You may not have the right problem for these today, but you can expect that eventually there will be an important application where this kind of system can bring you big benefits.  

There are a few things you can do today to prepare for the new wave, no matter what system you adopt or when:

Understand your current search system(s) and how they are working today — if you don’t have an understanding of what content is crawled, who administers search, and some analytics or metrics around how search is working, get those now. 

Attend to the fundamentals — content acquisition and metadata are always important, so attending to this now will help today’s system and tomorrow’s.  

Start listing problems to solve — nobody needs more problems, but once you have the habit of looking around you will discover some that can be solved more easily than others.  You’ll have fewer surprises, too; these are problems that already exist and keeping track will keep you aware.

Build expertise in findability — there are remarkably few people that are experienced in search and findability. You can develop this now — give someone the job of maintaining search.  They don’t need to be a deep techie, but they do need to understand your content, have a business perspective, and be interested in human language.   People who are used to the mindset of messy human language, subjective judgements about relevance, and constantly changing content are much better at picking up learning and ‘cognitive’ systems — and they will help you with today’s problems and systems as well.
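
For the first tip above, getting some metrics on how search is working, a starting point can be very small. The sketch below assumes you can export a search log as (query, result_count, clicked) records. That record layout, the function name search_metrics and the sample log are hypothetical, since every search product logs differently.

```python
from collections import Counter


def search_metrics(log):
    """Summarize a search log given as (query, result_count, clicked) tuples."""
    total = len(log)
    if total == 0:
        return {
            "zero_result_rate": 0.0,
            "click_through_rate": 0.0,
            "top_zero_result_queries": [],
        }
    zero_result_queries = [query for query, count, _ in log if count == 0]
    clicks = sum(1 for _, _, clicked in log if clicked)
    return {
        # Share of searches that returned nothing at all.
        "zero_result_rate": len(zero_result_queries) / total,
        # Share of searches where the user clicked at least one result.
        "click_through_rate": clicks / total,
        # The failing queries seen most often: candidate problems to solve.
        "top_zero_result_queries": Counter(zero_result_queries).most_common(3),
    }


# Hypothetical log entries for illustration only.
log = [
    ("vacation policy", 12, True),
    ("expense report", 8, True),
    ("vpn setup", 0, False),
    ("vpn setup", 0, False),
    ("holiday calendar", 5, False),
]

metrics = search_metrics(log)
print(metrics["zero_result_rate"])         # prints 0.4
print(metrics["top_zero_result_queries"])  # prints [('vpn setup', 2)]
```

Queries that repeatedly return nothing are a natural seed for the list of problems to solve.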

Try out some of these tips. As you get practice separating hype from reality, you’ll find some genuine gems — some useful tools and products — that can improve your business.   

Read the first part of this series, Intelligent, Cognitive, AI-Based Search: Separating Hype From Reality.