Session retrospective from GigaOM’s Structure Conference

If you haven’t yet heard anyone say that “data is the new oil,” then you will pretty soon. Companies as varied as PayPal, Human Genome Sciences and IBM are all drilling down into their data stores, refining and analyzing what they find, and then leveraging their insights to create new products or add value to those that already exist. They are literally turning data into dollars.

And while you may be familiar with some of the ways this is happening, such as Amazon suggesting products according to your buying history, or Facebook serving up ads according to the “likes” of your “friends,” there are other companies that are using the data they’ve collected in other cutting-edge ways.

They are differentiating themselves from competitors by using Machine Learning. It’s a form of Artificial Intelligence that’s been around for a while, but with the advent of Big Data and advanced analytics it’s now going mainstream. A number of CMS vendors (EMC Documentum, OpenText, Microsoft) are incorporating it into their offerings as well.

There was a panel on Machine Learning at the GigaOM Structure Conference that concluded last week; its members made the somewhat complicated subject more accessible. Instead of using Wikipedia’s definition of Machine Learning, which leaves many of us scratching our heads, George Gilbert, a senior analyst at GigaOM, took the burden from his audience by calling it “a bit of a black box to many of us.”

Rather than asking his panelists to define Machine Learning, he invited them to talk about it in terms of their work. When we pair their responses with scenarios from our everyday lives, the subject becomes much easier to digest.

Machine Learning in Practice

A great many of us have, for example, called into a company looking for help with a product or service and found a machine, rather than a human, offering to help us. If it asks us why we’re calling and allows us to speak the answer, we could be headed for trouble, unless the Natural Language Processing (NLP) capabilities of the system are extremely well developed. And that’s not an easy task.

“In a perfect world, users would ask you a question that is both complete and specific,” says Currie Boyle, a Distinguished Engineer from IBM. But of course, that rarely happens, so the machine has to serve up choices.

First it needs to figure out whether we’re looking to buy something, looking for support on a product, have a complaint, and so on. Based on how we answer, and what it understands, it has to find the content it needs to help us. This seems like a cinch until you realize that a company might have thousands of products (in IBM’s case, over ten thousand), and that there may be tens of thousands of words in the documents about them.

Machine Learning not only processes and looks for insights from the information you provide, but it also looks at your interaction history with the company, and at what other people who provided inputs similar to yours found helpful, and even not so helpful. All this data is run through algorithms and then a response is provided.
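The flow described above can be sketched in a few lines of Python. This is only a toy illustration, not IBM’s system: the keyword table, the function names and the feedback format are all assumptions made for the example, and a real call center would use a trained NLP model rather than keyword overlap.

```python
import re
from collections import defaultdict

# Hypothetical keyword lists; a real system would learn these from data.
INTENT_KEYWORDS = {
    "purchase": {"buy", "price", "order"},
    "support": {"broken", "error", "help", "install"},
    "complaint": {"refund", "unhappy", "complaint"},
}


def guess_intent(utterance):
    """Return the intent whose keywords overlap most with the caller's words.

    Returns None when nothing matches, which is the cue to offer a menu
    of choices instead of guessing.
    """
    words = set(re.findall(r"[a-z]+", utterance.lower()))
    scores = {intent: len(words & kws) for intent, kws in INTENT_KEYWORDS.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


def rank_documents(intent, feedback):
    """Rank documents by the share of past callers who found them helpful.

    feedback is a list of (intent, doc_id, was_helpful) tuples collected
    from earlier callers with the same kind of request.
    """
    helpful = defaultdict(int)
    total = defaultdict(int)
    for fb_intent, doc_id, was_helpful in feedback:
        if fb_intent == intent:
            total[doc_id] += 1
            helpful[doc_id] += int(was_helpful)
    return sorted(total, key=lambda d: helpful[d] / total[d], reverse=True)
```

The second function is where the “what other people found helpful” idea shows up: documents that past callers rated unhelpful sink to the bottom of the list.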

Machine Learning Gets You Dates

Machine learning has sexier applications as well. One dating site leverages it to pick potential mates for its clients. The pairing process begins shortly after a customer reveals (via a questionnaire) who they are and what they want in a companion.

“We have a lot of historical information (gathered over approximately 17 years) about people that enables us to predict who they will like simply based on that information,” says Amarnath Thombre, the site’s vice-president of strategy and the keeper of its matching algorithm.

Thombre and his team of data scientists and mathematicians use complex equations that assign values to hundreds of parameters gleaned from profile data, including height, religion, profession, family values and religious values, in order to generate the Daily 5 matches for each of the site’s customers.
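The site’s actual equations are not public, but the general idea of scoring candidates on weighted profile parameters and keeping the top five can be sketched as follows. The attribute names, the weights and both function names are hypothetical, chosen only for illustration.

```python
def compatibility(seeker_prefs, candidate, weights):
    """Weighted share of the seeker's stated preferences a candidate meets.

    Returns a value between 0.0 and 1.0. Attributes missing from
    `weights` default to a weight of 1.0.
    """
    total = sum(weights.get(attr, 1.0) for attr in seeker_prefs)
    if total == 0:
        return 0.0
    matched = sum(
        weights.get(attr, 1.0)
        for attr, wanted in seeker_prefs.items()
        if candidate.get(attr) == wanted
    )
    return matched / total


def daily_matches(seeker_prefs, candidates, weights, n=5):
    """Return the n best-scoring candidates, best first."""
    ranked = sorted(
        candidates,
        key=lambda c: compatibility(seeker_prefs, c, weights),
        reverse=True,
    )
    return ranked[:n]
```

With hundreds of parameters instead of a handful, the weights become the interesting part: they decide which mismatches are tolerable and which are deal-killers.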

“There’s no one single formula to match people, but we have millions of data points for successful matches,” says Thombre.

Machine Learning kicks in once behavioral data appears; that data is generated when a client decides which of the Daily 5 they want to know more about and which they have no interest in. Thombre gets even more data to work with as clients take fun quizzes such as Like at First Sight (a visual quiz) and DateSpark.

“The algorithms get much better at presenting the right matches once that information is supplemented with a feedback loop of ratings,” says Thombre.
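One simple way to picture such a feedback loop is a rule that nudges each attribute’s weight after every like or pass. The additive update below is an assumption made for illustration, not the site’s algorithm; the function name, the step size and the default weight of 1.0 are all hypothetical.

```python
def update_weights(weights, seeker_prefs, candidate, liked, step=0.1):
    """Return a new weight table adjusted by a single like/pass rating.

    If the client likes someone who misses a stated preference (or passes
    on someone who meets it), that attribute evidently matters less, so
    its weight goes down. If the rating agrees with the stated
    preference, the weight goes up. Weights never drop below zero.
    """
    updated = dict(weights)
    for attr, wanted in seeker_prefs.items():
        meets_preference = candidate.get(attr) == wanted
        delta = step if meets_preference == liked else -step
        updated[attr] = max(0.0, updated.get(attr, 1.0) + delta)
    return updated
```

Repeated over many ratings, an attribute the client keeps ignoring drifts toward zero, while one they consistently act on grows in importance.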

It’s interesting to note that what customers say about themselves is often exaggerated. They tend to understate their waistlines and overstate their IQs. Not only that, but what they say they want and what they actually choose don’t always match; they tend to give ground on things like hair color, age, whether or not a match has been married before or insists on having kids, income and so on. There are fewer deal-killers than you’d expect: people will give on income, but not so much on smoking.

“What people say is a great starting point, but I do believe that what people do is a better predictor of future user behavior once you have a lot of data,” says Thombre. “A system that incorporates both will work the best.”
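A system that incorporates both signals might lean on stated preferences at first and shift toward observed behavior as the data piles up. The sketch below is one hedged way to do that; the blending formula and the `prior_strength` parameter are assumptions for the example, not something the panel described.

```python
def blended_score(stated_score, liked, shown, prior_strength=20):
    """Blend a stated-preference score with an observed like rate.

    stated_score: 0.0 to 1.0, derived from what the client says they want.
    liked / shown: how many similar profiles the client liked out of
    how many they were shown.
    prior_strength: roughly how many observations it takes before
    behavior counts as much as the questionnaire (hypothetical value).
    """
    if shown < 0 or liked < 0 or liked > shown:
        raise ValueError("need 0 <= liked <= shown")
    behavior_weight = shown / (shown + prior_strength)
    observed_rate = liked / shown if shown else 0.0
    return (1 - behavior_weight) * stated_score + behavior_weight * observed_rate
```

With no behavioral data the score is simply what the client said; after many interactions it is dominated by what they actually did.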

And as is often said about the land of Big Data, data analytics and data science, it’s part data crunching and part art. At its best, it might help us understand human behavior better than we are able to understand it now. And that, of course, opens the door to endless possibilities.

Title image courtesy of qingqing (Shutterstock).
