Celebrity data scientist Nate Silver, take a seat. You too, Google, machine learning gods.
Microsoft’s Cortana, the Siri equivalent on Windows Phone, has called every FIFA World Cup elimination round match correctly. That’s right, she’s 14 for 14.
On Wednesday she correctly predicted that Argentina would beat the Netherlands. In Tuesday’s game she said that Germany would beat Brazil. And as you keep going back through each game in the elimination round, you’ll see that she was right over and over again.
Beating the Competitors
Google missed the mark when Germany beat France. Here’s the excuse:
Perhaps in our excitement about using Cloud Dataflow, BigQuery and Compute Engine to arrive at our predictions, we may have been better served by heeding a more simple truth: Gary Lineker once said, “Football is a simple game: 22 men chase a ball for 90 minutes, and at the end, the Germans always win.”
But forget the brainiacs of yesterday; Cortana is the oddsmaker now.
“Well it’s not really Cortana that makes the predictions,” says Ted Roduner, a community manager who works in Microsoft’s Bing division. “It’s Bing. Bing is the soul of Cortana. All of Cortana’s intelligence is built on Bing.”
Roduner goes on to explain that there’s a huge index behind the search engine that goes well beyond websites and links. It indexes people, places and things, with images that range from the prime minister of Japan to the Empire State Building to Mt. Everest. And then, of course, there’s all the data that things produce. There’s plenty of information to make Bing look smart.
Computing the Odds
So when Microsoft’s Research Labs started to fool with predictions, they looked carefully at what kind of questions they could ask and what kind of data they would need to get answers right.
They started with American Idol, which is for all intents and purposes a popularity contest. They found that they could predict who would get booted off by looking at iTunes downloads, tweets and similar signals.
Next came the World Cup, which is an entirely different kind of game.
“The numbers of people who are fans of Brazil don’t necessarily improve its chances of winning,” says Walter Sun, Development Manager for the Core Ranking team at Bing, in a blog post.
“Instead, you want to model the competitive strength of teams and then leverage expert opinions, with prediction markets as a proxy for that,” he adds.
So Bing’s machine learning experts are pulling in win/loss/tie records from qualification matches and other international competitions, along with the margin of victory in those contests, adjusted for location since home-field advantage is a known bias. Further adjustments account for other factors that give one team an edge over another, such as home field (for Brazil), proximity (for South American teams), playing surface (hybrid grass) and game-time weather conditions.
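To make the idea of "competitive strength adjusted for location" concrete, here is a minimal sketch in Python. It is not Bing's model, which the article does not describe in detail. It assumes an Elo-style rating system, and the function names, the ratings, the 100-point home edge and the margin-of-victory weighting are all illustrative choices.

```python
import math


def win_probability(rating_a, rating_b, home_edge_a=0.0, scale=400.0):
    """Elo-style probability that team A beats team B (draws ignored).

    home_edge_a is a hypothetical bonus, in rating points, added to
    team A when it plays at or near home.
    """
    diff = (rating_a + home_edge_a) - rating_b
    return 1.0 / (1.0 + 10 ** (-diff / scale))


def update_rating(rating, expected, actual, goal_margin, k=20.0):
    """Move a rating toward the observed result.

    actual is 1.0 for a win, 0.5 for a tie, 0.0 for a loss.
    Bigger margins of victory move the rating further (assumed weighting).
    """
    margin_weight = 1.0 + math.log1p(abs(goal_margin))
    return rating + k * margin_weight * (actual - expected)


if __name__ == "__main__":
    # Made-up ratings, for illustration only.
    host, visitor = 1850.0, 1900.0
    p_host = win_probability(host, visitor, home_edge_a=100.0)
    print(f"Host win probability: {p_host:.2f}")
    # Suppose the host then loses by three goals.
    print(f"Host rating after loss: {update_rating(host, p_host, 0.0, 3):.0f}")
```

Other factors named above, such as playing surface or weather, could be folded in the same way, as further adjustments to the rating difference.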
Even More Data
Other variables are also taken into account, such as Neymar’s injury and, believe it or not, where bettors are placing their money. It seems that Microsoft Research believes in the ‘wisdom of the crowds’ phenomenon captured by people wagering on outcomes.
“I have created a full model,” says economist David Rothschild in a Microsoft Research blog post, “but I rely heavily on the prediction-market data. The reason is simple: The problem with pure fundamental models is that even the best fundamental models are lacking because the World Cup is an event held just once every four years, without any regular season. There is a lot of idiosyncrasy in the event that is hard to capture in historical data sets.”
“Both the fundamental data and the prediction-market data update as the World Cup progresses. The predictions will update every few minutes, and I will also show the pregame predictions for all games,” he writes.
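One way to picture leaning on market data while keeping a fundamental model is a simple weighted blend. The sketch below is an assumption for illustration, not Rothschild's actual method: it converts hypothetical decimal betting odds into probabilities, removes the bookmaker's margin by normalizing, and then mixes the result with a model's estimate. The function names, the odds and the 0.7 market weight are made up.

```python
def implied_probabilities(decimal_odds):
    """Turn decimal odds into probabilities that sum to 1.

    The raw inverses add up to more than 1 because of the bookmaker's
    margin, so each one is divided by their total.
    """
    raw = {outcome: 1.0 / odds for outcome, odds in decimal_odds.items()}
    total = sum(raw.values())
    return {outcome: p / total for outcome, p in raw.items()}


def blend(model_probs, market_probs, market_weight=0.7):
    """Weighted average of model and market probabilities per outcome."""
    return {
        outcome: market_weight * market_probs[outcome]
        + (1.0 - market_weight) * model_probs[outcome]
        for outcome in model_probs
    }


if __name__ == "__main__":
    # Hypothetical odds and model output for a single match.
    market = implied_probabilities({"team_a": 2.4, "draw": 3.2, "team_b": 3.1})
    model = {"team_a": 0.45, "draw": 0.27, "team_b": 0.28}
    for outcome, p in blend(model, market).items():
        print(f"{outcome}: {p:.3f}")
```

Because both inputs sum to 1, the blended probabilities do too. Rerunning the blend whenever odds or results change would mimic the frequent updates described in the quote.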
So who’s going to win the final, Mr. Rothschild? If you fool with his model, at this point, he predicts Germany. So, by the way, does Cortana.