Recently Anthony Evans, principal consultant with Computer Design & Integration, was recruited to come in halfway through what should have been a relatively straightforward project. The company wanted to deploy artificial intelligence at its customer service help desk in the form of a "whisper agent" that would help desk agents with questions they were unsure of. Either the virtual agent would have the answer or it would escalate the question to a second tier of assistance.

But something was off with the implementation pilot — the whisper agent turned out to be only of marginal help to the desk agents. Eventually the team discovered where they went wrong, according to Evans.

  • The project called for a lot of data that didn’t exist and that, in fact, the team was building over time within the data model. “Given the complexity of some questions it was taking us a considerable amount of time to mature the models to not only respond but respond with the right answer,” Evans said. Over time — several years — this use case would deliver value, but not in the time frame the client had been expecting.
  • The AI was only trained on 25% of the total content that could potentially be answered by the AI. “If the AI is expected to answer or escalate with 100% of utterances that means that the data models need to have 100% of references to intents, even if we don’t support the content in the back end with the answer, it should refer to escalation as an action,” Evans said.
  • There wasn’t the necessary volume of chats. “The implementation was being piloted, meaning we could only get 35 to 50 conversations a day, and given the other problems we were having, our accuracy numbers would swing dramatically on a day by day basis and weren’t showing steady increases over time,” Evans said.
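The answer-or-escalate behavior Evans describes can be sketched as follows. This is a minimal illustration, not the actual implementation: the names (`classify_intent`, `KNOWN_ANSWERS`, `CONFIDENCE_THRESHOLD`) and the keyword-matching stand-in for a trained intent model are all hypothetical.

```python
CONFIDENCE_THRESHOLD = 0.8

# Intents the data model can actually answer (a tiny illustrative subset).
KNOWN_ANSWERS = {
    "reset_password": "Walk the caller through the self-service reset page.",
    "billing_cycle": "Billing runs on the first business day of the month.",
}

def classify_intent(utterance):
    """Stand-in for a trained intent model: returns (intent, confidence)."""
    utterance = utterance.lower()
    if "password" in utterance:
        return "reset_password", 0.92
    if "bill" in utterance:
        return "billing_cycle", 0.85
    return "unknown", 0.30  # utterance not covered by the data model

def whisper_agent(utterance):
    """Either answer or escalate -- never leave the desk agent hanging."""
    intent, confidence = classify_intent(utterance)
    if confidence >= CONFIDENCE_THRESHOLD and intent in KNOWN_ANSWERS:
        return KNOWN_ANSWERS[intent]
    # Evans' point: even intents with no supported answer must map to an
    # explicit escalation action rather than to silence.
    return "ESCALATE: route to tier-two support"
```

The key design point is the final branch: covering 100% of utterances does not mean answering all of them, only guaranteeing that every one resolves to either an answer or an escalation.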


Things Go Wrong

Ultimately, Evans chalked up the failed project to a misalignment between expectations and the reality of the project and its time frame. Despite the halo effect surrounding them, AI projects can and do go wrong, the most extreme example being the recent death of a pedestrian in Tempe, Ariz., struck by a self-driving car during an Uber test pilot.

Sometimes the reasons for the failure are obvious — perhaps the algorithm was incorrectly programmed or was biased. Sometimes the data being accessed to answer a question is outdated or inaccurate. Sometimes the “problem” didn’t need AI as its solution in the first place.

Misaligned Expectations

Rarely is the failure of an AI project laid at the feet of misaligned expectations — but projects of this type often fail for that very reason, said Ted Dunning, chief application architect at MapR. To illustrate his point, he uses the example of music. “If I put music into a genre and tell people that this is the word from on high about what kind of music is what, then I will get a lot of arguments because I implicitly promised 100% accuracy in a situation that doesn't have 100% agreement,” Dunning said. “On the other hand, if I say ‘Here are some songs that are suggested by this genre that you might like’, I will typically not get very much argument. This example is actually kind of trivial, but the principle is very important.”

Or with the example of self-driving cars “if I offer to have the car make a beep if you appear to be weaving in a lane or nudge the steering if you are leaving a lane without signaling, I am making a very weak promise,” Dunning said. “It is on you to drive the car. If the beeper doesn't beep, you should still drive correctly. On the other hand, if I have a product that promises to automatically pilot a car, I am making a much bigger promise and the responsibility for error shifts a bit back to the manufacturer.”



Incomplete Datasets

There are more concrete reasons why an AI project might fail. AI and machine learning systems make decisions based on information provided to them — usually training datasets, said Irfan Essa, director of the Center for Machine Learning at Georgia Tech. 

Essa said that problems can arise when the data provided for training is either incomplete, meaning not all possible scenarios are covered in the examples, or unbalanced, where the data covers the most likely scenarios but contains very few examples of some cases. In either case, the AI/ML system may not have developed a complete picture and needs to be given additional information and training, Essa said.
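The kind of pre-training sanity check Essa's point suggests can be sketched simply: before training, inspect the label distribution for missing or underrepresented classes. The 5% floor and the class names here are illustrative assumptions, not anything from the article.

```python
from collections import Counter

def audit_labels(labels, expected_classes, min_fraction=0.05):
    """Flag classes that are absent (incomplete data) or underrepresented
    (unbalanced data) in a training set."""
    counts = Counter(labels)
    total = len(labels)
    missing = [c for c in expected_classes if counts[c] == 0]
    rare = [c for c in expected_classes
            if 0 < counts[c] and counts[c] / total < min_fraction]
    return {"missing": missing, "rare": rare}

# Hypothetical help-desk training labels: 90 refund, 8 shipping, 2 warranty.
labels = ["refund"] * 90 + ["shipping"] * 8 + ["warranty"] * 2
report = audit_labels(labels, ["refund", "shipping", "warranty", "returns"])
# "returns" never appears at all; "warranty" falls below the 5% floor.
```

A check like this catches both failure modes Essa names: scenarios with no examples at all, and scenarios with too few examples for the model to learn reliably.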

Data collection is a particular challenge for AI startups, said Mike Brusov, co-founder and CEO at Cindicator. “It is very expensive or hard to buy as there is not so much valuable data for sale: big players like Amazon and Facebook are not ready to share their most treasured resource with anybody. So, you need to learn how to collect it yourself and quickly.”

Algorithms That Are Wrong

Another challenge for AI is the algorithms underpinning the system. Algorithms are only as good as those who develop them, said Jon Stenstrom, co-founder and CEO of Quantified Skin. Since humans develop them, there can be underlying biases, rooted in the developers' world views, that go unaccounted for. Because people don't know their own underlying biases, other checks need to be put in place — such as a feedback mechanism, like audits, Stenstrom said. He points to the example of job applicant algorithms. “Some have shown to bias certain ethnicities or personality traits from even entering the company's job screening process. If this goes unchecked, it opens the company up to not finding the best candidates for the positions but also to discrimination lawsuits.”
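One concrete form the audit mechanism Stenstrom describes could take is periodically comparing selection rates across applicant groups. The "four-fifths" heuristic used below is one common rule of thumb for flagging adverse impact; it is an assumption for illustration, not the article's method.

```python
def selection_rates(outcomes):
    """outcomes: {group: (selected, total)} -> {group: selection rate}"""
    return {g: selected / total for g, (selected, total) in outcomes.items()}

def adverse_impact(outcomes, threshold=0.8):
    """Flag groups whose selection rate falls below `threshold` times the
    highest group's rate (the 'four-fifths' heuristic)."""
    rates = selection_rates(outcomes)
    best = max(rates.values())
    return [g for g, r in rates.items() if r < threshold * best]

# Hypothetical screening outcomes: group_a passes at 50%, group_b at 20%.
flagged = adverse_impact({"group_a": (50, 100), "group_b": (20, 100)})
# group_b's 20% rate is below 0.8 * 50% = 40%, so the audit flags it.
```

Run on a schedule against real screening outcomes, a check like this surfaces the unaccounted-for biases a development team cannot see in its own code.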

AI Needs To Be Able to Fail Gracefully

Additional problems can occur when the sensors capturing data start returning noisy, incomplete information, Essa said. “In these cases, AI has to be trained not only how to deal with failure but how to gracefully deal with failure.” He gives the example of a tool developed to correct for shakiness in a video. The problem is that in some of the mathematical approaches used, when the system fails it shuts down and the original video is lost. What you want instead is a failure that reverts to the original video, Essa explained — a graceful failure. “You have to train the system holistically to understand the scenarios as well as it possibly can. And wherever you sometimes cut corners by design, or don’t have the right information available, you will not be able to explore the entire set of possible scenarios.”
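The graceful-failure principle in Essa's video example reduces to a simple pattern: if the enhancement step blows up, return the untouched input rather than destroying it. `stabilize` here is a hypothetical stand-in for the real stabilization math.

```python
def stabilize(frames):
    """Hypothetical stabilizer; raises when it cannot estimate motion."""
    if not frames:
        raise ValueError("no motion data to estimate")
    return [f + "-stabilized" for f in frames]

def stabilize_gracefully(frames):
    """Graceful failure: on any error, fall back to the original frames."""
    try:
        return stabilize(frames)
    except Exception:
        # Degrade, don't destroy: the caller still has a usable video.
        return frames
```

The design choice is that the fallback path never depends on the failing component — the worst case is the status quo, not a lost video.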

Take the example of the self-driving car that can detect pedestrians, he said. It may do a good job of detecting a human but let’s say that person is carrying a large shopping bag and is no longer completely visible. “If the system hasn’t been trained to detect a moving shopping bag a failure occurs.” And potentially another tragedy.