In "It's Better to Be a Good Machine Than a Bad Person," a book now regarded as a classic among voice technology practitioners, Bruce B. Balentine reminds us that humans readily accept new technologies, imperfect as they may be, as long as those technologies deliver some incremental value. However small that value, if the person is better off using the technology, they will adopt it.
Why Do We Hold Voicebots to Higher Standards?
Think of the first generations of automobiles, radios, toasters, refrigerators, washers and dryers, TVs, VCRs, PCs, the internet, web browsers, cell phones, smartphones. From our current vantage point, they're laughably imperfect compared to what they eventually evolved into. Why is it that we not only gladly embraced such barely functional, barely usable technologies, but also put up with their glaring, often frustrating imperfections for decades? We continued to use them as they slowly (sometimes very slowly) evolved and matured. And yet, no such patient indulgence has been extended to voicebots, the machines that attempt to converse with us by speaking and listening. Why is this?
Balentine’s answer: the technologists who build voicebots have promised something beyond their capacity to deliver — now or, potentially, ever. That is, a machine capable of conversing with a human the way two humans converse. As a result, instead of looking back to an age when nothing existed and marveling that something now does (say, the birth of the telephone), or to a previous version that lacked a feature which has since been added (from rotary dialing to push buttons), the user never gets the feeling that things are progressing. Instead, they are constantly comparing what they have in front of them — a voicebot — with something far superior: a conversationally competent human being. Instead of saying, "Oh, I am able to do something new with this voicebot that I was unable to do before," they say, "This voicebot is still far, far from the voicebot I would like it to be."
Our Love, and Mostly Hate, Relationship With IVRs
Hence the now decades-long special loathing we have developed for interactive voice response (IVR) systems. When we call the bank, the power company, the DMV, or the toll-free number of any large company from which we buy products and services, the IVR is unavoidable. To be sure, we never call such companies intending to converse with a voicebot. We almost always call wanting to speak with a human to solve a problem.
But now we have voicebots in our homes (via smart speakers such as the Amazon Echo and Google Home) that we intentionally and willingly engage with. This invalidates the often proffered explanation that we don’t enjoy interacting with voicebots because they stand as a barrier between us and a real human. (How often do you press zero or say "Agent, agent, agent" when you call into one of those IVRs? For me: almost always.) We remain dissatisfied with these machines mainly because we keep comparing them to a fully competent human being. So no matter how much they may have improved over last year’s voicebot, the progress goes unnoticed and unappreciated.
Welcome to the Uncanny Valley
In fact, matters are even worse. Professor Roger K. Moore at the University of Sheffield posits that the more human a voicebot sounds at the start of an interaction, the bigger the disappointment will be at some point in the conversation. A voicebot that sounds almost indistinguishable from a human raises the human's baseline expectations: if it sounds human, they expect it to be human-like along the other dimensions of the interaction as well — observing the rules of conversation, taking turns, managing miscommunication, being polite. And since today’s voicebots are nowhere near as conversationally competent as a human being, the interaction starts on a high note and then, when something goes sideways — for instance, the user says something the voicebot doesn’t understand and the voicebot responds in a decidedly non-human way — the robot’s mask suddenly falls, producing a feeling of eeriness and revulsion: what the industry calls "the uncanny valley."
So, what is to be done? Voicebots do deliver appreciated value, and people do want to engage with them. Exhibit A: the rapid adoption of the Amazon Echo and Google Home since they reached the mainstream in 2016. And yet the disappointment remains. Is there anything that can be done?
In Part Two, I outline a solution to this problem by proposing a new philosophy of designing voicebots.