This is a true story. I was sitting in a pizza parlor waiting for my lunch when a man sat down on the other side of the restaurant. He pulled out his iPhone and started talking into it.
He was using Siri to do things like set calendar entries, make notes for himself, and even dial a phone call and leave a phone message for someone.
Siri, Find a Quiet Restaurant
How did I know he was using Siri for these functions? Because I could hear him. To overcome the ambient noise in the restaurant, he was yelling into his phone loud enough that I could hear him across the room. Loud enough to be truly irritating.
Not only was it intrusive and annoying to hear someone screaming at his phone, he was also revealing details of his business to everyone in the restaurant. It was the perfect combination of poor social behavior and bad business practice.
These are just two of the many reasons why the effort being put into voice interfaces seems misguided. Smartphones have offered voice commands for hands-free call handling, mostly for use while driving, for at least 10 years. A car, though, is a mostly private space. The latest wave of voice interfaces is finding its way into public spaces.
Do We Really Need Voice Interfaces?
After Apple introduced Siri, it seemed as if all the major consumer-facing technology companies began to introduce voice interfaces. Google Now, Microsoft Cortana, and the Amazon Echo and Echo Dot have all been brought to market relatively recently.
In all of these cases, Siri included, the use cases seem weak. Outside of support for people with disabilities and a limited number of situations that demand hands-free interfaces (such as driving), voice interfaces don’t add much over typing or pointing and clicking.
The big question, though, is “Do consumers really want to talk (or yell) into their computing devices?” I can think of a lot of reasons why they might not. For example:
- A person can’t use voice interfaces in public spaces without annoying the people around them. The man in the pizza parlor either wasn’t thinking about other people or was just a jerk. Either way, yelling into a phone in a public space is not socially acceptable. The social norms around voice interfaces are still evolving, but it’s obvious that no one wants public spaces inundated with hordes of people speaking commands to their devices instead of typing.
- Similarly, voice interfaces don’t work well in office environments. An open office can’t function with everyone mumbling, or more likely yelling, at their computers. Even with cubicles and private offices, people speaking to their computing devices all day will be disruptive.
- Home environments have this problem too. Ask anyone whether they want to hear their spouse, children, or roommates talking to their computers. Most would rather their family members walk over to the computer and click “play” than yell at an Amazon Echo from across the room.
- Even moderate ambient noise confuses voice interfaces. It’s kind of hilarious that music playing from Spotify will set off Cortana in my home office. Cortana keeps trying to understand what the “speaker” wants until it gives up in frustration.
- It’s not really artificial intelligence. Anything outside the menu of supported commands still just generates a web search or a confused response. Technology companies may be overcoming the speech recognition side of voice interfaces, but they still can’t make a computer respond sensibly to requests it wasn’t designed to handle.
Voice interfaces are essential to making computers more human and to eventually enabling anthropomorphic robots. Computers today, however, are not human. And until they walk and talk like humans, voice interfaces will seem more like a gimmick than a useful feature.