Let’s talk about voice


Google announced in May that its speech recognition was now 95% accurate, which is on par with humans! Just four years ago its accuracy was around 78%, so expect the technology to be better than we are very soon. At this level of skill, it’s fair to say that the problem of recognizing the spoken word is pretty much solved.

This milestone came about at a time of significant advances in Artificial Intelligence (AI), specifically in the area of Machine Learning (ML). Together, these have created a perfect storm of inspiration to build experiences around human conversation. The aim is to create a new mode of interaction with technology that goes beyond what you find in cars and on phone support lines.

Broadly, you can divide the typical types of interaction into Transactional and Conversational interfaces.

Transactional interfaces have been around for many years. When I worked at the Weather Center back in 1998, I created one that let you get the weather; it involved a phone line and a fax machine. These interfaces aim to get you efficiently and effectively to your goal, but unfortunately, they all too often suffer from poor script design that traps you in an awful loop or dead-end scenario.

Conversational interfaces differ in one critical respect: they have some form of AI involved.

Today, you will come across both types in the guise of a messaging bot, such as a Slackbot (transactional) or the EMS.ai chatbot (more conversational) that I helped to create. You interact with both of these using your keyboard. However, more interesting to me personally is the voice assistant category, which includes the Amazon Echo, Google Home, and Siri.

It is in this category that the real tech war is happening, with all the players vying to dominate your home and life. Initially, Apple led with Siri, but a lack of inspiration to move it forward has meant the lead spot is clearly with Amazon and its Echo product range. By the end of 2016, Echo had over 10 million devices live in the USA (a 117% increase on 2015), with the UK and Germany sitting at 300k and 400k. By June this year, that number had jumped to 18.8 million (worldwide), and Gartner predicts that 75% of U.S. households will have smart speakers by 2020, which means that 75% of 125.82 million households (about 94.4 million) will have voice-activated devices. To reach that number, growth in this market will have to accelerate sharply. Now, while these statistics are mostly USA-centric, it’s not difficult to expect significant uptake elsewhere.
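As a rough sanity check on those numbers, the arithmetic can be sketched in a few lines of Python. The household count, the 75% share, and the 18.8 million figure come from the paragraph above; the three-year window is my own assumption, and the comparison is loose because it sets worldwide devices against U.S. households (and one household can own several devices).

```python
# Figures quoted in the paragraph above.
US_HOUSEHOLDS_MILLIONS = 125.82
PREDICTED_SHARE = 0.75
DEVICES_JUNE_MILLIONS = 18.8  # worldwide figure, so this is only a loose comparison

# Assumption (not from the article): roughly three years between June and 2020.
ASSUMED_YEARS = 3

# Households expected to own a smart speaker by 2020.
target_households_millions = US_HOUSEHOLDS_MILLIONS * PREDICTED_SHARE

# How many times larger the target is than the June figure.
growth_multiple = target_households_millions / DEVICES_JUNE_MILLIONS

# Equivalent year-on-year growth factor over the assumed window.
yearly_growth_factor = growth_multiple ** (1 / ASSUMED_YEARS)

print(f"Target households: {target_households_millions:.1f} million")
print(f"Growth multiple needed: {growth_multiple:.1f}x")
print(f"Implied yearly growth factor: {yearly_growth_factor:.2f}")
```

Under those assumptions, the market would need to grow about fivefold, which is why I say the growth has to accelerate.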

It is for this reason that I have been pushing my clients to get into voice, and to get into it now. I am at the prototype stage with both GlaxoSmithKline (GSK) and Thomas Cook, so I thought I would share what I believe is an excellent approach to creating an interface for yourself or your business.

First, think about the area in which a voice or messaging assistant could help.

A few example areas:

·     Customer service: help and support desk

·     Data processing: search, read, and interpret large amounts of data

·     Finance and accounting: simple but common issues, such as lost cards and expenses tracking

·     Medical adviser: medical knowledge and getting advice

·     Personal concierge: organize your emails, diary, appointments, and access to other bots; Microsoft just did a deal with Google to do just this.

·     Online orders: rather than completing online ordering forms, users can talk or type to a bot to place their order

Second, define what the user can expect to get out of the interaction.

If you already have a product, and you are looking to add to the ecosystem, think about which of its functions would benefit from voice and how those interactions that already exist could be improved. If you don’t have a product, write yourself some user stories: “As a ‘busy mum’ I want to ‘find what’s on nearby to entertain the kids’ so that I can get some sanity back.” I advise you to do this, even if you have a product.

Third, hold some role-playing workshops in which you act out the interactions. Have one person represent the product and the other the user. Record the sessions so you have a transcript from which you can draw out the dialogue flow. Role-playing is a fun process, and you can invite multiple colleagues and users to get involved. You can then look for the keywords that define the users’ intent and identify dead ends where you will need to introduce help.
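To make that last step concrete, here is a minimal sketch of how keywords pulled from a workshop transcript might be mapped to intents, with a help fallback for the dead ends. The intent names and keyword sets are hypothetical examples, not taken from a real project, and a production bot would use a proper language-understanding service rather than simple word matching.

```python
import re

# Hypothetical intents and keywords, as might be drawn from workshop transcripts.
INTENT_KEYWORDS = {
    "find_events": {"nearby", "events", "kids", "entertain"},
    "opening_times": {"open", "close", "hours"},
}

# Fallback used when nothing matches, so the user never hits a dead end.
HELP_INTENT = "help"


def tokenize(utterance):
    """Lower-case the utterance and split it into a set of words."""
    return set(re.findall(r"[a-z']+", utterance.lower()))


def match_intent(utterance):
    """Return the intent whose keywords overlap most with the utterance."""
    words = tokenize(utterance)
    best_intent, best_score = HELP_INTENT, 0
    for intent, keywords in INTENT_KEYWORDS.items():
        score = len(words & keywords)
        if score > best_score:
            best_intent, best_score = intent, score
    return best_intent


if __name__ == "__main__":
    for line in ["What's on nearby to entertain the kids?",
                 "What time do you close?",
                 "Tell me a joke"]:
        print(line, "->", match_intent(line))
```

Any utterance that falls through to the help intent is a candidate dead end, and worth replaying in the next workshop.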

Finally, consider whether you need a brand experience: how does your bot’s personality reflect your brand? This is tricky; often you may do well to ignore it, as it can be a rathole that gets in the way of launching something. However, it’s fair to say that copywriting is vital.

As a sense check, make sure you are taking into account Grice’s Maxims:

·     The maxim of quantity: where one tries to be as informative as one possibly can and gives as much information as is needed and no more.

·     The maxim of quality: where one tries to be truthful and does not give information that is false or that is not supported by evidence.

·     The maxim of manner: where one tries to be as clear, as brief, and as orderly as one can be in what one says, and where one avoids obscurity and ambiguity.

·     The maxim of relation: where one tries to be relevant and says things that are pertinent to the discussion.

In short:

·     Be true

·     Be brief

·     Be clear

·     Be relevant

Then, you only need to build it. I suggest going with Amazon Alexa first: it has some great tutorials, sample code, the largest user base, and some handy analytics tools for seeing how things are working out.
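To give a feel for what building it involves, below is a minimal sketch of a custom skill handler written as a plain Python function of the kind you might host on AWS Lambda. It works directly with the JSON request and response envelopes rather than an SDK. The intent name `WhatsOnIntent` and all of the spoken text are placeholders I have made up for illustration; `AMAZON.HelpIntent` is one of Alexa’s built-in intents.

```python
def build_response(text, end_session=True):
    """Wrap plain text in the JSON envelope Alexa expects back from a skill."""
    return {
        "version": "1.0",
        "response": {
            "outputSpeech": {"type": "PlainText", "text": text},
            "shouldEndSession": end_session,
        },
    }


def lambda_handler(event, context):
    """Entry point: route the incoming request by its type and intent name."""
    request = event.get("request", {})
    request_type = request.get("type")

    if request_type == "LaunchRequest":
        # The user opened the skill without asking for anything specific.
        return build_response("Welcome. Ask me what's on nearby.", end_session=False)

    if request_type == "SessionEndedRequest":
        # The session is already over, so there is nothing to say.
        return {"version": "1.0", "response": {}}

    if request_type == "IntentRequest":
        intent_name = request.get("intent", {}).get("name")
        if intent_name == "WhatsOnIntent":  # hypothetical custom intent
            # Placeholder answer; a real skill would look this up.
            return build_response("There is a puppet show at the library at two.")
        if intent_name == "AMAZON.HelpIntent":
            return build_response("You can ask me what's on nearby.", end_session=False)

    # Anything unrecognized gets a prompt rather than a dead end.
    return build_response("Sorry, I didn't catch that. Try asking what's on nearby.",
                          end_session=False)
```

Notice that the fallback keeps the session open and tells the user what to try next, which is the scripted equivalent of the help you identified in the role-playing workshops.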

Of course, if you would like some help, get in touch!

