Lessons from the voice design front line

Will Merrill, senior designer at Smart Design, explains what he learned designing an Echo Dot-based, voice-activated personal assistant for a lady living with MS.

The tech giants’ battle for in-home voice assistant domination is not going away any time soon. With Amazon, Apple and Google all vying for space in our homes, and with marketing campaigns that portray a natural, seamless experience, you could be led to believe that using these products is simple and life-changing – but is this true?

With 2017 research from VoiceLabs suggesting that there’s only a 3 per cent chance, on average, that a person will be an active user of one of these devices by week two after purchase, something’s not right.

I was lucky enough to be part of the team working on an episode of BBC2’s Big Life Fix, which aired last night, during which we worked with a lady called Susan living with a progressive form of MS. Using a pair of Echo Dots as the foundation, we created an ‘accessibility jacket’ – a personalised menu system – so Susan could perform tasks we all take for granted such as changing the television channel and making a telephone call. We also made it into the form of her favourite animal, an owl.

Before we landed on using the Alexa platform, we had to research what was out there. I wanted to share what I learned in the hope it might be useful for people embarking on their own voice projects.

How they compare
Firstly, they all have the underlying tech nailed down. Detection and recognition – distinguishing between different people and tailoring the message – worked far better than I had expected. Take wake words, for example: ‘Alexa’ was chosen because it’s unique in its sound, distinguishable and easy to pick up. There are opportunities to change this (e.g. ‘Computer’ and ‘Echo’), but these alternatives have been carefully considered, and ‘Alexa’ far outperforms the other choices.

Early in our project, we considered building a bespoke solution, which would have allowed us to use our own personalised wake word, but we soon realised how difficult that is to get right.

Getting the conversation to flow and feel natural is crucial if a brand wants to build trust and lower the barriers between the user and the technology. Although all the offerings worked pretty well, we found Google’s to have the best voice synthesis – its use of neural net machine learning allowed for a really natural sound – so this platform could be a good option for a more human experience.

Apple offers the best visual experience, which might sound counterintuitive, but visualisations can be very helpful for the user.

However, across the board, we still feel like the ‘clever AI helper’ these devices are being sold as is not yet a reality. Although the tech is incredibly powerful, and the conversation feels natural, it doesn’t yet feel intelligent. Amazon offers over 30,000 different ‘skills’ (in the US) and most of them have no real benefit when compared to their smartphone equivalent and often feel like a novelty. With voice assistants, we are in a situation reminiscent of the early days of app stores on our mobile devices.

Personalisation will improve each offering
We’re seeing an influx of products using machine learning and AI to deliver more personalised experiences, and we feel that personalisation offers huge potential for this paradigm. If a voice assistant could learn more about you and your likes and dislikes, it could surface the right content at the right time, amplifying that feeling of intelligence and usefulness – but which tech giant will get there first?

For us, with limited time and resources, we decided to borrow from the well-established digital format of the drop-down menu to overcome the issues of relevance and personalisation. By creating an audible menu system, a hand-hold, we helped Susan filter down to the key tasks she wanted. This might sound limiting, but for someone with MS who sometimes experiences ‘foggy’ moments, it was a perfect solution.

We also had the opportunity to personalise the physical components and created an owl – Susan’s favourite animal and something totally relevant to her. By using something that resonated with her, we could humanise a technology that is otherwise very alien to her. The answer may not always be an owl, but the physical components should try to provide some context for the intended interaction, which we believe is lacking from the current suite of products. Apple and Google products feel like they should live in the home, but that’s where it ends.

Voice UI is deceptive
100 per cent voice control is a myth. There is a visual element that is currently fundamental to the interaction. There are of course lights and non-verbal cues. For Project Susan we amplified the visual side of it, and the whole product changes colour at different times. The owl has an idle state glow, almost like it’s breathing, but when Susan says the wake word, the owl looks as if it’s awake and listening.

We’ve noticed more brands incorporating some sort of screen into their products. Apple has a full colour display on its HomePod, which affords a higher level of nuance to its interaction and visualisation. If you compare the original Amazon Echo to the Apple HomePod, you will see how each iteration allows a lot more of those non-verbal cues.

Be prepared: mental models are backwards
When designing an app, you have a clear mental model that guides navigation without placing too much cognitive load on the user – there are layers and sub-layers that guide you through. With voice control, navigation is hidden from the user, and often relies on them having the end goal in their head and working backwards. This needs to be reflected in the design, and currently means that functionality tends to be stripped back and simplified.

Don’t use a voice platform for the sake of it
I wouldn’t be a good designer if I didn’t finish the article with a cry for help – if you’re thinking of using a voice platform, don’t use it just for the platform’s sake. Understand your audience and make sure the platform is right for them; that is how you create something meaningful. As designers, we have a real opportunity to give these assistants valuable meaning, but we’re still trying to work out where the technology can add real benefit for the user.

If you are going down the voice path, make sure you assemble the right team. Voice isn’t just about tech; it’s also about visual design and industrial design. Creating something that is essentially invisible, and works perfectly, is no small task.