Multimodal Mobile

Philippe Jeanrenaud, Director, Speech Mobility Marketing, EMEA at Nuance, looks at the trend towards offering multiple forms of text input and commands on mobiles

It seems as though the mobile device market has been evolving and growing at breakneck speed over the last few years, particularly since the advent of the iPhone. We're relying on our phones more than ever, not only to make calls but to send text messages and emails, download music, and get directions to a coffee shop we just found by searching the web on the phone. It's amazing, and it's only going to get better as the downloadable application market rapidly heats up.
But with such an amazing pool of applications and services at our disposal to get our devices to do just about anything, giving users faster and easier access to those apps becomes a key challenge for handset manufacturers and carriers. Beyond the clear need to give their customers the best possible user experience, there is also huge revenue potential at stake: easier access to, and increased usage of, applications and services ultimately increases the average revenue per user (ARPU). And it's all about ARPU, right?
Predictive text and speech recognition software are already helping to pave the way to the ultimate user experience, and many of the phones coming to market today have both predictive text for faster and easier texting and basic speech capabilities such as Voice Activated Dialling (VAD). But implementations of this calibre only scratch the surface of how powerful a multimodal input approach can be in creating an amazing user experience that drives revenue and brand loyalty. And of course, external factors such as distracted-driving legislation clearly demonstrate the value of fully integrated input options.

The keypad evolution 

The evolution of the phone keypad is a good example of how we're beginning to see increased multimodality on mobile phones. Whereas most traditional consumer phones used to include only a 12-key phone pad, the full QWERTY keyboard made its way first onto the BlackBerry, and now onto more and more feature phones. Driven largely by texting, and by lower-end devices becoming able to handle email, consumers demand faster and easier text input. With that in mind, we've seen many devices over the last few years that feature a 12-key phone pad on the main interface for easy dialling and a slide-out QWERTY keypad for easy texting and emailing. Consumers get the best of both worlds: easy calls and easy texts.
But with the advent of the iPhone came the touchscreen user interface (and, of course, the app store phenomenon), and a new era in mobile phones was born, with touchscreen devices now available from several handset makers.
In just the last two to three years, we've seen an evolution of modality on the keypad. And there is a clear market for both: mobile users addicted to the look and feel of the touchscreen, and those who wouldn't dream of giving up their hard QWERTY keypad.

Moving beyond VAD

As mentioned earlier, many of today's phones have basic voice recognition capabilities for dialling numbers, finding contacts and checking the status of your phone's battery, eliminating the need to scroll through address books and menus. Voice input is perhaps the fastest and easiest input method, which is why expanded voice command-and-control capabilities are increasingly being adopted. In fact, Datamonitor recently released a report stating that the market for advanced speech recognition technologies in mobile devices will triple over the next five years, rising from $32.7 million (£19.7 million) in 2009 to $99.6 million in 2014. We're seeing this traction unfold, with many of today's higher-end feature phones now enabling users to access just about any application or service on the device with their voice. Looking to play BubbleSmile? Just say "Go to BubbleSmile" and you're there. Need to send a text message to John Smith? Just say "Send text to John Smith", and the input field appears.
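As a rough check on the "triple" claim, the growth factor and the implied compound annual growth rate (CAGR) can be worked out from the two quoted figures. This is a back-of-the-envelope calculation rather than a number taken from the Datamonitor report, and it assumes steady compounding over the five years from 2009 to 2014.

```latex
\frac{\$99.6\,\text{M}}{\$32.7\,\text{M}} \approx 3.05,
\qquad
\text{CAGR} = \left(\frac{99.6}{32.7}\right)^{1/5} - 1 \approx 0.25
```

In other words, tripling over five years corresponds to growth of roughly 25% per year.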
This is where speech and text input are increasingly intersecting, and where the once-mythical multimodal handset is starting to become a reality. For instance, Samsung devices such as the Instinct and the Memoir let users pull up the text input screen with their voice, which automatically takes them into a touchscreen QWERTY text input field that features predictive text. And of course, there are devices coming to market with full dictation capabilities: you launch the text application with voice, dictate the message and then send it with a simple voice command, with no text input at all. A completely voice-enabled phone is not far from hitting the market, and the technology is here today: consumers can surf the web, download music and get directions, all via voice commands.
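The command-and-control flow described above ("Go to...", "Send text to...", then dictate and send) can be illustrated with a minimal Python sketch. It assumes the speech recogniser has already turned the user's speech into a text string; the phrase patterns, the `Command` type and the action names (`open_app`, `compose_text`, `dial_contact`, `send_message`) are hypothetical and are not taken from any Nuance or Samsung product. Anything that does not match a command is treated as dictated message text.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class Command:
    action: str                   # hypothetical action name, e.g. "open_app"
    target: Optional[str] = None  # app or contact named in the utterance, if any


# Hypothetical grammar: each regex maps a recognised phrase to an action.
# Longer phrases ("send text to ...") are listed before the bare "send".
GRAMMAR = [
    (re.compile(r"^go to (?P<target>.+)$", re.IGNORECASE), "open_app"),
    (re.compile(r"^send text to (?P<target>.+)$", re.IGNORECASE), "compose_text"),
    (re.compile(r"^call (?P<target>.+)$", re.IGNORECASE), "dial_contact"),
    (re.compile(r"^send( message)?$", re.IGNORECASE), "send_message"),
]


def parse_command(utterance: str) -> Optional[Command]:
    """Map a transcribed utterance to a Command, or None if it is not a command."""
    # Drop surrounding whitespace and a trailing full stop from the transcript.
    text = utterance.strip().rstrip(".")
    for pattern, action in GRAMMAR:
        match = pattern.match(text)
        if match:
            # groupdict() is empty for patterns without a named "target" group.
            target = match.groupdict().get("target")
            return Command(action, target.strip() if target else None)
    return None  # not a command: treat the utterance as dictated text


if __name__ == "__main__":
    for spoken in ["Go to BubbleSmile", "Send text to John Smith",
                   "Running ten minutes late", "Send"]:
        print(spoken, "->", parse_command(spoken))
```

Checking the longer "send text to" pattern before the bare "send" keeps the small grammar unambiguous; a production grammar would likely also weigh the recogniser's confidence scores, which this sketch ignores.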

Complementary solution
So why the need for a complementary input solution when voice-driven access is the fastest and easiest? Simply put, there are many situations where speech input isn't appropriate. If you're in a meeting and need to tell your boss that you're about to close a major deal, traditional texting is clearly more appropriate. Or if you're at a concert and want to download the new song your favourite band just played, you'll likely have a hard time with a voice query in such a noisy environment.
Similarly, there are many situations where text or manual input doesn't work either, especially while driving. Many governments across the globe are passing legislation that mandates hands-free control of phones to stop users texting while driving. And for good reason: a 2008 study conducted by the Technical University of Braunschweig in Germany found that using speech even just to dial a number improved drivers' ability to maintain the ideal car position by 19% compared with manual dialling. Better still, speech input was also approximately 40% faster in making a call, reducing the distraction period by the same amount. With a fully speech-enabled handset, drivers can interact with their device as needed.

Multimodal output
Today's mobile consumers are quickly realizing that multimodality is more than just input: it's also about output, and the ability to manage services. Voicemail-to-text services are perhaps one of the best examples of this. Voice messages are intercepted by an automated service or a team of transcriptionists, and the transcribed message is then delivered straight to the user's inbox or sent as a text message. Gone are the days of having to listen to a voicemail and then write down a number or message or, worse, missing a voicemail because you can't access your phone. It's all completed for you in a matter of minutes and dropped into your inbox.
With regard to output, text-to-speech technology is increasingly being integrated into today's devices to confirm whether an interaction with the phone succeeded or failed, read back text messages, confirm phone numbers and contact information, read directions aloud, and more. Text-to-speech integrated with speech input is the key to the truly hands-free device, where consumers can have a conversation with their phone to do virtually anything, such as download a song or get directions. This is one of the most exciting advancements in multimodality, and one we're going to see more of in the coming months.
It's also worth noting that similar output exists for the hands-on experience, with hard and touchscreen keyboards providing haptic or audible feedback, such as a vibration or a sound, to confirm that the interaction has taken place.
Speech and text are increasingly intertwined and integrated on the phone as part of the best possible user experience, one that gives consumers the ability to access what they need when they need it, which in turn drives increased brand loyalty and revenue opportunities for OEMs and operators.
A key aspect of multimodality that cannot be lost, however, is keeping the power to choose the input method in the user's hands. Each user's experience is their own, so it's important not to dictate entirely how they interact with the device or which input method they use. Some will likely rely heavily on speech as more phones come to market with advanced capabilities, while others often find themselves in situations where text input is a must-have. It's all about what the user is trying to achieve, and ensuring they have the most innovative input technology to get there.

Future-forward multimodality
Multimodal input and output is an exciting area to watch as additional areas of speech technology become closely integrated with mobile applications and services. For instance, imagine combining interactive voice response (IVR) technology with enhanced output on the device. One example is calling 1-800-FLOWERS, where the IVR options are enhanced on the device's screen with pictures of the flower options and pricing information, or with the ability to design your own flower arrangement right from your phone. To help ensure consumers keep using the service, 1-800-FLOWERS would remember what you ordered in the past, your preferences and so on, enabling you to place similar orders more quickly and easily, using voice or text input, right from the phone.
This scenario is a culmination of on- and off-device applications, intertwined with a variety of input and output options that drive value not only for OEMs and operators, but also for the companies that sign on for them.
There's no doubt that multimodality will continue to evolve, as it provides a richer user experience, enabling consumers to use the input and output methods best suited to them from moment to moment.