TMA Associates

From Speech Strategy News, June 2010

 

Nuance speech recognition app pre-loaded on T-Mobile Android-based myTouch

Combination of embedded and network-based speech recognition

On May 4, T-Mobile USA announced a new addition to its T-Mobile myTouch 3G smartphones, the T-Mobile myTouch 3G Slide, based on Android 2.1 software, and available in June. The new model combines a slide-out QWERTY keyboard with a touch screen (pictured below). It also comes with a very complete voice interface pre-installed (no downloads necessary), developed by Nuance Communications. The feature is activated by a “Genius button,” a physical button on the front of the phone. One pushes the button briefly to activate the Genius screen (shown below), which contains some of the keywords that can be used to activate specific options; the system then listens for a command and performs automatic end-of-speech detection when the user stops speaking.

The extensive set of speech features can fairly be called a Voice User Interface for the phone, and is a significant expansion of the role of speech technology in a mobile phone. Despite some of the speech recognition being performed in the network, there are no premium download fees or recurring monthly charges. Michael Thompson, senior vice president and General Manager, Nuance Mobile, said, “T-Mobile is truly taking advantage of our connected and embedded natural language capabilities to bring an incredibly unique Android device and one-of-a-kind consumer experience to market.” Thompson noted that a number of aspects contribute to the effectiveness and usability of the speech interface: the pre-load of the application; its ubiquitous availability because of the Genius button (which allows it to be used for dictation into any text field); its mix of embedded and network speech without the user being aware of the difference; and its “optimization for the specific device.” In addition, he credited T-Mobile’s naming of the Genius button and its advertising of the feature as encouraging the use of speech. Further, the speech recognition accuracy is enhanced by the two-microphone noise-cancelling capability built into the phone.

The speech features

T-Mobile summarizes the features of the Genius button as “You press one button, talk, and it delivers.” The physical button is, of course, always available, so the Genius feature becomes, in effect, part of the operating system, and the feature is available for all screens. Speech can be used for making calls, sending emails and text messages (including dictating those messages), searching the Web, finding businesses and directions, and more. The voice commands are interpreted by their content, so the user does not have to navigate to a specific application first. For example, one can press the Genius button and say, “Send Text to John Doe. I’ve got tickets to the Angels game tonight, want to join me?” The result is that the SMS app is launched with the text of the message inserted. Other examples include:

- “Call Alex Jones at Home” (activates dialing);

- “Search for Recipes for Chocolate Cake” (activates Google search);

- “Find Sushi Restaurants near me” (search again);

- “Find Directions to 1 Wayside Drive, Burlington, Massachusetts” (activates Google Maps);

- “Show me the Calendar.”

Other features can be activated by intuitive commands that are exemplified in the material delivered with the phone.
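Interpreting a command by its content, rather than by the application that happens to be open, can be sketched as a simple dispatch on the leading keywords of the recognized text. The sketch below is only an illustration built from the example phrases above; the route table, the action names, and the function `route_command` are hypothetical, and nothing here describes how Nuance actually implements the feature.

```python
import re

# Hypothetical keyword-to-action table based on the example commands in the article.
# More specific patterns come first so "find directions to" wins over plain "find".
ROUTES = [
    (r"^send text to\b", "sms"),
    (r"^call\b", "dialer"),
    (r"^search for\b", "web_search"),
    (r"^find directions to\b", "maps"),
    (r"^find\b", "local_search"),
    (r"^show me the calendar\b", "calendar"),
]


def route_command(utterance):
    """Return (action, remainder) for a recognized utterance.

    The action is chosen from the leading keywords; the remainder is the
    rest of the utterance, which the target application would consume.
    """
    original = utterance.strip()
    lowered = original.lower()
    for pattern, action in ROUTES:
        match = re.match(pattern, lowered)
        if match:
            # Slice the original text so capitalization is preserved.
            remainder = original[match.end():].strip(" .")
            return action, remainder
    return "unknown", original


if __name__ == "__main__":
    print(route_command("Call Alex Jones at Home"))
    print(route_command("Find Sushi Restaurants near me"))
```

A production system would rely on a statistical natural language model rather than fixed prefixes, but the dispatch idea is the same: the utterance selects the application, not the other way around.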

The dictation technology uses Nuance’s Dragon Dictation to create text by voice for email, a text message, calendar, Twitter application, or Web browser (for search). Dragon Dictation uses the core speech technology found in the PC-based Dragon NaturallySpeaking dictation software. The PC-based software has features such as an enrollment step that tunes the speech recognition to the speaker. The mobile version does adapt to the user over time (both language and acoustic models, Thompson said), but doesn’t require (or allow) enrollment.

As with the PC version, the user has to dictate punctuation, if desired: “Hi comma Jim exclamation mark I’m looking forward to our lunch meeting Thursday period Is Fred joining us question mark.” Thompson said that users seem to want to have this control, rather than dealing with the results of automated punctuation.
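The effect of dictated punctuation can be illustrated with a small post-processing step that replaces the spoken punctuation words with symbols. This is a minimal sketch under stated assumptions: the word list `SPOKEN_PUNCTUATION` and the function `render_dictation` are hypothetical, and the approach is deliberately naive (it would also convert a literal use of the word “period”), so it should not be read as how Dragon Dictation works internally.

```python
import re

# Hypothetical mapping from spoken punctuation words to symbols.
SPOKEN_PUNCTUATION = {
    "exclamation mark": "!",
    "question mark": "?",
    "comma": ",",
    "period": ".",
}


def render_dictation(transcript):
    """Replace spoken punctuation words with symbols attached to the preceding word."""
    text = transcript
    # Handle the longest phrases first so multi-word names are matched as a unit.
    for phrase in sorted(SPOKEN_PUNCTUATION, key=len, reverse=True):
        symbol = SPOKEN_PUNCTUATION[phrase]
        # Also consume the whitespace before the phrase so the symbol
        # ends up directly after the previous word.
        pattern = r"\s*\b" + re.escape(phrase) + r"\b"
        text = re.sub(pattern, symbol, text, flags=re.IGNORECASE)
    return text.strip()


if __name__ == "__main__":
    spoken = ("Hi comma Jim exclamation mark I'm looking forward to our lunch "
              "meeting Thursday period Is Fred joining us question mark")
    print(render_dictation(spoken))
```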

The application is listening after a brief press of the Genius Button and continues to listen until there is a relatively long pause. Nuance indicated that they employ relatively conservative auto end-pointing so the listening window stays open when the user pauses for a short duration, as one might do in the process of dictating a long email. When the end-pointing kicks in, the application assumes the message is complete, and the text or email client is opened automatically with both the To: and Body: fields pre-populated with the recognition results. However, the user can then use Dragon Dictation from within the text or email client to dictate more text into the body of the message.
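The conservative end-pointing described above amounts to closing the listening window only after an uninterrupted stretch of silence longer than some threshold. The sketch below illustrates that idea on a sequence of per-frame speech/silence decisions; the function `find_endpoint`, the 20 ms frame length, and the 1500 ms pause threshold are all assumptions chosen for illustration, not values disclosed by Nuance.

```python
def find_endpoint(frames_is_speech, frame_ms=20, max_pause_ms=1500):
    """Return the index of the frame at which listening would stop.

    frames_is_speech is an iterable of booleans, one per audio frame,
    True where speech was detected. Returns None if the trailing silence
    never reaches max_pause_ms, meaning the listening window stays open.
    """
    silence_ms = 0
    heard_speech = False
    for index, is_speech in enumerate(frames_is_speech):
        if is_speech:
            heard_speech = True
            silence_ms = 0  # any speech resets the pause timer
        elif heard_speech:
            # Only count silence after the user has started talking.
            silence_ms += frame_ms
            if silence_ms >= max_pause_ms:
                return index
    return None


if __name__ == "__main__":
    # Speech, a short 500 ms pause, more speech, then a long silence.
    frames = [True] * 10 + [False] * 25 + [True] * 10 + [False] * 100
    print(find_endpoint(frames))                    # conservative threshold
    print(find_endpoint(frames, max_pause_ms=400))  # aggressive threshold
```

With the longer threshold the short mid-sentence pause is ignored and the whole message is captured; with the shorter one the utterance is cut off at the first pause, which is the behavior a conservative setting is meant to avoid during long dictation.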

Nuance text-to-speech is also available to read incoming messages. The phone can be set up to announce an incoming message, which can be read in its entirety if so instructed by a voice command and replied to by voice if desired.

The myTouch 3G Slide also includes MyAccount 3.0, a customer service application based on Nuance Mobile Care, which lets users access and manage their T-Mobile account information directly from their phone, in part through an application on the phone itself. Users can open MyAccount directly from the Genius button by saying “minutes” or “My Account.”

The business model

Nuance licenses embedded speech software for many phones, but since the speech features in the myTouch 3G Slide use network-based recognition as well, there is a continuing expense for Nuance beyond the software on the phone. Thompson said that the financial arrangement between Nuance and T-Mobile to support this expense can’t be revealed, but that the deal does take the continuing expense into account. He also noted that, in general, it is not Nuance’s practice to enter into a deal that isn’t intended to be profitable.

Competing user interface concepts

The completeness of the voice interaction may get lost, to some degree, on the myTouch 3G Slide, with its many features that make typing easier (a slide-out keyboard as well as a touch-screen keyboard with the Swype fast-typing application pre-loaded; see figure at the end of the article). Other features, as discussed below, make touch-screen navigation easier. If the device had fewer features to overcome the limitations of the small screen and featured a text box as the central navigation feature, then an emphasis on “say or type what you want” would allow the speech features to be more central to the device. In a potentially confusing option, as with any Android phone, one can put a One-Touch Google Search widget on the home screen that supports voice or text search, with the voice search using Google speech technology in the network.

The issue is illustrated by the T-Mobile press release, which features the Genius button as the third of three features highlighted, after the Faves Gallery and myModes. The Faves Gallery makes it easier to communicate with the people one contacts most, with quick access and automatic updates to alternative modes of communicating with those people. The myModes feature allows one to have different highly customizable home screens for different uses of the phone, most obviously, work and personal. These screens can change modes based on pre-set times or locations (or manually). This task-oriented approach (where several related programs can be activated from one tile based, e.g., on a contact rather than a function) differs from the iPhone (which basically has an application focus: one navigates to a specific app before doing anything), and is more similar to the approach Microsoft is using in its upcoming Windows Phone 7 mobile phone operating system. The Swype fast-typing mode is another touch-oriented feature competing with the voice interface. (This application isn’t from Nuance, but they have a similar application that uses their T9 predictive-text technology in part.)

In summary, most reviews will probably focus on these Graphical User Interface variations, treating the voice features as a secondary hands-free option. Nuance offers the voice interface as a general offering called Nuance Voice Control 2.0 (which integrates embedded and network-based recognition as well as text-to-speech for message reading), so we may see simpler phones adopt the solution, where it will be more clearly dominant.

Nuance Voice Control is a hosted solution that utilizes Nuance VSuite as an embedded client solution. VSuite includes features such as voice dialing and is available separately for all feature phone and smartphone platforms; it is already utilized by most device manufacturers and many mobile operators.