On May 4, T-Mobile USA announced a new
addition to its T-Mobile myTouch 3G smartphones, the T-Mobile myTouch 3G Slide,
based on Android 2.1 software, and available in June. The new model combines a
slide-out QWERTY keyboard with a touch screen (pictured below). It also comes with a very complete voice
interface pre-installed (no downloads
necessary), developed by Nuance Communications. The feature is activated
by a “Genius button,” a physical button on the front of the phone. One pushes
the button briefly to activate the Genius screen (shown below), which
contains some of the keywords that can be used to activate specific options;
the system is listening for a command and does automatic end-of-speech
detection when the user stops speaking.
The extensive set of speech features can fairly
be called a Voice User Interface for the phone, and is a significant expansion
of the role of speech technology in a mobile phone. Despite some of the speech
recognition being performed in the network, there are no premium download fees
or recurring monthly charges. Michael Thompson, senior vice president and
General Manager, Nuance Mobile, said, “T-Mobile is truly taking advantage of
our connected and embedded natural language capabilities to bring an incredibly
unique Android device and one-of-a-kind consumer experience to market.”
Thompson noted that a number of aspects contribute to the effectiveness and
usability of the speech interface: the pre-load of the application; its
ubiquitous availability because of the Genius button (which allows it to be
used for dictation into any text field); its mix of embedded and network speech
without the user being aware of the difference; and its “optimization for the
specific device.” In addition, he credited T-Mobile’s naming of the Genius
button and its advertising of the feature as encouraging the use of speech.
Further, the speech recognition accuracy is enhanced by the two-microphone
noise-cancelling capability built into the phone.
The speech features
T-Mobile summarizes the features of the Genius
button as “You press one button, talk, and it delivers.” The physical button
is, of course, always available, so the Genius feature becomes, in effect, part
of the operating system, and the feature is available for all screens. Speech
can be used for making calls, sending emails and text messages (including
dictating those messages), searching the Web, finding businesses and
directions, and more. The voice commands are interpreted by their content,
rather than requiring the user to navigate to a specific application first. For example,
one can press the Genius button and say, “Send Text to John Doe. I’ve got
tickets to the Angels game tonight, want to join me?” The result is that the
SMS app is launched with the text of the message inserted. Other examples
include:
- “Call Alex Jones at Home” (activates dialing);
- “Search for Recipes for Chocolate Cake” (activates Google search);
- “Find Sushi Restaurants near me” (search again);
- “Find Directions to 1 Wayside Drive, Burlington, Massachusetts” (activates Google Maps);
- “Show me the Calendar.”
Other features can be
activated by intuitive commands that are exemplified in the material delivered
with the phone.
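To make the idea of content-based interpretation concrete, here is a minimal sketch of how a spoken command might be routed to an application once it has been transcribed. It is an illustration only, not Nuance's method: the intent names, the patterns, and the `route` function are hypothetical, and a production system would use statistical language understanding rather than hand-written regular expressions.

```python
import re

# Hypothetical patterns modeled on the example commands in the article.
# Order matters: the first pattern that matches wins, so the specific
# "find directions to" rule is listed before the generic "find" rule.
PATTERNS = [
    ("sms", re.compile(r"^send text to (?P<contact>[^.]+)\.\s*(?P<body>.*)$", re.I)),
    ("call", re.compile(r"^call (?P<contact>.+?)(?: at (?P<label>home|work|mobile))?$", re.I)),
    ("web_search", re.compile(r"^search for (?P<query>.+)$", re.I)),
    ("directions", re.compile(r"^find directions to (?P<destination>.+)$", re.I)),
    ("local_search", re.compile(r"^find (?P<query>.+)$", re.I)),
    ("open_app", re.compile(r"^show me the (?P<app>.+)$", re.I)),
]


def route(utterance):
    """Return (intent, slots) for a transcribed command, or ("unknown", {})."""
    text = utterance.strip()
    for intent, pattern in PATTERNS:
        match = pattern.match(text)
        if match:
            # Drop optional slots that were not spoken (e.g. no "at Home").
            slots = {k: v for k, v in match.groupdict().items() if v is not None}
            return intent, slots
    return "unknown", {}


if __name__ == "__main__":
    print(route("Call Alex Jones at Home"))
    print(route("Find Sushi Restaurants near me"))
```

The point of the sketch is that the user never selects an application; the wording of the command alone determines which one is launched and what data is passed to it.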
The dictation technology uses Nuance’s Dragon
Dictation to create text by voice for email, a text message, calendar, Twitter
application, or Web browser (for search). Dragon Dictation uses the core
speech technology found in the PC-based Dragon NaturallySpeaking dictation
software. The PC-based software has features such as an enrollment to make the
speech recognition tuned to the speaker. The mobile version does adapt to the
user over time (both language and acoustic models, Thompson said), but doesn’t require
(or allow) enrollment.
As with the PC version, one has to dictate
punctuation, if desired: “Hi comma Jim exclamation mark I’m looking forward to
our lunch meeting Thursday period Is Fred joining us question mark.” Thompson
said that users seem to want to have this control, rather than dealing with the
results of automated punctuation.
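The conversion of spoken punctuation into symbols can be sketched in a few lines. This is a simplified illustration, not the Dragon Dictation implementation: the `SPOKEN` table and the function name are assumptions, and the sketch cannot tell a dictated “period” from the ordinary word “period.”

```python
import re

# Hypothetical table of spoken punctuation names and their symbols.
SPOKEN = {
    "exclamation mark": "!",
    "question mark": "?",
    "comma": ",",
    "period": ".",
}

# Longest phrases first so multi-word names are tried before shorter ones.
# The leading \s* removes the space before the spoken name, so the symbol
# attaches directly to the preceding word.
_PATTERN = re.compile(
    r"\s*\b("
    + "|".join(re.escape(p) for p in sorted(SPOKEN, key=len, reverse=True))
    + r")\b",
    re.IGNORECASE,
)


def apply_dictated_punctuation(text):
    """Replace spoken punctuation names with the corresponding symbols."""
    return _PATTERN.sub(lambda m: SPOKEN[m.group(1).lower()], text)


if __name__ == "__main__":
    spoken = ("Hi comma Jim exclamation mark I'm looking forward to our "
              "lunch meeting Thursday period Is Fred joining us question mark")
    print(apply_dictated_punctuation(spoken))
```

Applied to the example above, the sketch yields “Hi, Jim! I'm looking forward to our lunch meeting Thursday. Is Fred joining us?”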
The application begins listening after a brief
press of the Genius button and continues to listen until there is a relatively
long pause. Nuance indicated that they employ relatively conservative auto end-pointing
so the listening window stays open when the user pauses for a short duration,
as one might do in the process of dictating a long email. When the end-pointing
kicks in, the application assumes the message is complete, and the text or
email client is opened automatically with both the To: and Body: fields
pre-populated with the recognition results. However, the user can then use
Dragon Dictation from within the text or email client to dictate more text into
the body of the message.
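The effect of a conservative end-pointing setting can be illustrated with a simple energy-based detector. Nuance did not describe its algorithm, so everything here is an assumption made for illustration: the per-frame energy values, the threshold, and the `min_silence_frames` parameter that controls how long a pause must last before the utterance is considered finished.

```python
def find_endpoint(frame_energies, silence_threshold=0.1, min_silence_frames=30):
    """Return the index of the first frame of the pause that ends the
    utterance, or None if no sufficiently long pause occurs.

    frame_energies: sequence of per-frame energy values (hypothetical units).
    silence_threshold: frames below this energy count as silence.
    min_silence_frames: pause length required to declare end of speech;
        a larger value is a more conservative setting.
    """
    heard_speech = False
    silent_run = 0
    for i, energy in enumerate(frame_energies):
        if energy >= silence_threshold:
            heard_speech = True
            silent_run = 0  # any speech resets the pause counter
        elif heard_speech:
            # Silence before the first speech frame is ignored.
            silent_run += 1
            if silent_run >= min_silence_frames:
                return i - silent_run + 1
    return None


if __name__ == "__main__":
    # Leading silence, speech, a short pause, more speech, a longer pause.
    frames = [0.0] * 5 + [0.8] * 10 + [0.0] * 3 + [0.7] * 10 + [0.0] * 8
    print(find_endpoint(frames, min_silence_frames=3))  # cuts at the short pause
    print(find_endpoint(frames, min_silence_frames=5))  # waits for the long pause
```

With the less conservative setting the detector stops at the short mid-sentence pause; with the more conservative one it keeps listening through that pause and ends only at the longer one, which is the behavior the article describes for dictating a long email.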
Nuance text-to-speech is also available to read
incoming messages. The phone can be set up to announce an incoming message,
which can be read in its entirety if so instructed by a voice command and
replied to by voice if desired.
The myTouch 3G Slide also includes MyAccount
3.0, a customer service application based on Nuance Mobile Care, which provides
users with the ability to easily access and manage their T-Mobile account
information directly from their phone using in part an application on the phone
itself. Users can open MyAccount directly from the Genius button by
saying “minutes” or “My Account.”
The business model
Nuance licenses embedded speech software for
many phones, but since the speech features in the myTouch 3G Slide use
network-based recognition as well, there is a continuing expense for Nuance
beyond the software on the phone. Thompson said that the financial arrangement
between Nuance and T-Mobile to support this expense can’t be revealed, but that
the deal does take the continuing expense into account. He also noted that, in
general, it is not Nuance’s practice to enter into a deal that isn’t intended
to be profitable.
Competing user interface concepts
The completeness of the voice interaction may
get lost, to some degree, on the myTouch 3G Slide, with its many features that
make typing easier (a slide-out keyboard as well as a touch-screen
keyboard with the Swype fast-typing application pre-loaded; see the figure at the
end of the article). Other features, as discussed below, make touch-screen navigation
easier. If the device had fewer features to overcome the limitations of the
small screen and featured a text box as the central navigation element, then an
emphasis on “say or type what you want” would allow the speech features to be
more central to the device. In a potentially confusing option, as with any
Android phone, one can put a One-Touch Google Search widget on the home
screen that supports voice or text search, with the voice search using Google
speech technology in the network.
The issue is illustrated by the T-Mobile press
release, which features the Genius button as the third of three features
highlighted, after the Faves Gallery and myModes. The Faves Gallery allows
communicating more easily with the people you communicate with most, with quick
access and automatic updates to alternative modes of communicating with those
people. The myModes feature allows one to have different highly customizable
home screens for different uses of the phone, most obviously, work and
personal. These screens can change modes based on pre-set times or locations
(or manually). This task-oriented approach (where several related programs can
be activated from one tile based, e.g., on a contact rather than a function)
differs from the iPhone (which basically has an application focus: one navigates to
a specific app before doing anything), and is more similar to the approach
Microsoft is using in its upcoming Windows Phone 7 mobile phone operating
system. The Swype fast-typing mode is
another touch-oriented feature competing with the voice interface. (This
application isn’t from Nuance, but they have a similar application that uses
their T9 predictive-text technology in part.)
In summary, most reviews will probably focus on
these Graphical User Interface variations, treating the voice features as a
secondary hands-free option. Nuance offers the voice interface as a general
offering called Nuance Voice Control
2.0 (which integrates embedded and network-based recognition as well as
text-to-speech for message reading), so we may see simpler phones adopt the
solution, where it will be more clearly dominant.
Nuance
Voice Control is a hosted solution that utilizes Nuance VSuite as an embedded
client solution. VSuite includes features such as voice dialing and is
available separately for all feature phone and smartphone platforms; it is
already utilized by most device manufacturers and many mobile operators.