Apple makes speech understanding a core feature of future iPhones
San Jose, CA, October 4, 2011:
Today Apple announced its Siri personal assistant as a key feature of the
latest iPhone, the 4S, describing it as the most revolutionary and important
of the features added to the new version of the operating system. Siri, which
is both software on the phone and a service in the cloud, uses speech
recognition and understanding tightly coupled with other phone services, such
as scheduling reminders.
Speech recognition services on mobile phones aren't new. Speech recognition
for tasks such as speaking search terms is part of Google's Android and
Microsoft's Windows Mobile phone operating systems, and is featured in
independent apps from companies such as Nuance and Vlingo. With mobile phones
doing more and more and the small
form factor making typing and navigation inconvenient, it has become clear that
speech recognition will be a key user interface option for mobile devices. These
services seem on the surface to have many of the voice-enabled features that
Apple highlighted, so is Apple just playing catch-up? What's new?
A lot, according to Bill
Meisel, editor of Speech Strategy News and co-organizer of the Mobile Voice
Conference in March (www.mobilevoiceconference.com). First, he notes, Apple is
featuring more than speech recognition (which converts speech to a text
representation)—it is highlighting speech understanding, knowing what to do
with the speech content. Apple has always attempted to deliver solutions that
don't require a user manual, and allowing a user to say what they want in a
natural way is part of that philosophy. Siri's focus as a company before the
Apple acquisition was on natural language understanding; the company's
credentials as a spinout of SRI International (guess where "Siri"
comes from) suggest solid core technology. The speech recognition used by Siri
was from Nuance Communications, and almost certainly still is. (Nuance is
widely rumored to have some licensing deal with Apple that apparently will not
be announced.)
Second, the speech
understanding is tightly coupled with the Apple OS and applications and
services on the phone and in the network. Because it provides both the phone
and the applications delivered with it, Apple has an advantage in making the
speech assistant capable of doing what the user asks, e.g., reminding the user
to do a specific task when they leave home in the morning by combining GPS and
a reminder program.
The speech understanding must find the most appropriate application or service to
respond to a user request, so it is integrated with Web-based services as well,
e.g., Wolfram Alpha.
Third, the app can be
cognizant of the user preferences and user-specific information such as contact
lists as a result of this tight integration. Presumably, Siri will use
information on what a user tends to request, and on the corrections users
make, to improve its performance. It thus becomes, over time, a true
"personal" assistant. Apple makes it difficult for apps from outside
developers to have
full access to some built-in apps and OS functions, making external apps less
able to have this tight integration.
Fourth, Siri uses "conversational"
speech. Many speech recognition applications today, e.g., voice search, allow
the user to say one phrase, and, after a slight delay, drop them into a
non-speech application, e.g., a list of web sites matching spoken search terms.
There are latency issues as the speech is transmitted over the network,
processed at servers, and the result returned. The examples given by Apple
suggest that it has reduced latency sufficiently to allow more
back-and-forth interaction, although there is a big difference between a demo
and in-the-field experience.
Fifth, although it wasn't specifically
announced, the iPhone may have an advantage over some phones in handling speech
input. Previous versions have included a chip from Audience that has advanced
noise-cancelling features; this capability, if present, could allow voice
interaction in environments where phones without noise-cancelling features
cannot support it.
Other companies offering
speech recognition solutions for mobile devices--e.g., Microsoft, Google,
Nuance, and Vlingo--have capabilities in natural language understanding, and
have featured some of these capabilities in their apps. Nuance's deal with
Apple, for example, while it will generate some licensing revenue, is probably
motivated in part by the prospect of getting other mobile phone manufacturers
to pre-load Nuance's Dragon Go! app to get some of the same speech
understanding features. Nuance also recently
launched a mobile developer's program that will let other app developers
incorporate Nuance's network-based speech recognition in their apps, a further
expansion of Nuance's business. And Microsoft and Google have already built
some speech recognition and understanding features into their operating
systems. Expect some comparisons of which are "smartest"!
One bottom-line takeaway,
Meisel noted, is that, given the market share of Apple's iPhone and the likely
competitive response from other vendors, speech understanding is now a key and
growing part of the user interface for mobile devices. This trend is the
motivation for the Mobile Voice Conference, the third year of which will take
place in San Francisco March 19-21, 2012.
About the Mobile Voice Conference
The Mobile Voice Conference is
organized by the Applied Voice Input Output Society and Bill Meisel. It provides
attendees with information to help them take advantage of the rapidly
developing opportunities created by the explosion of mobile phone use, and, in
particular, by the increasing role of voice interaction on mobile devices,
including its implications for app development, enterprise use, and customer
service. The preliminary program and sponsorship opportunities are available at
www.mobilevoiceconference.com.
The first day of the conference is the Vendor Showcase, which is included in
the full conference registration but free for those attending only that day.
Information on conference sponsorship
opportunities and participation in the Vendor Showcase is available at the
conference website.
About the Applied Voice Input Output Society
AVIOS is a non-profit organization that has promoted the speech technology
industry for over a quarter-century. For more info, see www.avios.org.
Contacts:
AVIOS: Peggie Johnson, 408-323-1783, Peggie@avios.org
TMA Associates: Bill Meisel, 818-708-0962, b.meisel@tmaa.com