TMA Associates

From Speech Strategy News, May 2009


Tellme offers mobile and hosted services for consumers and enterprises
Microsoft subsidiary improves interactive speech solutions


On April 29, Tellme, a Microsoft subsidiary, made two announcements: one supporting speech recognition on mobile phones and the other describing advances in Tellme's speech-automated hosted customer service ("cloud-based voice services"). The customer service announcement included an option for VoIP-based telephone service through a partnership with Global Crossing, in effect putting Microsoft in the telephone service business (see Editor's Notes, p. 5). Tellme has pre-negotiated discounted rates with Global Crossing for any Tellme customers who choose to set up inbound service to Tellme with Global Crossing; organizations can transfer existing toll-free numbers or local numbers to Global Crossing.


The intent of the service for mobile phones is to "allow people to press one button, say what they want and get it," with voice search, dictation of text messages, and voice dialing by name. The announcement is for a version for Windows Mobile 6.5, although Tellme is currently offering a beta version for RIM's BlackBerry on the Tellme web site. Tellme will be available for free on Windows Mobile 6.5 phones in September 2009 when the phones hit store shelves. The application will initially be available in the Windows Marketplace for Mobile store. Tellme will also provide the service at no cost to mobile operators and carriers to embed on-device so a user can use the application without a download.


Perhaps an underlying message of the two Tellme announcements is that Tellme and Microsoft, which have been relatively quiet about their speech initiatives, are moving toward more active promotion of their capabilities in this area. In their enterprise hosting business, the company cites being able to tune the speech technology using more than two billion calls per year from more than 40 million callers per month (experience which can translate to more accuracy on consumer applications as well), yet most Microsoft analysts report as if the Tellme subsidiary doesn't exist. Expect Tellme to talk more about the advantages of their "pay-as-you-go" model in enterprise applications—no capital expenditures or software maintenance costs, no periodic upgrade fees (but seamless custom upgrades to improve completion rates), no waste on overestimated capacity needs or scaling for peak loads, and freeing up support staff for other projects.


Background
Microsoft acquired Tellme in March 2007 for an estimated price of $800 million (SSN, April 2007, p. 1). Tellme built its business on using speech technology to provide services in two areas:
(1)    Consumer-focused applications, with a long-standing free service available from any telephone, 1-800-555-TELL, as well as hosting the Microsoft free directory assistance phone number, 1-800-CALL-411. 555-TELL is not well known; callers can search for a business, get stock quotes, sports scores, weather, driving directions, movies, horoscopes, soap opera digests, taxi services, airlines, hotels, rental cars, lottery results, the time, and news, and even play vocal blackjack. Marcello Typrin, director of product management and planning, said that Tellme hasn't broadly advertised the service because they found that consumers didn't want to dial a 10-digit number, but wanted a one-button solution, such as the new mobile service Tellme/Microsoft is introducing.


(2)    Hosted customer service applications, on-demand Interactive Voice Response automating customer calls. Companies that use Tellme to host some or all of their automated customer service applications include American Airlines (SSN, October 2008, p. 7), Domino's Pizza (SSN, March 2008, p. 12), Orbitz (p. 35), E*TRADE, Merrill Lynch, and UPS. Some independent companies that develop voice-enabled customer service applications [e.g., Gold Systems (SSN, August 2008, p. 12), SpeechCycle (SSN, February 2009, p. 1), and TuVox (SSN, April 2009, p. 12)] use Tellme as a hosting option for their customers as part of Tellme's partner program.
Tellme originally used Nuance speech recognition (from the original Nuance Communications before ScanSoft acquired the company and took its name). Tellme's deal with the original Nuance apparently gave it volume pricing and some ability to tune the speech recognition; the current Nuance Communications filed a patent infringement suit against Tellme in February 2006, which, to this newsletter's knowledge, hasn't been settled. (If you are interested in the details, see the note at the end of this article.)


Tellme has added an option of using IBM speech recognition for its customers, and is converting to Microsoft's speech recognition and text-to-speech technology as a long-term policy. Typrin indicated that there are a number of motivations for the shift, including the ability to work with Microsoft Research directly on improvements, but that the decision was an easy one given that an internal test showed Microsoft speech recognition technology performing better than other technologies. (The mobile speech technology forming the other half of the announcement is completely Microsoft's, Typrin said.)


Improvements in the customer service platform
Tellme announced three speech technology and network innovations: (1) the roll-out of a VoIP carrier service that reduces customer transport costs; (2) improvements in core speech technology and development tools that improve automation of customer service calls; and (3) a new text-to-speech option for speaking specific forms of information with increased intelligibility and naturalness. The new speech services are a result of collaboration between Tellme and the Microsoft Speech Components Group. Murtaza Amiji, a product manager at Tellme, said that the company will continue to offer Nuance and IBM speech technology as an option, but that the new features are specific to Microsoft speech technology. The Microsoft speech technology will typically be priced on a per-minute basis, but can also be priced on a performance basis.
The technology improvements include:
•    Tuning of acoustic models, grammars, and phonetic dictionaries: Tellme and Microsoft's speech team used tuning data from Tellme's "billions and billions" of calls, as well as the company's design expertise, to develop new acoustic models, phonetic dictionaries, and grammar products that increase the accuracy of the speech recognition.
•    Real-time adaptation: An "online adaptation" capability enables the system to adapt to a caller's acoustic patterns within the first three seconds of speaking. The system also adapts specific caller speech patterns to the speaker-independent acoustic models.
•    Expanded multi-slot recognition: Multi-slot technology makes it possible for callers to ask for information in a full sentence or phrase, such as "I wanna buy five thousand shares of Coca-Cola," and the system listens for the relevant words, in this case "buy," "five thousand shares," and "Coca-Cola." Confidence scores are reported for the particular slots, such as the stock name. Then, if any information is missing or scores poorly, the system can ask for clarification of just that item without re-prompting for the entire answer, and thereby avoid frustrating application designs that over-structure requests for information (with annoying confirmations for each step).
•    Specialized text-to-speech (TTS) voice: Tellme uses the general Microsoft TTS engine, but has developed a new custom TTS engine and "voice font" called Zira. The recorded voice database used to create the concatenative TTS speech specifically emphasized popular phrases and words used in customer service requests (such as street names, cities/states, business listings, and proper names), and the TTS voice is particularly natural and intelligible when speaking those items. Zira is a female, North American English voice. It is designed to be neutral, so that it can blend with other TTS engine voices and recorded prompts to speak the particular items it does well. The Zira technology benchmarks are close to actual human pronunciations, according to Microsoft. Zira runs on the Microsoft TTS engine that supports additional voices: male North American English, female French Canadian, and female Latin American Spanish.
•    Late binding of modular grammars: Amiji said that grammars such as music lists or name lists that are updated often can be updated independently of the full recognition grammar, with the recognition engine using them in a modular fashion at runtime. Late binding also makes it easier to personalize applications, since a caller-specific grammar module can be used; for example, if a caller wants to sell a stock, the grammar can be restricted to stocks in their portfolio. Late binding also enables SRGS grammars and SLMs to coexist in a single application.
•    Improved natural language support: Amiji said that late binding can also be used in the Statistical Language Models (SLMs) supporting "natural-language" unstructured queries, such as in call steering applications. For example, a caller might be prompted, "Welcome to the travel reservation system. How may I help you?" and respond, "I'd like to fly from S F O to Boston departing September 27 and returning October 5, please." The software will then extract the pertinent parameters (that it is an airline reservation rather than a train reservation, that the departure city is San Francisco, that the arrival city is Boston, as well as the departure and return dates, with a separate confidence score for each piece of information). In addition to extracting multiple pieces of information from a single utterance, the SLM technology can be used to determine the purpose of a call, e.g., in response to a prompt such as, "Welcome to Acme insurance, how can we help you?" Responses can be as varied as, "I need to change my home address," "I wanna open a savings account," "I'd like to inquire about your auto insurance," or "I'd like to cancel my home insurance policy."
•    Integrated telephone service if desired with Global Crossing: For both toll-free and local services, the Tellme Platform supports blind and bridged transfers to destinations both on and off the Global Crossing network. To initiate a transfer, voice applications use the VoiceXML <transfer> tag and supply the required information. For both toll-free and local services, the Tellme Platform supplies network information from Global Crossing and/or the PSTN to running VoiceXML applications. This data is made available to VoiceXML applications through VoiceXML session variables, which include Called Party Number (also called "DNIS") and Calling Party Number Caller ID (also called "CPN" or "ANI").
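As a rough illustration of the multi-slot behavior described in the list above, the following Python sketch picks out only the slots that need a follow-up question. It is not Tellme's or Microsoft's implementation: the `Slot` class, the `slots_to_reprompt` function, the confidence values, and the 0.6 threshold are all assumptions made for this example.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical cutoff; the article does not state what score counts as "poor".
CONFIDENCE_THRESHOLD = 0.6


@dataclass
class Slot:
    name: str
    value: Optional[str]  # None when the recognizer could not fill the slot
    confidence: float     # per-slot confidence score, 0.0 to 1.0


def slots_to_reprompt(slots: List[Slot],
                      threshold: float = CONFIDENCE_THRESHOLD) -> List[str]:
    """Return the names of slots that are missing or scored below threshold."""
    return [s.name for s in slots
            if s.value is None or s.confidence < threshold]


# Illustrative (made-up) result for
# "I wanna buy five thousand shares of Coca-Cola"
result = [
    Slot("action", "buy", 0.93),
    Slot("quantity", "5000", 0.88),
    Slot("stock", "Coca-Cola", 0.41),
]

# Only the poorly scored slot is asked about again.
for name in slots_to_reprompt(result):
    print(f"Re-prompt caller for: {name}")
```

With these made-up scores, only the stock name falls below the threshold, so a dialog built this way would ask again for the stock and keep the action and quantity it already has.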
Tellme continues support for open standards such as SRGS, SISR, PLS, and EMMA, to enable greater grammar flexibility and portability. The company's Tellme Studio development environment includes VoiceXML 2.1.
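The late-binding idea in the list above (a frequently changing or caller-specific list plugged into a fixed grammar at call time) can be pictured with a toy Python sketch. This is only a conceptual model, not SRGS and not the Tellme platform's API; `BASE_RULE`, the `$STOCK` reference syntax, and the `bind` function are hypothetical names invented for this example.

```python
from typing import Dict, List, Set, Tuple

# Static part of the grammar. "$STOCK" is a placeholder that is resolved
# per call (hypothetical syntax, for illustration only).
BASE_RULE: Tuple[str, ...] = ("sell", "$STOCK")


def bind(rule: Tuple[str, ...], modules: Dict[str, List[str]]) -> Set[str]:
    """Expand a rule into the phrases it accepts, resolving each
    $REFERENCE from the modules supplied at call time."""
    phrases = [""]
    for token in rule:
        options = modules[token[1:]] if token.startswith("$") else [token]
        phrases = [f"{p} {o}".strip() for p in phrases for o in options]
    return {p.lower() for p in phrases}


# Caller-specific module: only the stocks in this caller's portfolio.
caller_portfolio = ["Coca-Cola", "Microsoft"]
grammar = bind(BASE_RULE, {"STOCK": caller_portfolio})

print(sorted(grammar))
```

Because the stock list is supplied only when `bind` is called, the fixed rule never has to be rebuilt when a portfolio changes, which is the property the article attributes to late binding.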
Results in early deployments of the improved recognition technology indicate an improvement of up to 2% in automated completion rates. With an average cost of $5 per customer-service call when handled by a live agent, a phone service handling 500,000 calls per day would save nearly $10 million per year with as little as 1% improvement in call automation.
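The savings estimate can be checked with a few lines of Python. The call volume, cost per call, and 1% improvement come from the paragraph above; the 365-day year and the assumption that each newly automated call saves the full $5 agent cost are simplifications added here.

```python
def annual_savings(calls_per_day: int,
                   automation_gain_percent: float,
                   agent_cost_per_call: float,
                   days_per_year: int = 365) -> float:
    """Yearly savings from automating an extra share of calls.

    Assumes every additional automated call avoids the full agent cost
    and that call volume is the same every day of the year.
    """
    extra_automated_per_day = calls_per_day * automation_gain_percent / 100
    return extra_automated_per_day * agent_cost_per_call * days_per_year


savings = annual_savings(calls_per_day=500_000,
                         automation_gain_percent=1,
                         agent_cost_per_call=5)
print(f"${savings:,.0f} per year")
```

Under these assumptions, a 1% gain means 5,000 more automated calls per day, or $25,000 per day and about $9.1 million per year, in line with the "nearly $10 million" figure.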
The hosted solution provides a pay-for-usage model for enterprises, attractive in today's economy and also attractive to organizations with significant peaks in demand, e.g., retailers during a holiday season. The Global Crossing VoIP service can reduce telephone costs on customer service calls, lowering the average per-minute cost of a call by 60% and eliminating transfer fees, according to Tellme. Additionally, Tellme enterprise customers can replace expensive toll-free numbers with local number service; nationwide calling plans on mobile phones lessen the need for consumers to use toll-free numbers. Having an alternative local number can save costs without affecting the consumer experience. Global Crossing and Tellme can provide less expensive local numbers for customer service.
Jamie Bertasi, senior director of Business Solutions at Tellme, said, "Our goal is to give enterprises technology that improves the customer experience but also affords them the ultimate financial flexibility when deploying voice services. From initial deployments we're seeing impressive cost savings and results that we're sure our customers will be excited about."
The mobile phone application


3 of every 4 search queries are being initiated by voice on the Sprint Instinct
Due out this fall but available now for integration and testing, Tellme's new service for mobile phones uses a small application on the phone and the data channel to engage network-based applications and Microsoft speech technology. The new service allows many of the most popular phone functions to be activated with a single button press. While a beta version for BlackBerry phones is available on the Tellme web site, Marcello Typrin, Tellme director of product management and planning, said that the version for the upcoming Windows Mobile release has the advantage that they have been able to work with the Windows Mobile group to make the combination as efficient and effective as possible.
The basic functions activated by a single button press are:
•    Send a text message to anyone in the contact list by saying "text" to open a text box, then speaking the message and saying "send."
•    Initiate a call by saying "call" and then the name of anyone in their contact list.
•    Conduct a search with Microsoft Live Search by speaking a category, such as "weather," "movies," or "pizza," or say "web search" for broader areas of interest. Web search inquiries that Tellme said should work include: "weather in San Francisco, California", "Pizza in Kansas City," and "Mother's Day gift ideas." The search function can be a source of revenue for the Microsoft service when it enables ads.
While the initial interface will be a one-step command that drops one into the mobile web application, Typrin indicated that the system can support a dialog for disambiguation of a request. He said that Microsoft wanted to be cautious about asking too many questions and potentially frustrating the user. In general, Microsoft is likely to add features over time, possibly including social networking and music search.
Windows Mobile 6.5 will also have the ability to support embedded speech recognition, Typrin said. Speech recognition on the device could be used to find device functions or initiate a call even when a data network connection was not available.
In a comparison test, Tellme found that it requires four touches and more than 20 keystrokes to find a business with the Apple iPhone, while it takes only a button push and one verbal command to find the same business with Tellme. Tellme's research shows similar results for other tasks, such as making calls, sending text messages, and conducting searches for content like traffic, movies, news and sports.
In a recent study conducted by Sanderson Studios, more than 70% of respondents said that voice is superior to keypad or touch-based methods to perform some of the most popular mobile tasks. This includes looking up a business listing or location (78%), sending a text message (72%), placing a call (79%), getting information such as movies, weather, traffic or sports (77%), and getting directions (81%).
Dariusz Paczuski, senior director of consumer services at Tellme, said,
"Because it's so intuitive, we believe there is a real opportunity for voice to materialize as the leading user interface for the phone. By bringing voice access to calling, texting and searching together we reduce 'menu surfing' on phones and make the convenience of voice more tangible for everyday needs…For example, Sprint has integrated our voice access to the Live Search application on Sprint Instinct phones and subscribers love it. In fact, we've seen impressive adoption of voice with 3 of every 4 search queries being initiated by voice."
Note: Patent infringement suit
In the 2006 suit, Nuance asserted that Tellme infringes their patents covering features of directory assistance and call center applications, including "whisper" technologies (in which the inquiry is recorded and, if it cannot be processed by a speech recognition system, passed on to an agent without requiring the caller to repeat the information) and advanced database query techniques (relevant in part to directory assistance applications). In the patent infringement suit, Nuance seeks monetary damages for Tellme's infringement as well as injunctive relief to prevent Tellme from continuing to infringe U.S. Patent No. 5,033,088 (granted originally to Voice Processing Corporation in 1991), entitled "Method and Apparatus for Effectively Receiving Voice Input to a Voice Recognition System," and U.S. Patent No. 6,256,630 (originally granted to Phonetic Systems in 2001), entitled "Word-Containing Database Accessing System for Responding to Ambiguous Queries, Including a Dictionary of Database Words, a Dictionary Searcher and a Database Searcher."


Copyright 2009 Speech Strategy News