TMA Associates

Speech Technology News and Analysis

 


From Telephone Strategy News - April 2004

IBM announces new software, tools and technology for “conversational access”

Includes call flow builder and additional support for outside speech technologies

IBM has offered its telephone speech products with a model similar to that of the Microsoft Speech Server, pricing its speech recognition per server and offering discounts based on the total amount of IBM software purchased. The Pervasive Computing group, which until recently was a separate business unit, has now moved into the mainstream software group at IBM and is making more progress in educating the large IBM sales force about its speech solutions. IBM has about 12,000 sales representatives and about 3,000 people in its consulting group supporting Customer Relationship Management, according to Gene Cox, director of product and solutions management, IBM Pervasive Computing.

The company is also positioned similarly to Microsoft in that it is motivated to integrate its solutions into its overall WebSphere architecture and sell an integrated solution for the Web and telephony. Gary Cohen, general manager, IBM Pervasive Computing, gave a keynote speech at AVIOS~SpeechTEK. He noted, “Companies are now demanding that their contact centers be integrated with their current IT infrastructure. At the same time, developers want speech middleware based on open standards so they can use common application development tools to integrate speech into their existing business processes.” IBM’s middleware supports VoiceXML for voice-only applications, and the company has proposed a multimodal specification, X+V, that it considers an extension of VoiceXML.

On March 23 at AVIOS~SpeechTEK, IBM announced new software, tools, and technologies aimed at “conversational access.” Built on open standards and based on Java, the product offerings are aimed at allowing businesses and developers to integrate speech into existing infrastructures more efficiently, rather than treating speech or multimodal applications as a separate silo. In a related announcement, Opera Software announced a new multimodal browser for PCs incorporating IBM’s embedded speech (p. 15).

Cox said that “conversational access” meant more than just a cost reduction tool in call centers. He noted that serving customers is a cross-channel horizontal process that uses multiple means to contact customers. IBM’s objective is to use open standards and middleware to “allow call center systems out of their ‘silo.’” From IBM’s strategic view, telephone access is just part of the “on-demand” environment espoused by the company as an integrating vision: “An on-demand business is an enterprise whose business processes—integrated end-to-end across the company and with key partners, suppliers, and customers—can respond with speed to any customer demand, market opportunity, or external threat.”

Product enhancements

IBM announced its plans for future versions of WebSphere Voice Server to support Linux as well as MRCP (Media Resource Control Protocol), a proposed standard aimed at easing the integration of speech recognition and text-to-speech engines. MRCP is designed to enable additional vendor platforms to support WebSphere Voice Server. IBM already supports the VoiceXML Gateway from VoiceGenie and the Nuance Voice Platform with its WebSphere Voice Application Access software for call centers. Steve Cawn, call center sales team leader, Americas, said, “If a customer chooses IBM’s speech technology, we can integrate it tightly with our tools and provide a little extra. But we’re a very customer-driven organization. If a customer wants a speech technology other than IBM’s, we try to support the customer.”

IBM announced other upgrades to its speech portfolio, with tools built on Eclipse-based WebSphere infrastructure software. These include new versions of Voice Toolkit for WebSphere Studio, WebSphere Voice Application Access, and WebSphere Everyplace Multimodal Environment for Embedix (for mobile Linux environments) and the Pocket PC.

Voice Toolkit for WebSphere Studio

The toolkit, a plug-in for WebSphere Studio Application Developer, is an integrated write, run, and deploy environment for building speech applications. The enhanced tools now include a call flow builder to streamline design. The call flow builder aims to help non-IT staff, such as business analysts, describe how a speech application should interact with the caller before a developer becomes involved.

Cawn indicated that the tool now makes it easier to implement dialogs in which the caller can supply multiple pieces of information and can do so in a flexible manner. The tools will also make it easier to test the end result by simulating a more realistic user experience. The toolkit also includes support for CCXML (Call Control eXtensible Markup Language), a proposed standard that allows VoiceXML applications to manage telephony infrastructure.
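As a rough illustration of the kind of flexible dialog Cawn describes, the following Python sketch fills several form fields from a single caller utterance, in whatever order they appear, and then prompts only for what is still missing. It is not IBM's toolkit or VoiceXML; the slot names, vocabularies, prompts, and function names are all hypothetical.

```python
# Hypothetical sketch of a flexible ("mixed-initiative") dialog: the caller
# may give any subset of the fields, in any order, in one utterance.
# Slot names and vocabularies are invented for illustration.

SLOT_VOCAB = {
    "city": {"boston", "chicago", "denver"},
    "day": {"monday", "tuesday", "friday"},
    "time": {"morning", "afternoon", "evening"},
}

PROMPTS = {
    "city": "Which city?",
    "day": "Which day?",
    "time": "What time of day?",
}


def fill_slots(utterance, slots=None):
    """Return a copy of slots updated with any values found in utterance."""
    filled = dict(slots or {})
    for word in utterance.lower().replace(",", " ").split():
        for slot, vocab in SLOT_VOCAB.items():
            if word in vocab and slot not in filled:
                filled[slot] = word
    return filled


def next_prompt(slots):
    """Return the prompt for the first unfilled slot, or None when complete."""
    for slot in SLOT_VOCAB:
        if slot not in slots:
            return PROMPTS[slot]
    return None


# A caller who volunteers everything at once needs no follow-up prompt.
complete = fill_slots("Friday morning to Denver")
# A caller who gives only the city is asked for the next missing field.
partial = fill_slots("I want to go to Boston")
```

Here `next_prompt(complete)` returns `None`, while `next_prompt(partial)` returns the prompt for the day. A real recognizer would use grammars and confidence scores instead of keyword matching.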

WebSphere Voice Application Access

Voice portals give end users a single point of personalized, speech-driven interaction with content and transactions. WebSphere Voice Application Access, IBM’s core call-center application, will now run on WebSphere Portal 5.0.2, with additional support for AIX 5.2. This upgrade will feature improved customization options for portal administrators, easing day-to-day support and manageability. Based on IBM’s WebSphere portal framework, WebSphere Voice Application Access allows developers to use a consistent, standards-based architecture, which in turn is designed to help companies reuse existing security protocols, back-end interfaces, and business logic.

WebSphere Everyplace Multimodal Environment

Multimodal, X+V-based versions of handheld browsers from ACCESS and Opera now support IBM’s Embedded ViaVoice speech recognition. This can enable developers to extend enterprise applications to devices combining speech with graphics and other forms of input and output in the same interaction. As noted, Opera Software also recently announced a similar browser for Windows, enabling multimodal functions on the desktop. Igor Jablokov, lead, multimodal, IBM, said that promising areas for multimodal applications include customer-facing applications in the service provider, financial, and telematics sectors, and internal applications in the healthcare and industrial sectors.

Customers

Among the new customers using IBM’s speech middleware is Parcelforce, the parcel delivery arm of the UK’s Royal Mail. Parcelforce is using speech technology to allow users to get information on their packages and to schedule deliveries. The automation allows around-the-clock service and will save about $3.5 million per year on an investment of less than $2 million.
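Taken at face value, those figures imply a simple payback period of under seven months. A quick check in Python, treating the $2 million as an upper bound on the investment and ignoring running costs, which are not given here:

```python
annual_savings = 3.5e6   # "about $3.5 million per year"
max_investment = 2.0e6   # "less than $2 million", used as an upper bound


def payback_months(investment, savings_per_year):
    """Months of savings needed to recover the investment (simple payback)."""
    return 12 * investment / savings_per_year


months = payback_months(max_investment, annual_savings)  # roughly 6.9 months
```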

Dennis Marine, vice president for information services, Prudential Financial, another of IBM’s recent customers, spoke as part of the IBM keynote at AVIOS~SpeechTEK. Prudential’s speech system, deployed with Viecore as system integrator, handles calls requesting information on 401(k) retirement plans. Cohen of IBM indicated that Prudential experienced a 7% increase in automated capture rates while handling 3.3 million calls per year. Marine pointed out that the deployment was “not a technology, but a business application.”

Cohen mentioned UK customer Dial-a-Phone, which used speech to automate the difficult task of capturing UK postal codes, which are alphanumeric. He indicated that the application saved 30 seconds of agent time per call.
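Part of what makes postal codes hard to capture is that letters and digits are mixed and callers may pause anywhere. The Python sketch below shows one way recognized characters could be normalized and checked after recognition. The pattern is a simplified approximation of the UK postcode shape, not the official allocation rules, and it is not drawn from the deployment described above; the function name is hypothetical.

```python
import re

# Simplified approximation of the UK postcode shape: outward code
# (1-2 letters, a digit, optional letter or digit), then inward code
# (a digit and two letters). Real allocation rules are stricter.
POSTCODE_RE = re.compile(r"[A-Z]{1,2}[0-9][A-Z0-9]?[0-9][A-Z]{2}")


def normalize_postcode(recognized):
    """Turn recognizer output such as 'sw1a 1aa' into 'SW1A 1AA', or None."""
    # Drop spaces and punctuation, since callers may pause anywhere.
    compact = re.sub(r"[^A-Za-z0-9]", "", recognized).upper()
    if not POSTCODE_RE.fullmatch(compact):
        return None
    # The inward code is always the last three characters.
    return compact[:-3] + " " + compact[-3:]
```

For example, `normalize_postcode("s w 1 a 1 a a")` returns `"SW1A 1AA"`, while a purely numeric string such as `"12345"` returns `None` and could trigger a re-prompt.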

Research in speaker authentication

At AVIOS~SpeechTEK, David Nahamoo, department group manager, IBM, demonstrated a speaker authentication application now in research. The application integrated biometric voice characteristics with information the caller should know. Nahamoo said in a separate interview that the goal of the research was to reduce the false acceptance rate to near zero while maintaining an acceptable false rejection rate by incorporating knowledge-based questions. Other research efforts include adding expressiveness to text-to-speech (e.g., happy, authoritative) and speech recognition that is more accurate than human speech recognition in difficult environments.
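One way to see why adding knowledge-based questions can push false acceptance toward zero: if an impostor must both pass the voice match and answer the questions, the two error probabilities multiply, assuming the checks fail independently. The Python sketch below illustrates that reasoning only. How IBM's research system actually combines the two factors is not described here, and the example figures are invented.

```python
def combined_rates(voice_far, voice_frr, kb_far, kb_frr):
    """Error rates when a caller must pass BOTH the voice check and the
    knowledge check, assuming the two checks fail independently.

    far = probability an impostor is accepted
    frr = probability the genuine caller is rejected
    """
    far = voice_far * kb_far                   # impostor must fool both checks
    frr = 1 - (1 - voice_frr) * (1 - kb_frr)   # genuine caller must pass both
    return far, frr


# Invented example figures, not measurements from IBM's research.
far, frr = combined_rates(voice_far=0.02, voice_frr=0.03,
                          kb_far=0.05, kb_frr=0.01)
```

With these numbers, false acceptance falls from 2% for voice alone to 0.1%, while false rejection rises from 3% to roughly 4%. That is the trade-off the stated research goal describes: lower false acceptance at an acceptable cost in false rejection.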