From Telephone Strategy News -
April 2004
IBM announces new software, tools and technology for “conversational access”
Includes call flow builder and additional support for outside speech technologies
IBM has offered its
telephone speech products under a business model similar to that of Microsoft
Speech Server, pricing its speech recognition per server and
offering discounts based on the total amount of IBM software purchased. The
Pervasive Computing group, which until recently was a separate business unit,
has now moved into the mainstream software group at IBM and is making more
progress in educating the large IBM sales force about its speech solutions. IBM
has about 12,000 sales representatives and about 3,000 people in its consulting
group supporting Customer Relationship Management, according to Gene Cox,
director of product and solutions management, IBM Pervasive Computing.
The company is also
positioned similarly to Microsoft in that it is motivated to integrate its
solutions into its overall WebSphere architecture and sell an integrated
solution for the Web and telephony. Gary Cohen, general manager, IBM Pervasive
Computing, gave a keynote speech at AVIOS/SpeechTEK. He noted, “Companies are
now demanding that their contact centers be integrated with their current IT
infrastructure. At the same time, developers want speech middleware based on
open standards so they can use common application development tools to integrate
speech into their existing business processes.” IBM’s middleware supports
VoiceXML for voice-only applications, and the company has proposed a multimodal
specification, X+V (XHTML + Voice), that it considers an extension of VoiceXML.
On March 23 at
AVIOS/SpeechTEK, IBM announced new software, tools, and technologies
aimed at “conversational access.” Built on open standards and based on Java, the
products are intended to let businesses and developers integrate
speech into existing infrastructure more efficiently, rather than treating
speech or multimodal applications as a separate silo. In a related announcement,
Opera Software unveiled a new multimodal browser for PCs incorporating
IBM’s embedded speech technology (p. 15).
Cox said that
“conversational access” means more than a cost-reduction tool for call
centers. He noted that serving customers is a horizontal, cross-channel process
that uses multiple means to contact customers. IBM’s objective is to use open
standards and middleware to “allow call center systems out of their ‘silo.’”
From IBM’s strategic view, telephone access is just part of the “on-demand”
environment espoused by the company as an integrating vision: “An on-demand
business is an enterprise whose business processes—integrated end-to-end across
the company and with key partners, suppliers, and customers—can respond with
speed to any customer demand, market opportunity, or external threat.”
Product enhancements
IBM announced its plans for
future versions of WebSphere Voice Server to support Linux as well as MRCP
(Media Resource Control Protocol), a proposed standard aimed at easing the
integration of speech recognition and text-to-speech engines with voice platforms.
MRCP support is intended to let voice platforms from additional vendors work with
WebSphere Voice Server. IBM
already supports the VoiceXML Gateway from VoiceGenie and the Nuance
Voice Platform with its WebSphere Voice Application Access software for call
centers. Steve Cawn, call center sales team leader, Americas, said, “If a
customer chooses IBM’s speech technology, we can integrate it tightly with our
tools and provide a little extra. But we’re a very customer-driven organization.
If a customer wants a speech technology other than IBM’s, we try to support the
customer.”
IBM announced other
upgrades to its speech portfolio, with tools built on Eclipse-based WebSphere
infrastructure software. These include new versions of Voice Toolkit for
WebSphere Studio, WebSphere Voice Application Access, and the WebSphere EveryPlace
Multimodal Environment for Embedix (for mobile Linux environments) and Pocket PC.
Voice Toolkit for WebSphere Studio
The toolkit, a plug-in for
WebSphere Studio Application Developer, is an integrated write, run, and deploy
environment for building speech applications. The enhanced tools now include a
call flow builder intended to streamline design by letting non-IT staff, such as
business analysts, describe how a speech application should interact with
callers before a developer becomes involved.
Cawn said the
tool now makes it easier to implement dialogs in which callers can supply
several pieces of information at once, in whatever order they choose. The tools
also make it easier to test the finished application by simulating a more
realistic caller experience. The toolkit also supports CCXML (Call Control
eXtensible Markup Language), a proposed W3C standard that adds telephony call
control, such as answering, transferring, and conferencing calls, to VoiceXML
applications.
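Dialogs in which a caller can give several answers in one utterance, in any order, are usually built as form filling: each recognized value fills its own slot, and the system prompts only for what is still missing. The sketch below illustrates that idea in Python. It is not IBM's toolkit or its API; the slot names, the prompts, and the assumption that a recognizer grammar has already turned the utterance into a dictionary of slot values are all hypothetical.

```python
from typing import Dict, Optional

# Hypothetical slots and prompts for a parcel-delivery dialog; illustrative only,
# not taken from IBM's Voice Toolkit.
PROMPTS: Dict[str, str] = {
    "tracking_number": "What is your tracking number?",
    "delivery_date": "What day would you like the delivery?",
    "time_window": "Morning or afternoon?",
}


def new_form() -> Dict[str, Optional[str]]:
    """Start with every slot empty, in prompt order."""
    return {name: None for name in PROMPTS}


def fill_slots(form: Dict[str, Optional[str]],
               recognized: Dict[str, str]) -> Dict[str, Optional[str]]:
    """Copy every recognized value into its empty slot, regardless of order."""
    updated = dict(form)
    for name, value in recognized.items():
        # Ignore values for slots this form does not define; keep values already filled.
        if name in updated and updated[name] is None:
            updated[name] = value
    return updated


def next_prompt(form: Dict[str, Optional[str]]) -> Optional[str]:
    """Return the prompt for the first unfilled slot, or None when the form is complete."""
    for name, value in form.items():
        if value is None:
            return PROMPTS[name]
    return None


form = new_form()
print(next_prompt(form))  # What is your tracking number?
# The caller answers out of order ("tomorrow morning, please"); we assume the
# recognizer returns two slot values from that single utterance.
form = fill_slots(form, {"delivery_date": "tomorrow", "time_window": "morning"})
print(next_prompt(form))  # What is your tracking number?
form = fill_slots(form, {"tracking_number": "AB123"})
print(next_prompt(form))  # None
```

In this sketch, one utterance filled two slots, so the dialog skipped the date and time prompts entirely. That turn saving is what makes flexible dialogs attractive, and it is also why they are harder to design and test by hand.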
WebSphere Voice Application Access
Voice portals give end
users a single point of personalized, speech-driven interaction with content and
transactions. WebSphere Voice Application Access, IBM’s core call-center
application, will now run on WebSphere Portal 5.0.2, with additional support for
AIX 5.2. This upgrade will feature improved customization options for portal
administrators, easing day-to-day support and manageability. Because it is based on IBM’s
WebSphere portal framework, WebSphere Voice Application Access lets developers
use a consistent, standards-based architecture, which in turn is designed to
help companies reuse existing security protocols, back-end interfaces, and
business logic.
WebSphere EveryPlace Multimodal Environment
Multimodal, X+V-based
versions of handheld browsers from ACCESS and Opera now support IBM’s
Embedded ViaVoice speech recognition. This lets developers extend
enterprise applications to devices that combine speech with graphics and other
forms of input and output in the same interaction. As noted, Opera Software also
recently announced a similar browser for Windows, bringing multimodal functions
to the desktop. Igor Jablokov, lead, multimodal, IBM, said that promising areas
for multimodal applications include customer-facing applications in the
service-provider, financial, and telematics sectors, and internal applications in
the healthcare and industrial sectors.
Customers
Among the new customers
using IBM’s speech middleware is Parcelforce, the parcel delivery arm of
the UK’s Royal Mail. Parcelforce is using speech technology to allow users to
get information on their packages and to schedule deliveries. The automation
provides around-the-clock service and is expected to save about $3.5 million per
year on an investment of less than $2 million.
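Those figures imply a short payback period. The following back-of-the-envelope sketch assumes simple payback, meaning investment divided by savings, and ignores operating costs and the time value of money. It uses the article's $2 million figure, which is an upper bound on the investment, so the result is an upper bound on the payback time. The function name is illustrative.

```python
def payback_months(investment: float, annual_savings: float) -> float:
    """Simple payback period in months: up-front investment / monthly savings."""
    if annual_savings <= 0:
        raise ValueError("annual_savings must be positive")
    return investment / (annual_savings / 12)


# Article figures: about $3.5 million saved per year on less than $2 million invested.
print(round(payback_months(2_000_000, 3_500_000), 1))  # 6.9
```

On these assumptions, the system would pay for itself in under seven months.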
Dennis Marine, vice
president for information services at Prudential Financial, another of
IBM’s recent customers, spoke as part of the IBM keynote at AVIOS/SpeechTEK. The
speech system, with Viecore as system integrator, handles calls
requesting information on 401(k) retirement plans. Cohen said that
Prudential saw a 7% increase in its automated capture rate while handling
3.3 million calls per year. Marine pointed out that the deployment was “not a
technology, but a business application.”
Cohen mentioned UK customer
Dial-a-Phone, which used speech recognition to automate the difficult task
of capturing UK postal codes, which are alphanumeric. He said that the
application saves 30 seconds of agent time per call.
Research in speaker authentication
At AVIOS/SpeechTEK, David
Nahamoo, department group manager, IBM, demonstrated a speaker authentication
application now in research. The application combined biometric voice
characteristics with information the caller should know. Nahamoo said in a
separate interview that the goal of the research is to reduce the false
acceptance rate to near zero, while maintaining an acceptable false rejection
rate, by adding knowledge-based questions. Other research efforts include
adding expressiveness (e.g., a happy or authoritative tone) to text-to-speech,
and speech recognition that is more accurate than human listeners in difficult
environments.
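The trade-off behind the speaker authentication research can be shown with a small sketch. The article does not describe IBM's actual method. This example assumes a simple AND rule, where the caller must both clear a voiceprint score threshold and answer a knowledge question correctly, and it computes false acceptance and false rejection rates over a handful of made-up trials. The `Trial` fields, the 0.6 threshold, and all scores are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Trial:
    voice_score: float  # hypothetical verifier score in [0, 1]; higher = closer to the enrolled voiceprint
    answered_ok: bool   # did the caller answer the knowledge-based question correctly?
    genuine: bool       # ground truth: is the caller really the claimed speaker?


def accept(trial: Trial, threshold: float, use_knowledge: bool) -> bool:
    """AND rule: the voice must match and, if enabled, the answer must be correct."""
    voice_ok = trial.voice_score >= threshold
    return voice_ok and (trial.answered_ok or not use_knowledge)


def error_rates(trials: List[Trial], threshold: float,
                use_knowledge: bool) -> Tuple[float, float]:
    """Return (false acceptance rate, false rejection rate) over labeled trials."""
    impostors = [t for t in trials if not t.genuine]
    genuines = [t for t in trials if t.genuine]
    far = sum(accept(t, threshold, use_knowledge) for t in impostors) / len(impostors)
    frr = sum(not accept(t, threshold, use_knowledge) for t in genuines) / len(genuines)
    return far, frr


# Made-up trials for illustration only; these are not IBM results.
TRIALS = [
    Trial(0.90, True, True),
    Trial(0.80, True, True),
    Trial(0.60, True, True),
    Trial(0.75, False, True),   # genuine caller who forgets the answer
    Trial(0.70, False, False),  # impostor with a similar-sounding voice
    Trial(0.65, True, False),   # impostor who also knows the answer
    Trial(0.30, False, False),
    Trial(0.55, False, False),
]

print(error_rates(TRIALS, 0.6, use_knowledge=False))  # (0.5, 0.0)
print(error_rates(TRIALS, 0.6, use_knowledge=True))   # (0.25, 0.25)
```

Adding the knowledge check halves false acceptance in this toy data. It can also reject a genuine caller who forgets the answer, which is why the research goal pairs near-zero false acceptance with an *acceptable* false rejection rate. Real systems may instead fuse the two factors at the score level rather than with a hard AND.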