Vox Generation (VoxGen), which says it
has the largest speech team in the UK, has developed what easily qualifies as
one of the most innovative telephone speech applications and one of the most
difficult, called Ring ’n Sing, an automated karaoke contest. The application
appears to have significant revenue-generating potential for both VoxGen and its
customers. It works like this: a caller dials in and picks a song from a list
by voice, using speech recognition. The caller can read the list on a Web site
or have it read aloud. The music is played without lyrics, and the caller
attempts to sing along (the lyrics are also on the Web site). When the caller
is done, the software scores how closely the caller's pitch throughout the
song matches a reference performance by a professional singer, on a scale from 1 (did
you really sing?) to 10 (try out for American Idol). Callers can have their
rendition replayed, and, if they wish, place it on the Web site for talent
scouts or potential fans to check out.
VoxGen combined Nuance speech
recognition with internally developed software for scoring the singing. Tim
Morgan, VoxGen Chief Technical Officer, said that the algorithm uses pitch
tracking to see if the caller is in tune. Background music confuses the pitch
tracker, so trying to spoof the system by playing a recording of the original
song will not work. For the same reason, the backing music is played to the
caller over the earpiece while the singing comes in on the microphone channel,
and the echo cancelling built into the telephone system keeps that music from
leaking into the sung signal. Morgan said that the project required
significant data collection and analysis to get the proper statistical
distributions for scoring.
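The article does not describe VoxGen's scoring algorithm beyond pitch tracking against a professional reference, so the following is only a minimal Python sketch of the general idea, not VoxGen's method. Everything in it is an assumption: 8 kHz telephone audio, 50 ms frames, a plain autocorrelation pitch tracker, a reference contour that is already time-aligned frame for frame with the caller's singing, octave folding (so a caller singing an octave below the professional is not penalized), and a hypothetical linear mapping from average pitch error in cents to a 1-10 score. VoxGen, by contrast, fitted its scoring to statistical distributions drawn from collected data.

```python
import math

SAMPLE_RATE = 8000  # assumed telephone-band sample rate (Hz)


def estimate_pitch(frame, sample_rate=SAMPLE_RATE, fmin=80.0, fmax=1000.0, threshold=0.3):
    """Estimate one frame's fundamental frequency by autocorrelation.

    Returns None when the frame is silent or not clearly periodic (unvoiced).
    """
    n = len(frame)
    if n == 0:
        return None
    mean = sum(frame) / n
    x = [s - mean for s in frame]          # remove any DC offset
    energy = sum(s * s for s in x)
    if energy == 0.0:
        return None                         # silence
    min_lag = max(1, int(sample_rate / fmax))
    max_lag = min(int(sample_rate / fmin), n - 2)
    if max_lag <= min_lag:
        return None
    # Normalized autocorrelation for every candidate period (in samples).
    r = {lag: sum(x[i] * x[i + lag] for i in range(n - lag)) / energy
         for lag in range(min_lag, max_lag + 1)}
    best = max(r, key=r.get)
    if r[best] < threshold:
        return None                         # weak periodicity: treat as unvoiced
    lag = float(best)
    # Parabolic interpolation around the peak for sub-sample precision.
    if best - 1 in r and best + 1 in r:
        a, b, c = r[best - 1], r[best], r[best + 1]
        denom = a - 2 * b + c
        if denom != 0:
            lag += 0.5 * (a - c) / denom
    return sample_rate / lag


def pitch_contour(samples, frame_len=400, sample_rate=SAMPLE_RATE):
    """Split audio into non-overlapping frames (400 samples = 50 ms at 8 kHz)."""
    return [estimate_pitch(samples[i:i + frame_len], sample_rate)
            for i in range(0, len(samples) - frame_len + 1, frame_len)]


def cents_off(sung_hz, ref_hz):
    """Absolute pitch error in cents, folded to the nearest octave (an assumption)."""
    cents = 1200.0 * math.log2(sung_hz / ref_hz)
    return abs((cents + 600.0) % 1200.0 - 600.0)


def score_performance(sung, reference, tolerance_cents=300.0):
    """Hypothetical linear mapping of mean pitch error onto a 1-10 score.

    Contours are assumed time-aligned; zip() stops at the shorter one.
    """
    errors = []
    for s, r in zip(sung, reference):
        if r is None:
            continue                            # reference is silent here
        if s is None:
            errors.append(tolerance_cents)      # caller silent where the singer sang
        else:
            errors.append(min(cents_off(s, r), tolerance_cents))
    if not errors:
        return 1
    mean_error = sum(errors) / len(errors)
    return round(10 - 9 * mean_error / tolerance_cents)


def tone(freq, seconds, sample_rate=SAMPLE_RATE):
    """Synthetic sine tone, used here in place of real recordings."""
    return [math.sin(2 * math.pi * freq * t / sample_rate)
            for t in range(int(seconds * sample_rate))]
```

For example, `score_performance(pitch_contour(tone(220, 1.0)), pitch_contour(tone(220, 1.0)))` gives a perfect 10. A caller who stays silent through the whole song gets the minimum of 1, and one who sings a semitone (100 cents) sharp throughout lands in the middle of the scale. A production system would also need to time-align the caller to the backing track and handle vibrato and noisy phone lines, which this sketch ignores.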
The system was first deployed in The
Netherlands, mostly in English, since the popular songs used were in English.
Vocability, a provider of hosted speech applications in the Benelux
countries, handled the VoiceXML application, using the VoxPilot platform
with 200 dedicated ports. A Dutch landline company promoted the service,
charged callers for it, and tied it to the Dutch version of a popular
amateur singing-contest television show (Dutch Idols, of course). Ring ’n Sing
works for any voice and any song in any language, although song selection
requires a speech recognizer in the appropriate language.
VoxGen also had to create administrative software that would let the application
managers change songs and get statistics through an easy-to-use interface.
Morgan said that, in The Netherlands, 77% of
the callers called more than once, and 37% called five or more times. The
average duration of a call was three minutes. Simon Loopuit, CEO at VoxGen,
said, “There are a variety of business models that can benefit from a speech
recognition solution, and this is an example of how the technology is
increasingly being used beyond simple transactional procedures. The potential of
the technology is more advanced than people realize.”
You can try the application yourself at +44 870 350 2560.
Make sure you know the lyrics, or you may hear: “That was horrible!
You got one out of ten.”