TMA Associates

Speech Technology News and Analysis

 

VoxGen develops automated, over-the-telephone karaoke contest

 

Application a success in initial deployment in the Netherlands

Vox Generation (VoxGen), which says it has the largest speech team in the UK, has developed Ring ‘n Sing, an automated karaoke contest that easily qualifies as one of the most innovative, and one of the most difficult, telephone speech applications. The application appears to have significant revenue-generating potential for both VoxGen and its customers. It works like this: A caller dials in and picks a song from a list using speech recognition. The caller may be looking at the list on a Web site, but can also have it read aloud. The music is played without vocals, and the caller attempts to sing along (the lyrics are also on the Web site). When the caller is done, the software scores how closely the caller’s pitch matched a reference performance by a professional singer throughout the song, on a scale from 1 (did you really sing?) to 10 (try out for American Idol). Callers can have their rendition replayed and, if they wish, place it on the Web site for talent scouts or potential fans to check out.
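To make the scoring idea concrete, here is a minimal Python sketch of one way a 1-to-10 score could be derived from a caller's pitch track and a reference track. It is not VoxGen's method, which the company has not published. The function names, the half-semitone tolerance, the octave folding, and the linear mapping to a score are all assumptions made for illustration.

```python
import math

def hz_to_semitones(f_hz, ref_hz=440.0):
    """Convert a frequency to semitones relative to ref_hz (assumed A4)."""
    return 12.0 * math.log2(f_hz / ref_hz)

def score_rendition(caller_hz, reference_hz, tolerance_semitones=0.5):
    """Return an integer score from 1 to 10 (hypothetical scoring rule).

    Both arguments are per-frame pitch values in Hz, with None marking
    frames where no pitch was detected. Frames where the reference is
    silent are not scored.
    """
    if len(caller_hz) != len(reference_hz):
        raise ValueError("pitch tracks must have the same number of frames")
    scored = 0
    in_tune = 0
    for c, r in zip(caller_hz, reference_hz):
        if r is None:
            continue  # reference singer is silent here: frame not scored
        scored += 1
        if c is None:
            continue  # caller silent where the singer sang: counts as a miss
        diff = hz_to_semitones(c) - hz_to_semitones(r)
        # Assumption: fold octave offsets so that singing an octave
        # lower or higher than the reference still counts as in tune.
        diff = (diff + 6.0) % 12.0 - 6.0
        if abs(diff) <= tolerance_semitones:
            in_tune += 1
    if scored == 0:
        raise ValueError("reference track has no pitched frames")
    # Map the in-tune fraction (0.0 to 1.0) linearly onto 1..10.
    return 1 + round(9 * in_tune / scored)

reference = [440.0, 440.0, None, 494.0]
caller = [440.0, 445.0, 300.0, 494.0]
print(score_rendition(caller, reference))  # prints 10
```

A production system would presumably calibrate the mapping against real callers rather than use a fixed linear rule.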

VoxGen combined Nuance speech recognition with internally developed software for scoring the singing. Tim Morgan, VoxGen’s Chief Technical Officer, said that the algorithm uses pitch tracking to determine whether the caller is in tune. The music is played to the caller over the earpiece and the singing comes in through the microphone channel, so the echo cancelling built into the telephone system keeps the backing music out of the signal being scored, where it would otherwise interfere with the pitch tracking. (For the same reason, trying to spoof the system with a recording of the original song will not work: the background music on the recording causes problems for the pitch tracking.) Morgan said that the project required significant data collection and analysis to get the proper statistical distributions for scoring.
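As a rough illustration of what pitch tracking involves, the sketch below estimates the pitch of a single short audio frame with plain autocorrelation, a standard textbook technique. The article does not say which pitch-tracking method VoxGen uses, so this is a stand-in. The function name, the 8 kHz telephone sample rate, the 80 to 1000 Hz search range, and the voicing threshold are all assumptions.

```python
import math

def estimate_pitch_hz(frame, sample_rate=8000, min_hz=80.0, max_hz=1000.0,
                      voicing_threshold=0.5):
    """Estimate the pitch of one frame by autocorrelation (illustrative).

    Returns the estimated frequency in Hz, or None if the frame is silent
    or shows no clear periodicity.
    """
    energy = sum(s * s for s in frame)
    if energy == 0.0:
        return None  # silent frame
    # Only search lags that correspond to plausible singing pitches.
    min_lag = int(sample_rate / max_hz)
    max_lag = min(int(sample_rate / min_hz), len(frame) - 1)
    best_lag, best_corr = None, 0.0
    for lag in range(min_lag, max_lag + 1):
        # Correlate the frame with a copy of itself shifted by `lag` samples.
        corr = sum(frame[n] * frame[n + lag] for n in range(len(frame) - lag))
        if corr > best_corr:
            best_lag, best_corr = lag, corr
    # Treat weakly periodic frames as unpitched.
    if best_lag is None or best_corr / energy < voicing_threshold:
        return None
    return sample_rate / best_lag

# A 50 ms frame of a pure 200 Hz tone sampled at 8 kHz.
tone = [math.sin(2 * math.pi * 200 * n / 8000) for n in range(400)]
print(estimate_pitch_hz(tone))
```

Backing music mixed into the frame would add competing periodicities and make the strongest lag unreliable. That is consistent with the article's point that the music has to be kept out of the signal being scored.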

The system was first deployed in the Netherlands, mostly in English, since the popular songs used were in English. Vocability, a provider of hosted speech applications in the Benelux countries, handled the VoiceXML application, using the VoxPilot platform with 200 dedicated ports. A Dutch landline company promoted the service, charging for it and tying it to a popular amateur-singing-contest television show that has a Dutch version (Dutch Idols, of course). Ring ‘n Sing’s scoring works for any voice and any song in any language, although the speech recognition used for song selection requires the appropriate language version. VoxGen also had to create administrative software that lets application managers change songs and get statistics through an easy-to-use interface.

Morgan said that, in The Netherlands, 77% of the callers called more than once, and 37% called five or more times. The average duration of a call was three minutes. Simon Loopuit, CEO at VoxGen, said, “There are a variety of business models that can benefit from a speech recognition solution, and this is an example of how the technology is increasingly being used beyond simple transactional procedures. The potential of the technology is more advanced than people realize.”

You can try the application yourself at +44 870 350 2560. Make sure you know the lyrics, or you may get the response—“That was horrible! You got one out of ten.”

from Bill Meisel’s Telephone Strategy News, April 2006