CMPnetAsia

Hello computer


Dr Jordi Robert-Ribes, 1-Apr-2003

In Stanley Kubrick's 2001: A Space Odyssey, the homicidal HAL9000 interacts with humans through conversational speech. It is now 2003, and while speech recognition is nowhere near that level of sophistication, it has certainly come a long way. Developing speech applications today no longer requires knowledge of complex signal processing; it requires knowing how to integrate speech recognition with databases, larger operating systems, or the Internet. The time has come for companies to revisit speech recognition and its use.

Speech recognition converts spoken words into machine-readable form. The information is either processed and supplied as some sort of output to the user, or triggers an action such as transferring a telephone call to a specific number.

Speech synthesis does the reverse, converting machine-readable text to spoken words. Voice authentication technology, as its name suggests, examines who is speaking, rather than what is being said.

What's in it for you

Speech recognition saves costs by automating simple transactions that would otherwise consume valuable human-agent time.

It can also simplify information systems where touch-tone systems would be too complex. Staff attrition can also be reduced in call centres that use speech recognition systems.

Operational cost reductions of 90% are possible with speech recognition systems, where agent costs of between US$1 and US$7 per call can be reduced to US$0.10-0.70.

For instance, a wagering company in New South Wales, Australia, recently lowered cost per transaction from US$4.50 to US$0.40. The company receives most of its calls for horse-race betting, with an average of 80,000 calls per hour and peaks of up to 750 concurrent calls.

Such operational cost savings enable a quick payback of the capital costs required to build the system. Standard times to payback range from nine to 18 months. The betting company achieved payback in seven months.
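
The arithmetic behind these figures can be sketched in a few lines of Python. The per-call costs are taken from the ranges quoted above. The capital cost and monthly call volume are hypothetical placeholders, since neither is given here, and the function names are purely illustrative.

```python
def reduction_pct(before: float, after: float) -> float:
    """Percentage reduction in cost per call."""
    return (before - after) / before * 100


def payback_months(capital_cost: float, calls_per_month: float,
                   before: float, after: float) -> float:
    """Months needed for the per-call savings to repay the capital cost."""
    monthly_saving = calls_per_month * (before - after)
    return capital_cost / monthly_saving


# Per-call costs quoted in the article (US$).
print(round(reduction_pct(1.00, 0.10)))  # low end of the agent-cost range
print(round(reduction_pct(4.50, 0.40)))  # wagering company example

# Hypothetical deployment, for illustration only: US$500,000 capital cost,
# 20,000 automated calls per month, cost per call cut from US$3.00 to US$0.30.
print(round(payback_months(500_000, 20_000, 3.00, 0.30), 1))
```

With these assumed inputs the payback lands at a little over nine months. A higher call volume shortens it, and a higher capital cost lengthens it, in direct proportion.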

Automating simple transactions. Switchboard automation is a good example of a simple transaction that can easily be automated by speech recognition.

The names of persons and departments can be put on a list of words to be recognised, so that a customer can ask for "Mr Lee", "sales", or "the shop in Newport" and be transferred to the correct extension. This frees human operators to deal with more complex transactions.
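
As a rough illustration, the list of words to be recognised can be thought of as a lookup table from phrases to extensions. The sketch below assumes the speech engine has already returned the recognised text; the extension numbers and the `route_call` function are hypothetical.

```python
from typing import Optional

# Hypothetical directory: recognised phrase -> extension number.
DIRECTORY = {
    "mr lee": "2041",
    "sales": "3000",
    "the shop in newport": "5120",
}


def route_call(recognised_text: str) -> Optional[str]:
    """Return the extension for a recognised phrase.

    Returns None when the phrase is not on the list, in which case the
    call would be handed to a human operator.
    """
    # Normalise case and whitespace before the lookup.
    key = " ".join(recognised_text.lower().split())
    return DIRECTORY.get(key)


print(route_call("Mr Lee"))
print(route_call("billing"))
```

Anything not on the list falls through to an operator, which keeps the automated part limited to the simple transfers.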

If you find touch-tone a pain, press one. Stockbrokers have been early adopters of speech recognition for providing stock prices to customers.

The advantage is clear, because the number of companies being traded in any single stock exchange makes touch-tone systems impossibly complex to use.

A speech-recognition system will simply instruct the user: "Please say the name of the stock of which you want to know the price." Simple quotes or catalogue browsing can also be automated in this way.

Order-tracking automation with speech recognition can provide benefits, even when that operation can be performed with a touch-tone system.

Normally, order tracking is done with an order number, which can be efficiently entered via touch-tone. However, using speech recognition can provide several advantages:

* It can be used easily with a mobile phone or while driving (e.g. truck drivers).

* It can automate the retrieval of an order number when the customer does not know it, by referring to the name and date of the order (e.g. "yesterday" and "last week").
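
The second point can be sketched as a small lookup: a spoken date phrase is turned into a date range, and the order is then found by name and date. The order records, the interpretation of "name" as the customer's name, and the definition of "last week" as the previous Monday-to-Sunday week are all assumptions made for illustration.

```python
from datetime import date, timedelta
from typing import List, Tuple

# Hypothetical order records: (order number, customer name, order date).
ORDERS = [
    ("A1001", "lee", date(2003, 3, 31)),
    ("A1002", "tan", date(2003, 3, 31)),
    ("A0950", "lee", date(2003, 3, 26)),
]


def date_range(phrase: str, today: date) -> Tuple[date, date]:
    """Translate a spoken date phrase into an inclusive date range."""
    phrase = phrase.lower().strip()
    if phrase == "today":
        return today, today
    if phrase == "yesterday":
        day = today - timedelta(days=1)
        return day, day
    if phrase == "last week":
        # Monday to Sunday of the previous calendar week.
        this_monday = today - timedelta(days=today.weekday())
        return this_monday - timedelta(days=7), this_monday - timedelta(days=1)
    raise ValueError(f"unsupported date phrase: {phrase!r}")


def find_orders(name: str, phrase: str, today: date) -> List[str]:
    """Return the order numbers matching a customer name and a date phrase."""
    start, end = date_range(phrase, today)
    return [number for number, customer, placed in ORDERS
            if customer == name.lower() and start <= placed <= end]


today = date(2003, 4, 1)
print(find_orders("Lee", "yesterday", today))
print(find_orders("Lee", "last week", today))
```

If more than one order matches, the dialogue would need a follow-up question to pick the right one.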

Reducing staff attrition. Staff attrition can be reduced in call centres that use speech recognition to automate simple transactions.

This is because call-centre staff can then concentrate on the more elaborate transactions that afford them higher personal reward, reducing staff "burn" rates considerably.

Speech recognition will not solve all your call-centre problems, or automate all transactions. However, it can be extremely successful if the deployments are focused on what speech recognition does best.

Barriers to change

User acceptance is often perceived as one of the main barriers to the adoption of speech recognition.

However, most experiences show that users prefer speech recognition to other types of interaction. It is important to keep focused on benefits to the user, rather than only on cost savings to the company.

For instance, most users will prefer interacting immediately with a speech-recognition system to waiting five minutes for a human agent. This is particularly so when the user needs only a simple transaction like an account-balance figure.

Recognition errors are still thought to be too high by some decision makers. A recognition error occurs, for example, when a system mistakes "nine" for "mine".

However, recent developments in speech-recognition engines have increased accuracy considerably.

Most importantly, good dialogue design reduces the number of words that the system expects as valid answers to a question, and thus reduces the possibility of a recognition error.

For example, if the system asks the question "How many passengers will be travelling?", the expected answer will be a number. The system will not return "mine" as an answer because it is not a number, but it will almost certainly return "nine".
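
The effect of restricting the expected answers can be sketched as follows. This is a simplification: it assumes the recogniser returns a list of candidate words with confidence scores and filters them afterwards, whereas a real engine would typically apply the list of valid answers during recognition itself. The scores, the word list, and the `best_in_grammar` function are hypothetical.

```python
from typing import List, Optional, Set, Tuple

# Hypothetical list of valid answers to
# "How many passengers will be travelling?"
NUMBER_GRAMMAR = {"one", "two", "three", "four", "five",
                  "six", "seven", "eight", "nine"}


def best_in_grammar(hypotheses: List[Tuple[str, float]],
                    grammar: Set[str]) -> Optional[str]:
    """Pick the highest-scoring candidate that is a valid answer."""
    allowed = [(word, score) for word, score in hypotheses if word in grammar]
    if not allowed:
        return None  # nothing valid: the dialogue should ask again
    return max(allowed, key=lambda pair: pair[1])[0]


# Hypothetical recogniser output: "mine" scores slightly higher than "nine".
hypotheses = [("mine", 0.48), ("nine", 0.45), ("line", 0.07)]

print(best_in_grammar(hypotheses, NUMBER_GRAMMAR))
print(best_in_grammar(hypotheses, {"mine", "nine", "line"}))
```

With the number-only list the answer is "nine" even though "mine" scored higher acoustically; with an unrestricted list the error would go through.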

Risk aversion on the part of management is the main hurdle that most speech-recognition projects need to overcome.

In the current environment, speech recognition technology is still perceived by some managers as too risky. This will change once more and more successful speech-recognition deployments are publicised.

In the meantime, speech-recognition projects should be implemented in internal trials and in small deployments to make managers aware of the maturity of the technology.

Script for success

Introducing a new technology into the workplace is a testing endeavour, but proper planning makes it far more manageable.

Start small with a strategy for growth. A deployment should start small, automating only the simplest transactions.

Only when such automation is successfully in place, and the company has understood the main challenges of speech recognition, should a more complex automation be tackled.

There have been too many implementations that have tried to automate too many things at the first stage, believing that "if you're going to do it, you'd better do the lot".

Most of them have missed their intended time frames and overrun their budgets because of the complexity of the project.

Big companies should also avoid having multiple systems deployed by different departments. A common speech strategy should be developed to avoid having to train staff in using and maintaining multiple systems.

Moderating expectations. Over-expectation by users and staff can lead to disappointment.

There have been deployments where rural sales offices expected a much higher number of calls from the day a speech-recognition system was installed at headquarters.

As the volume of calls was exactly the same as before, the staff perceived the new system as a failure.

They should have been advised that the main difference to expect is that calls routed to the rural sales offices would be faster and cheaper.

Plan for longer design timeframe. Speech-recognition deployments require longer design times than standard IT deployments. The main reason is that extensive effort is needed in order to design the correct question-and-answer dialogues.

Plan for continuous maintenance. Speech-recognition systems are not "fire-and-forget" systems.

They need monitoring and fine-tuning, with customers' expected answers to each question added and refined constantly.

For instance, some of your customers may reply "car repairs" when asked "which department do you want to speak to?".

Your system might have only "repairs" on its list of possible answers but not "car repairs", so maintenance will need to add "car repairs" to the relevant list.
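
One way to support this kind of maintenance is to tally the replies that were not on the list and review the most frequent ones. The sketch below assumes that callers' replies are logged as text (for example, transcribed during monitoring); the log contents, the threshold, and the `suggest_additions` function are hypothetical.

```python
from collections import Counter
from typing import Iterable, List, Set


def suggest_additions(logged_replies: Iterable[str], accepted: Set[str],
                      min_count: int = 2) -> List[str]:
    """List replies missing from the accepted answers, most frequent first.

    Only replies seen at least `min_count` times are suggested, so that
    one-off replies do not clutter the list.
    """
    normalised = (reply.lower().strip() for reply in logged_replies)
    unmatched = Counter(reply for reply in normalised if reply not in accepted)
    return [phrase for phrase, count in unmatched.most_common()
            if count >= min_count]


# Hypothetical answers accepted for "Which department do you want to speak to?"
accepted = {"repairs", "sales"}

# Hypothetical log of what callers actually said.
log = ["repairs", "car repairs", "Car repairs", "sales",
       "spare parts", "car repairs"]

print(suggest_additions(log, accepted))
```

The suggestions are only candidates: someone still has to decide which list each new phrase belongs to before it is added.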

Speech-recognition systems should help your business prosper instead of blowing you out of an airlock.

To achieve a successful deployment, you should decide if your business can benefit from speech recognition, then pace your implementation, and finally, review and refine diligently.

With this in mind, speech recognition will not end up being the death of you.

Dr Jordi Robert-Ribes is currently manager for R&D and Internet Services in the Technology and Planning group of SingTel-Optus.


Copyright© CMP Asia, Copyright © 2000 - Privacy Statement
