|
In Stanley Kubrick's 2001: A Space Odyssey,
the homicidal HAL 9000 interacts with humans through
conversational speech. While it is 2003 and speech recognition
is nowhere near that level of sophistication, it has certainly
come a long way. Developing speech applications today no
longer requires knowledge of complex signal processing; it
requires knowing how to integrate speech recognition with databases,
larger operating systems, or the Internet. The time has come
for companies to revisit speech recognition and its use.
Speech recognition
converts spoken words into machine-readable form. The
information is either processed and supplied as some sort of
output to the user, or triggers an action such as transferring
a telephone call to a specific number.
Speech
synthesis does the reverse, converting machine-readable text
to spoken words. Voice authentication technology, as its name
suggests, examines who is speaking, rather than what is being
said.
What's in it for you
Speech
recognition saves costs by automating simple transactions that
would otherwise consume valuable human-agent time.
It
can also simplify information systems where touch-tone systems
would be too complex. Staff attrition can also be reduced in
call centres that use speech recognition
systems.
Operational cost reductions of 90% are
possible with speech recognition systems, where agent costs of
between US$1 and US$7 per call can be reduced to
US$0.10-0.70.
For instance, a wagering company in New
South Wales, Australia, recently lowered cost per transaction
from US$4.50 to US$0.40. The company receives most of its
calls for horse-race betting, with an average of 80,000 calls
per hour and peaks of up to 750 concurrent calls.
Such
operational cost savings enable a quick payback of the capital
costs required to build the system. Typical payback periods
range from nine to 18 months. The betting company achieved
payback in seven months.
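The arithmetic behind these figures can be sketched in a few lines of Python. The per-call costs are the ones quoted above for the wagering company; the capital cost and monthly call volume in the payback example are invented purely for illustration and do not describe any real deployment.

```python
def cost_reduction(before: float, after: float) -> float:
    """Fractional reduction in cost per call."""
    return (before - after) / before


def payback_months(capital_cost: float, calls_per_month: float,
                   before: float, after: float) -> float:
    """Months of per-call savings needed to cover the capital cost."""
    monthly_saving = calls_per_month * (before - after)
    return capital_cost / monthly_saving


# Per-call costs quoted in the article for the wagering company.
print(f"Cost reduction: {cost_reduction(4.50, 0.40):.1%}")

# Hypothetical inputs: US$820,000 capital cost, 50,000 automated calls a month.
print(f"Payback: {payback_months(820_000, 50_000, 4.50, 0.40):.1f} months")
```

With these assumed inputs the payback comes out at about four months; a real estimate would also need to account for running and maintenance costs, which this sketch ignores.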
Automating simple
transactions. Switchboard automation is a good example of a
simple transaction that can easily be automated by speech
recognition.
The names of persons and departments can
be put on a list of words to be recognised, so that a customer
can ask for "Mr Lee", "sales", or "the shop in Newport" and be
transferred to the correct extension. This frees human
operators to deal with more complex transactions.
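As a rough illustration of the word list described above, the routing step can be thought of as a lookup from a recognised phrase to an extension. The extension numbers, the `route_call` name, and the fallback to an operator are all assumptions made for this sketch; the recogniser itself is taken as given and simply supplies the text.

```python
# Hypothetical directory: recognised phrase -> extension.
# The extension numbers are invented for this sketch.
DIRECTORY = {
    "mr lee": "2041",
    "sales": "3000",
    "the shop in newport": "5512",
}

OPERATOR = "0"  # assumed fallback extension for a human operator


def route_call(recognised_text: str) -> str:
    """Return the extension for a recognised phrase, or the operator."""
    # Normalise case and spacing so "Mr Lee" and "mr  lee" match the same entry.
    key = " ".join(recognised_text.lower().split())
    return DIRECTORY.get(key, OPERATOR)


print(route_call("Mr Lee"))
print(route_call("accounts"))  # not on the list, so it goes to the operator
```

Anything that is not on the list falls through to a human operator, which keeps the automated part limited to the simple cases.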
If
you find touch-tone a pain, press one. Stockbrokers have been
early adopters of speech recognition for providing stock
prices to customers.
The advantage is clear, because
the number of companies being traded in any single stock
exchange makes touch-tone systems impossibly complex to
use.
A speech-recognition system will simply instruct
the user: "Please say the name of the stock of which you want
to know the price." Simple quotes or catalogue browsing can
also be automated in this way.
Order-tracking
automation with speech recognition can provide benefits, even
when that operation can be performed with a touch-tone system.
Normally, order tracking is done with an order number,
which can be efficiently entered via touch-tone. However,
using speech recognition can provide several
advantages:
* It can be used easily with a mobile phone
or while driving (e.g. truck drivers).
* It can
automate the retrieval of an order number when the customer
does not know it, by referring to the name and date of the
order (e.g. "yesterday" and "last week").
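One way to picture the second point is a small function that turns a spoken date phrase into a date range an order database could be searched with. This is only a sketch: the function name is made up, and the reading of "last week" as the previous Monday-to-Sunday week is an assumption that a real dialogue design would need to confirm with users.

```python
from datetime import date, timedelta
from typing import Optional, Tuple


def resolve_date_phrase(phrase: str, today: date) -> Optional[Tuple[date, date]]:
    """Map a spoken date phrase to an inclusive (start, end) date range."""
    key = " ".join(phrase.lower().split())
    if key == "today":
        return today, today
    if key == "yesterday":
        day = today - timedelta(days=1)
        return day, day
    if key == "last week":
        # Assumption: "last week" means the previous Monday-to-Sunday week.
        this_monday = today - timedelta(days=today.weekday())
        return this_monday - timedelta(days=7), this_monday - timedelta(days=1)
    return None  # phrase not understood; the dialogue should ask again


print(resolve_date_phrase("yesterday", date(2003, 6, 11)))
print(resolve_date_phrase("last week", date(2003, 6, 11)))
```

The resulting range, combined with the customer's name, would narrow the search to a handful of orders that the system can then read back for confirmation.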
Reducing
staff attrition. Staff attrition can be reduced in call
centres that use speech recognition to automate simple
transactions.
This is because call-centre staff can
then concentrate on the more elaborate transactions that
afford them higher personal reward, reducing staff "burn"
rates considerably.
Speech recognition will not solve
all your call-centre problems, or automate all transactions.
However, it can be extremely successful if the deployments are
focused on what speech recognition does
best.
Barriers to change
User acceptance is
often perceived as one of the main barriers to the adoption of
speech recognition.
However, most experiences show
that users prefer speech recognition to other types of
interaction. It is important to keep focused on benefits to
the user, rather than only on cost savings to the company.
For instance, most users will prefer interacting
immediately with a speech-recognition system to waiting five
minutes for a human agent. This is particularly so when the
user needs only a simple transaction like an account-balance
figure.
Recognition errors are still thought to be too
high by some decision makers. A recognition error occurs, for
example, when a system mistakes "nine" for
"mine".
However, recent developments in
speech-recognition engines have increased accuracy
considerably.
Most importantly, good design practices
in the dialogue design reduce the number of words that the
system expects as valid answers to a question, and thus
reduce the possibility of error in recognition.
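The effect of restricting the expected answers can be sketched as follows, assuming a recogniser that returns several candidate words with confidence scores. The scores, the function name, and the idea of filtering after recognition are simplifications for illustration; real engines typically apply the list of valid answers during recognition itself.

```python
from typing import Iterable, Optional, Set, Tuple

# Words the dialogue accepts as an answer to a "how many" question.
NUMBER_GRAMMAR = {
    "one", "two", "three", "four", "five",
    "six", "seven", "eight", "nine",
}


def best_valid_hypothesis(hypotheses: Iterable[Tuple[str, float]],
                          grammar: Set[str]) -> Optional[str]:
    """Pick the highest-scoring hypothesis that the grammar allows."""
    valid = [(word, score) for word, score in hypotheses if word in grammar]
    if not valid:
        return None  # nothing acceptable was heard; the system should re-prompt
    return max(valid, key=lambda pair: pair[1])[0]


# Invented scores: acoustically, "mine" is slightly ahead of "nine".
n_best = [("mine", 0.52), ("nine", 0.48)]
print(best_valid_hypothesis(n_best, NUMBER_GRAMMAR))
```

Because "mine" is not among the valid answers, it is never returned, even though it scored higher on sound alone.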
For
example, if the system asks the question "How many passengers
will be travelling?", the expected answer will be a number.
The system will not provide "mine" as an answer because it is
not a number, but it will most certainly provide
"nine".
Risk aversion on the part of management is the
main hurdle that most speech-recognition projects need to
overcome.
In the current environment, speech
recognition technology is still perceived by some managers as
too risky. This will change once more and more successful
speech-recognition deployments are publicised.
In the
meantime, speech-recognition projects should be implemented in
internal trials and in small deployments to make managers
aware of the maturity of the technology.
Script for success
Introducing a new technology into the
workplace is always a testing endeavour, but proper planning
makes it manageable.
Start small with a strategy for growth. A deployment
should start small, automating only the simplest
transactions.
Only when such automation is successfully
in place, and the company has understood the main challenges
of speech recognition, should a more complex automation be
tackled.
There have been too many implementations that
have tried to automate too many things at the first stage,
believing that "if you're going to do it, you'd better do the
lot".
Most of them have missed their intended
time frames and overrun their budgets because of the
complexity of the project.
Big companies should also avoid having
multiple systems deployed by different departments. A common
speech strategy should be developed to avoid having to train
staff in using and maintaining multiple systems.
Moderating
expectations. Over-expectation by users and staff can lead to
disappointment.
There have been deployments where
rural sales offices expected a much higher number of calls
from the day a speech-recognition system was installed at
headquarters.
As the volume of calls was exactly the
same as before, the staff perceived the new system as a
failure.
They should have been advised that the main
difference to expect is that calls routed to the rural sales
offices would be faster and cheaper.
Plan for a longer
design time frame. Speech-recognition deployments require
longer design times than standard IT deployments. The main
reason is that extensive effort is needed in order to design
the correct question-and-answer dialogues.
Plan for
continuous maintenance. Speech-recognition systems are not
"fire-and-forget" systems.
They need monitoring and
fine-tuning, with customers' expected answers to each question
added and refined constantly.
For instance, some of
your customers may reply "car repairs" when asked "which
department do you want to speak to?".
Your system might
have only "repairs" on its list of possible answers but not
"car repairs", so maintenance will need to add "car repairs"
to the relevant list.
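This kind of maintenance can be partly supported by reviewing what callers actually said. The sketch below assumes a log of transcribed answers to the department question is available; the log contents, the department list, and the threshold of two occurrences are all invented for illustration.

```python
from collections import Counter
from typing import Iterable, List, Set


def suggest_additions(call_log: Iterable[str], grammar: Set[str],
                      min_count: int = 2) -> List[str]:
    """List phrases callers used repeatedly that the grammar does not cover."""
    normalised = (" ".join(utterance.lower().split()) for utterance in call_log)
    unmatched = Counter(phrase for phrase in normalised if phrase not in grammar)
    return [phrase for phrase, count in unmatched.most_common()
            if count >= min_count]


# Hypothetical list of valid answers and a hypothetical log of caller replies.
department_grammar = {"repairs", "sales", "accounts"}
call_log = ["repairs", "car repairs", "sales",
            "Car repairs", "car repairs", "spare parts"]

candidates = suggest_additions(call_log, department_grammar)
print(candidates)

# A person should review the candidates before they are added.
department_grammar.update(candidates)
```

The threshold keeps one-off replies out of the list, so only phrases that several callers used are put forward for review.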
Speech-recognition systems should
help your business prosper instead of blowing you out of an
airlock.
To achieve a successful deployment, you
should decide if your business can benefit from speech
recognition, then pace your implementation, and finally,
review and refine diligently.
With this in mind, speech
recognition will not end up being the death of
you.
Dr Jordi Robert-Ribes is currently manager
for R&D and Internet Services in the Technology and
Planning group of SingTel-Optus.
|