Monday, Aug. 10, 1992

The Machines Are Listening

By Philip Elmer-DeWitt

In its television debut on ABC's Good Morning America, Casper the talking (and listening) computer was everything one would expect of a digital servant -- friendly, eager to please but slightly hard of hearing. Morning host Joan Lunden, demonstrating Casper's capabilities on an Apple Macintosh computer, was able to persuade the system to program her VCR simply by talking into a microphone -- although she had to repeat "Casper, accept program!" several times before the machine finally got the message. When the technology is perfected, say Apple executives, computers will be able to act on their human masters' every command, whether it be to pay the phone bill, schedule a lunch date or fetch the electronic mail.

But the technology looked a lot less cute and friendly to telephone operators in June as AT&T installed similar systems in Seattle and in Jacksonville, Florida. The phone company has developed a computer system that can recognize words like "collect" and "person to person" about as well as any human. By the end of 1994, when AT&T is scheduled to finish deploying the new equipment across the U.S., it plans to close 31 offices and eliminate up to one-third of the jobs now held by 18,000 long-distance operators.

Although still in its infancy, the technology that enables machines to understand human speech may ultimately have as much impact on the way people do their work -- and whether some of them still have work -- as any advance since the computer. The ultimate goal of speech-recognition researchers -- what they call their Holy Grail -- is an automatic dictation machine that can listen to normal conversational speech and turn it into perfectly typed text. Such a system could carry out much of the work currently done by millions of human typists, transcribers, reporters, secretaries and stenographers.

Automatic typewriters are probably still decades away. But there has been rapid progress in the underlying technology during the past few years, and even with the severe limitations of today's equipment there are now voice-recognition systems doing real tasks -- and in some cases replacing real workers -- at hundreds of sites across the U.S. Among those tasks:

SORTING MAIL. The U.S. Postal Service uses voice-recognition systems in 30 big postal centers to sort bundles that cannot be processed by its automatic equipment. A human reads the ZIP codes off the labels, and the system directs the packages to the proper chute. The Postal Service figures it is cheaper to buy a computer to do the job than to train people to memorize which ZIP codes correspond to which locales.

AUTHORIZING TRANSACTIONS. American Express has 500 human operators to field calls from retailers who lack the electronic equipment to get approvals. These employees verbally authorize 2.5 million charge-card transactions a month. Some of the authorizations are now being given by computers that ask for account numbers and purchase prices and then check cardholders' accounts automatically.

TRADING STOCKS. Stockbrokers trading U.S. government securities at 40 sites and six major brokerage houses can now bark their buy and sell orders into special telephones and see their trades instantly recorded on computer screens at their desks. Similar systems are being used by quality-control inspectors on factory lines, by doctors filling out medical reports and by lawyers putting together paragraphs of boiler-plate prose.

Voice recognition has come a long way in 20 years from the primitive systems that had to be trained to each individual's voice and could recognize words only when they were spoken one at a time. The most advanced systems today look not at whole words but at phonemes, the building blocks from which all words are constructed. That makes it possible to decode the slurred sentences that most people speak. The systems also use mathematical techniques to meld dozens of sampled voices, including male and female tones, so that the computers can recognize phrases spoken by just about anybody.

The main limitation on such systems is that they can deal with only relatively small vocabularies -- usually a few dozen words at a time. But that's enough to take orders at fast-food restaurants or to handle toll-free calls in which a customer must choose from a fixed list of catalog items, airline flights or bank transfer options. More than $150 million worth of voice-recognition systems were sold in the U.S. last year, according to Voice Information Associates, a research firm in Lexington, Mass., and the market is growing more than 40% a year. The big breakthrough will come when computers that can follow conversational speech become sufficiently powerful to handle vocabularies of 20,000 words. That would cover 97% of the words used in today's books, magazines and newspapers.

Researchers argue among themselves about whether it will be five or 10 or even 20 more years before dictation systems are that smart. For court reporters, stenographers or anyone else whose primary job is to put spoken words onto paper, that time might be well spent figuring out how to adapt to the technology -- or, if that's not possible, looking for a new career.

With reporting by Sam Allis/Boston