Monday, Feb. 15, 1999
When Computers Do the Grading
By NADYA LABI
Teachers have it made. They get to send students to detention, assign homework and give tests. They can issue commands like "compare and contrast" and watch their charges squirm. There is but one drawback to wielding such power: the daunting task of grading essays. For every student who pulls an all-nighter wading through Great Expectations, there is a teacher who has to slog through dozens of tortured expositions on the symbolism of Miss Havisham's aborted wedding feast.
That may soon change. America's most relentless examiner, the Educational Testing Service, has developed computer software, known as E-Rater, to evaluate essays on the Graduate Management Admission Test. Administered to 200,000 business school applicants each year, the GMAT includes two 30-min. essays that test takers type straight into a computer. In the past, those essays were graded on a six-point scale by two readers. This month, the computer will replace one of the readers--with the proviso that a second reader will be consulted if the computer and human-reader scores differ by more than a point.
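The proviso itself is mechanical enough to sketch in a few lines. The Python below illustrates only what the article states, that a second human reader is consulted when the computer's score and the first reader's score differ by more than a point on the six-point scale; the function names and the averaging of agreeing scores are assumptions for illustration, not ETS's published procedure.

```python
def gmat_essay_grade(human_score: int, erater_score: int,
                     second_reader_score: int = None) -> float:
    """Sketch of the GMAT essay-grading proviso (six-point scale).

    Per ETS, a second human reader is consulted whenever E-Rater's
    score and the first reader's score differ by more than one
    point. How scores are then combined isn't stated in the
    article; averaging here is an illustrative assumption.
    """
    if abs(human_score - erater_score) > 1:
        if second_reader_score is None:
            raise ValueError("scores differ by more than a point; "
                             "a second human reader is required")
        # Assumption: the two human readers settle the grade.
        return (human_score + second_reader_score) / 2
    # Assumption: agreeing scores are averaged into the grade.
    return (human_score + erater_score) / 2
```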
It's one thing for a machine to determine whether a bubble has been correctly filled in, but can it read outside the lines, so to speak? Well, yes and no. E-Rater "learns" what constitutes good and bad answers from a sample of pregraded essays. Using that information, it breaks an essay down into its syntax, organization and content. The software checks basics like subject-verb agreement and recognizes words, phrases and sentence structures that are likely to appear in high-scoring essays. An essay on Clinton's impeachment trial that includes terms like DNA and rule of law, for example, is likely to garner a top grade.
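To make the train-on-graded-samples-then-score-by-telltale-features idea concrete, here is a deliberately tiny Python caricature. It stands in for E-Rater's proprietary syntax, organization and content analysis with simple word statistics; nothing here is ETS's actual code, only a minimal sketch of the general technique.

```python
from collections import Counter

def train_scorer(graded_essays):
    """Learn, from a sample of (text, grade) pairs, the average
    grade associated with each word. A crude stand-in for the
    far richer features E-Rater extracts from pregraded essays."""
    totals, counts = Counter(), Counter()
    for text, grade in graded_essays:
        for word in set(text.lower().split()):
            totals[word] += grade
            counts[word] += 1
    return {w: totals[w] / counts[w] for w in totals}

def score_essay(text, word_weights, default=3.0):
    """Score a new essay as the mean grade-association of its
    known words, clamped to the GMAT's six-point scale."""
    words = [w for w in text.lower().split() if w in word_weights]
    if not words:
        return default  # no recognized features: fall back to a midpoint
    raw = sum(word_weights[w] for w in words) / len(words)
    return max(1.0, min(6.0, raw))
```

Feed it a handful of graded samples and it behaves, in miniature, the way the article describes: an essay peppered with terms that appeared in high-scoring samples, like rule of law, drifts toward a top grade.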
Of course, the machine cannot "get," say, a clever turn of phrase or an unusual analogy. "If I'm unique, I might not fall under the scoring rubric," concedes Frederic McHale, a vice president at the Graduate Management Admission Council, which owns the GMAT. On the other hand, E-Rater is mercilessly objective and never tires halfway through a stack of essays. The upshot: in trial runs, E-Rater and a human reader were just as likely to agree as were two human readers. "It's not intended to judge a person's creativity," says Darrell Laham, co-developer of the Intelligent Essay Assessor, a computer-grading system similar to E-Rater. "It's to give students a chance to construct a response instead of just pointing at a bubble."
That won't reassure traditionalists, who argue that writing simply can't be reduced to rigid adjective-plus-subject-plus-verb formulations. "This is all part of a long-term approach to mind as machine," says David Schaafsma, professor of English education at Teachers College of Columbia University. "Writing is a human act, with aesthetic dimensions that computers can only begin to understand." Kaplan, a leader in test prep, has taken a more pragmatic approach: it has issued a list of strategies for "the age of the computerized essay." One of its tips: use transitional phrases like "therefore," and the computer just might think you're Dickens.