Monday, Jan. 11, 1999

Racing To Map Our DNA

By Michael D. Lemonick and Dick Thompson

When the Human Genome Project was launched a little under a decade ago, boosters compared it with the Manhattan Project or the mission to put men on the moon: an effort so complex and so broad in scope that only the government had the financial and bureaucratic resources to pull it off--yet with such huge potential payoffs that virtually no resources should be spared.

By the time the project was complete, promised its advocates, science would at last have access to the "book of life"--the precise biochemical code for each of the 100,000 or so genes that largely determine every physical characteristic in the human body. Once researchers knew that, they'd be able to figure out exactly how each gene functions--and, more important, malfunctions to trigger deadly illnesses from heart disease to cancer.

Important as it was, the job would take some time. Unlike the atom bomb or the space race, there was no Hitler or Khrushchev who threatened to get there first. Without such external dangers forcing them to pull out all the stops, federally funded genome-project scientists figured they could move at their own pace; they would finish up in 2005 or thereabouts.

They figured wrong. The Nazis and the communists may be history, but an even more electrifying force has arisen to put the fear of God into the genome project: the profit motive. Pharmaceutical companies stand to make incalculable billions of dollars by turning genome research into new treatments for a dizzying array of diseases. And the companies that manage to get the information first--and lock up what they find with patents--will profit most (see box).

It's no surprise, therefore, that private firms have plunged into human-genome projects of their own. Nor is it surprising, given the potential payoff, that their scientists have found ways to speed up the decoding process. Indeed, one such company--Celera Genomics Corp., led by maverick scientist Craig Venter (see following story)--declared last spring that it would have the job substantially wrapped up in three years.

Blindsided by Venter's surprise announcement, leaders of the federal genome project--which is being carried out at university and government labs in the U.S., at the Sanger Centre near Cambridge, England, and at facilities in Germany and Japan--spent the summer rethinking their schedule. The result: an announcement last fall that they would finish up by 2003 rather than 2005, with a rough "working draft" of the genome to be published by 2001.

The measured march to decode the human genome, in short, has turned into a headlong horse race--and the rivalry isn't always polite. The federal genome project, critics carp privately, has been shockingly mismanaged and is sorely lacking in vision. Private efforts, counter some in the public project, are pirate operations that seek to lock critical segments of God's genomic handiwork behind a barricade of patents. Beyond that, they say, speeding up the pace of discovery could lead to slapdash, incomplete results. "If this is the book of life," sniffs Francis Collins, director of the National Human Genome Research Institute, in Bethesda, Md., and one of the leaders of the federal Human Genome Project, "we should not be satisfied with a lot of mistakes or holes."

Completeness and accuracy were the Human Genome Project's twin mantras from its formal start in 1990. At that point, researchers had already painstakingly identified more than 4,000 of the 100,000 genes that serve as the blueprint for a functioning human being--each gene carrying instructions that tell cells how to produce a specific protein. Scientists had located about 1,500 genes, in a rough way, on the 46 chromosomes--the long, twisted strands of DNA cradled in protein at the heart of every human cell. But they had deciphered, or sequenced, only a handful of the many-hundred-word "sentences" that each gene represents--sentences made up of three-letter "words" built in turn from four available molecular "letters," represented by A, T, C and G.
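The "letters" and "words" analogy can be made concrete with a little counting. The short Python sketch below is an illustration added for clarity, not something drawn from the project itself: it simply enumerates every three-letter word that can be built from the four letters A, T, C and G. The names `LETTERS` and `words` are made up for this example.

```python
from itertools import product

# The four molecular "letters" named in the text.
LETTERS = "ATCG"

# Every possible three-letter "word" built from those letters.
words = ["".join(triple) for triple in product(LETTERS, repeat=3)]

print(len(words))   # 4 * 4 * 4 = 64 possible words
print(words[:5])    # a small sample of the vocabulary
```

With only 64 possible words, the richness of a gene comes from its length and word order rather than from a large vocabulary.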

The project's $3 billion mandate: sequence the entire 3 billion-letter human genome with high precision as a prelude to figuring out eventually what protein each gene produces and for what purpose (see diagram). The process can be likened to mapping out a route from San Francisco to New York City by walking the entire distance and noting every hill and valley along the way. It's slow but precise. After eight years, some 7% of the human genome has been sequenced in encyclopedic detail.

But while the genome project has been methodically chronicling the details of human cells--including long stretches of DNA, amounting to some 97% of the total, that contain no genes at all--private companies have opted for a very different approach. Their maps are more like satellite photographs that take in the entire route but concentrate only on the highlights. "The thing people are highly interested in," says Randal Scott, president and chief scientific officer at Incyte Pharmaceuticals, based in Palo Alto, Calif., one of the players in the private-sector gene-mapping game, "is where all the cities are. You don't need to document all the trees and gullies and ditches." Once those landmarks are identified, scientists assume, they can focus on them in greater detail.

Scott's rivals at Genset, based in France, are taking a similar approach: their map, to be completed in early 2000, will highlight just 60,000 of some 10 million biochemical "beacons" found along the human genome. By comparing the DNA of many individuals in and around these signposts, Genset hopes to pick out specific genes whose malfunctions actually cause disease. It has already begun to work. Using this technique, says Genset chief genomics officer Dr. Daniel Cohen, the company has found two different genes involved in prostate cancer. Cohen points out that the 20 most common diseases, which kill about 80% of the population, are probably linked to some 200 genes out of the body's 100,000. It only makes sense, he says, to look first at those genes.

As narrowly focused as their efforts are, Cohen and Scott are using gene-mapping techniques that are not very different from the Human Genome Project's. Craig Venter, on the other hand, has taken a radical approach, one that resembles paper shredding more than it does mapmaking.

Venter's reputation as a creative thinker was made back in the late 1980s. He was studying genes at the National Institutes of Health when he came to a humbling realization: while the greatest minds in biochemistry still hadn't figured out how to locate a gene efficiently, cells do it all the time. Cells, moreover, tap into only those genes they need and ignore the rest.

That was fine with Venter, since the strips of DNA that are actually being used as blueprints for constructing a protein are where the action is. So Venter decided to concentrate on these active parts. He focused on the so-called messenger RNA, or mRNA, which ferries instructions from DNA over to the cell's protein-making machinery. This is the essence of the gene, and it was these stripped-down genetic instructions--copied into a more stable form known as cDNA--that he fed into an automated gene sequencer he'd acquired for his lab.

Decoded cDNA began tumbling out of his machine. Portions of these decoded regions were used as tags--he called them expressed sequence tags (ESTs)--to help scientists distinguish one gene from another and identify related genes even in other species. "His invention of ESTs was inspired," says Victor McKusick, a geneticist at Johns Hopkins University who is often called the father of genetic medicine. In June 1991, when Venter published his first paper based on this work, scientists had identified only about 4,000 genes, each one representing years of painstaking labor. In one day, Venter added 347 new genes to the list. Soon he was finding 25 a day.

Officials at the National Institutes of Health were delighted that one of their own had struck the mother lode, and they rushed to patent Venter's genes. But across the NIH campus, James Watson, who had won a Nobel for his co-discovery of the structure of DNA and who was then running NIH's Human Genome Project, was outraged. This wasn't science, he insisted. "Virtually any monkey" could do that work, Watson fumed in the opening salvo of a battle that would rage for months--and which smolders to this day. To patent such abbreviated genetic material, said Watson, was "sheer lunacy" that would entangle genetic research in legal issues and slow it to a crawl. When the battle was over, the NIH had withdrawn the patent proposal and Watson was no longer head of the genome project. Gone too were Venter and his wife and collaborator, Claire Fraser.

Freed from the confines of the NIH, Venter took an offer from a venture capitalist to head his own research facility, which he named The Institute for Genomic Research--TIGR, or "tiger." The private sector gave him the resources to find genes as fast as he could.

But in 1994 Johns Hopkins Nobelist Hamilton Smith challenged Venter to do more. At the time, Venter was using a technique called shotgunning. In essence, shotgunning amounts to putting DNA into a chemical Cuisinart. High-frequency sound waves shred the long stringy molecule into tiny fragments. The fragments are cloned in bacteria, and then, following what has become standard gene-mapping procedure, the bugs are ripped open and their DNA is run through a gene-sequencing machine.

But because the original DNA has been torn into so many random bits of genetic gibberish (as opposed to the predictable fragments made by gene-cutting enzymes), scientists need powerful computers to determine where the tiny fragments overlap. This is tough enough when you're sequencing a small part of a chromosome. But now Smith urged Venter to try it out, not merely on a strip of DNA but on an entire genome. He proposed Haemophilus influenzae, a bacterium that causes ear infections and meningitis. Until then, only a few small viruses, whose genomes had tens of thousands of genetic letters, had been entirely decoded. H. flu had 1.8 million.
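To see why computers are needed to find where fragments overlap, consider a toy version of the reassembly step. The Python sketch below is a simplified, hypothetical illustration (a greedy "merge the best overlap first" strategy), not the software Venter's team actually used; the function names `overlap` and `greedy_assemble`, the `min_overlap` parameter and the 14-letter example sequence are all assumptions made for this example.

```python
def overlap(a, b, min_overlap):
    """Length of the longest suffix of `a` that matches a prefix of `b`."""
    for k in range(min(len(a), len(b)), min_overlap - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(fragments, min_overlap=3):
    """Repeatedly merge the two fragments that share the longest overlap."""
    frags = list(dict.fromkeys(fragments))  # drop exact duplicates
    while True:
        # Discard fragments already contained inside a longer one.
        frags = [f for f in frags if not any(f != g and f in g for g in frags)]
        best_k, best_i, best_j = 0, -1, -1
        for i, a in enumerate(frags):
            for j, b in enumerate(frags):
                if i != j:
                    k = overlap(a, b, min_overlap)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            return frags  # nothing left to join
        merged = frags[best_i] + frags[best_j][best_k:]
        frags = [f for n, f in enumerate(frags) if n not in (best_i, best_j)]
        frags.append(merged)


# Three overlapping shreds of the made-up sequence ATGCGTACGTTAGC.
pieces = ["GTACGTT", "CGTTAGC", "ATGCGTA"]
print(greedy_assemble(pieces))  # ['ATGCGTACGTTAGC']
```

Even this toy compares every fragment with every other one on each pass, which hints at how quickly the work grows when millions of fragments are involved. A greedy strategy like this can also be misled when the same stretch of letters appears in more than one place.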

The audacious proposal was quickly denied federal funding. Venter and Smith pushed ahead anyway--and within a year they had succeeded. The publication of their 1995 paper in Science was a landmark that galvanized researchers. For the first time, the genetic secrets of an entire living organism had been exposed.

Today, four years later, a total of 20 genomes have been fully decoded, 10 of them at TIGR. In December scientists at Washington University in St. Louis, Mo., and at the Sanger Centre passed a new milestone by decoding the first animal genome, that of a tiny roundworm, Caenorhabditis elegans. At 97 million letters, C. elegans' genome is by far the most sophisticated ever sequenced. But if Venter's newly formed Celera (derived from the word celerity, which means swiftness) can pull it off, his proposal to shotgun the entire 3 billion-letter human genome in three years will make the roundworm's DNA look downright puny.

Venter admits that whole-genome shotgunning will leave gaps in the sequence where segments can't be fitted perfectly. But as he points out, traditional sequencing leaves holes as well. Like the government's gaps, his can be filled in later--and fast. "Let's say there are 50,000 holes averaging 83 letters each," he says. "At the rate we plan to clone and sequence DNA, we could close those in a day."
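Venter's back-of-the-envelope figures are easy to check. The snippet below is only an illustrative calculation using the numbers in his quote and the 3 billion-letter genome size cited in this article; the variable names are invented for the example.

```python
holes = 50_000                    # gaps in Venter's hypothetical
letters_per_hole = 83             # average gap length in his example
genome_letters = 3_000_000_000    # size of the human genome

gap_letters = holes * letters_per_hole
gap_share = gap_letters / genome_letters

print(f"{gap_letters:,} letters in gaps")   # 4,150,000 letters in gaps
print(f"{gap_share:.2%} of the genome")     # 0.14% of the genome
```

In other words, the gaps in his scenario would amount to a little over 4 million letters, a small fraction of 1% of the whole.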

But many scientists believe that Venter won't be able to complete the genome-reassembly process. They liken the job to taking a year's worth of issues of a magazine like this one, chopping the pages into one-line fragments, then trying to put the fragments back together without a single typo. As daunting as that seems, imagine that up to 30% of the text consists of nearly identical strings of words up to 7,000 letters long. Assembling these "repeat sequences," says the genome project's Francis Collins, is "a challenge to anyone who doesn't break it down into bite-size pieces."

Whether or not Venter succeeds in putting his Humpty Dumpty genome back together again, his basic premise, shared by the competition at Genset and Incyte, remains compelling: you don't need the entire genome mapped to high precision to make big advances. Cohen's discoveries of prostate-cancer genes are one example. Similarly, the National Center for Biotechnology Information, part of NIH's National Library of Medicine, is using databases of partial gene sequences to zero in on genes that make aberrant proteins in ailments like Parkinson's disease.

Meanwhile, the threat of being upstaged by Venter has put enormous pressure on the Human Genome Project. During a previously scheduled project review last summer, the directors did a thorough re-evaluation of their procedures, soliciting advice from the scientists doing the actual mapping. In the end, the message was clear. Says Collins: "We heard from the users that our current degree of accuracy wasn't needed for many of their strategies."

So the Human Genome Project was recast. Completion was pushed up from 2005 to 2003. And while project scientists had previously been unwilling to release data until they were of high quality, the administrators announced that they would offer up a "working draft" of only moderate precision by 2001. Says Mark Guyer, an assistant director with the NIH's National Human Genome Research Institute: "These data are so rich, it's hard not to extract value from them." But, he admits, "it would not have happened had it not been for the Celera announcement."

Venter wasn't finished, though. Last month it was revealed that the U.S. Department of Energy, whose labs are part of the federal project, was negotiating with Venter to let him do part of the job for it. The cost to the government: zero. That proposal was put on ice by project leaders, supposedly because the DOE had contracted with Venter without checking with other project members, and also out of fear that the release of information to the public might be delayed. Unofficially, it's clear that sour grapes over Venter's latest triumph played a role in their decision.

Whether it's Venter or the government or some sort of public-private partnership that eventually finishes the job, all the genome mappers agree that once the gene sequence is complete, the next step will be to look into how genes vary from one person to the next. In most diseases, it is probably a conspiracy of several genes and environmental factors that results in illness or death. Through its human-variation project, the NIH hopes to identify genes and sets of genes that only nudge people toward a particular disease.

"This will be our most powerful tool," says Collins. "Finding these weak-susceptibility genes will be moderately useful for predicting risk, but they will be far more useful in allowing us to see the real molecular basis of diseases--all diseases--whether it's multiple sclerosis or brain tumors or diabetes." The truth is that no one can predict exactly what breakthroughs might result from the deciphering of the human genome. As Venter puts it: "It's like it was before electricity. No one could have envisioned personal computers back then."

And for that reason, it's probably just as well that both efforts, public and private, are proceeding in parallel. "The public sector is learning how to produce very high-quality data," says Maynard Olson, director of the University of Washington Genome Center, which is part of the federal project. "You'll never see private companies doing that." If private companies focus first on the most intriguing genes, while government-sponsored scientists sequence the rest, everybody will profit in the end.

--With reporting by Dan Cray/Los Angeles, Andrea Dorfman/New York and Kate Noble/Cambridge
