Johns Hopkins Magazine - June 1995 Issue

The Tool of Tools

...One that unlocks proteins, and therefore the genetic code. Hopkins researcher George Rose has learned how proteins "read" the genetic code of DNA, to create the very stuff of life.

By Elise Hancock

George Rose, a 55-year-old professor of biophysics and biophysical chemistry at the School of Medicine, tries to damp down his grin, to be sober and scientific, but the grin keeps peeking out his eyes. That's because Rose and a postdoctoral fellow, Rajgopal Srinivasan, have developed a computer algorithm called LINUS that can predict the backbone folding, which implies the atomic structure, of an important group of proteins, the globular ones.

Ah, sweet mystery of life: whatever that is, it has to involve proteins, for without proteins there would be no life; they are fundamental. "Everything that is done," says Rose, "all the tasks that are done in living systems, are done by proteins. Really, we are little bags of protein, plus instructions for making more."

Proteins are what DNA encodes, and they include enzymes and hemoglobin; skin, bone, and brain; antibodies and silk; cell receptors and hooves. Even viruses have protein coats. Without proteins, no muscle moves, no leaf turns sunlight into energy, no DNA can do anything. For if DNA is a tape, proteins are the cassette player. Each gene encodes a protein in a process that involves transcription (copying a DNA strand onto an RNA strand), followed by translation (synthesizing the string of amino acids specified by the RNA). Voila! a protein.
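For readers who think in code, the two steps can be sketched in a few lines of Python. This is only a toy: the function names and the example DNA string are made up for illustration, the codon table holds just the handful of codons the example needs (taken from the standard genetic code), and transcription is simplified to copying the coding strand with U in place of T.

```python
# Partial codon table from the standard genetic code: only what the example uses.
CODON_TABLE = {
    "AUG": "Met", "UUU": "Phe", "GGC": "Gly", "AAA": "Lys",
    "UAA": None, "UAG": None, "UGA": None,  # None marks a stop codon
}

def transcribe(coding_dna):
    """Simplified transcription: the RNA copy reads like the coding strand, with U for T."""
    return coding_dna.upper().replace("T", "U")

def translate(mrna):
    """Read the RNA three letters at a time and string the amino acids together."""
    chain = []
    for i in range(0, len(mrna) - 2, 3):
        residue = CODON_TABLE[mrna[i:i + 3]]
        if residue is None:      # stop codon: the chain is finished
            break
        chain.append(residue)
    return chain

mrna = transcribe("ATGTTTGGCAAATAA")   # hypothetical five-codon gene
protein = translate(mrna)              # ["Met", "Phe", "Gly", "Lys"]
```

The output is only the order of the beads on the string. Nothing in it says how the string will fold, which is the whole problem.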

Of the 100,000 or more different kinds of proteins in the world, each one folds up its chain of amino acids in its own unique way, which literally determines what the protein can do. For example, when the enzyme lysozyme folds, its surface loops create a mini-crater that is just right for sugar molecules, chemically and spatially. In contrast, hemoglobin folds to make narrow clefts that are just right to hold hemes (which hold iron, which holds oxygen, which the hemoglobin ferries from your lungs to your littlest toe). In the case of proteins, then, form does not follow function. Rather, form (how a protein folds) is function.

The central question has been, Why does hemoglobin always fold like hemoglobin, lysozyme like lysozyme? The answer had to be in DNA, by virtue of its coding the particular sequence of amino acids that make up the protein. Still, understanding DNA does not explain how amino acids direct the way each protein folds.

That is the "protein folding problem," which Rose and Srinivasan seem to have solved. They have filed for a patent on LINUS, and the first article on their work appears in the June issue of PROTEINS. Their findings were unveiled to Hopkins colleagues for the first time at a studiously low-key lecture on March 29.

Response was anything but low-key, however, as Rose threw up slide after slide showing LINUS's predicted structures vs. those known from X-ray crystallography: near-perfect matches, one picture after another, for five different proteins, each of a different folding type. Up till now, the best algorithm could predict, 70 percent of the time, which kind of mini-structure a particular amino acid would be part of, within the protein. Nobody could predict entire proteins.

Watching Rose's slides go by, most of the top geneticists at Hopkins were leaning forward, slack-jawed. They shook their heads in amazement and emitted little sighs, then looked at each other to shake their heads again and share the wonder. No one left throughout the hail of questions, which lasted for an hour.

"The guy is being over-modest," was the later assessment of Nobel laureate Hamilton Smith, professor of molecular biology and genetics at Hopkins. "Folding has been considered to be the holy grail, the major big theoretical problem that nobody thought we'd solve in this century, and maybe not in the next. This is an absolute breakthrough. This is light-years ahead."

Rose says that LINUS's "biggest immediate implication is for the genome data base (GDB)," an international compilation of almost all the known amino acid sequences in the human genome. "Now all that information is useful," says Rose. He hopes that within a year most genetic researchers will have LINUS, and will be posting LINUS pictures along with their new sequences.

Already the pictures are good enough, says Eaton (Ed) Lattman, director of the Hopkins Center for Biophysics and Macromolecular Assemblies, that you "know which amino acids are approximately where on the surface, once the protein's finished folding." That's valuable because proteins bind through their surfaces; parts that fold inside are not available for interaction.

Once the GDB becomes a picture book, Lattman says researchers will be able to "use these pictures for whatever they want: to see whether their molecule looks like somebody else's molecule. Or to examine a molecule's different functional sites" (such as the four clefts in hemoglobin that hold four hemes). "You'll see where sites are and how they're related. Sometimes molecules do more than one thing." Rose adds, "You could solve all the proteins on p53," a gene that permits cancers to develop when it mutates. "See if the structure explains anything."

That kind of fishing expedition hasn't been practical until LINUS, because NMR and X-ray crystallography have been the only ways to solve sequences for structure. And although they yield wonderfully clear images ("LINUS aspires to such detail," says Rose), the process is painfully slow and labor-intensive. "It used to take a world-class lab five years to solve a protein. It still takes many months, often more than a year." Of course, no sensible person would invest such effort unless the protein's function was already known, and known to be important.

It's no wonder that only a thousand protein structures have been solved. "Most molecules are still mysteries," Rose says. "We have the sequence but not the structure." LINUS promises much quicker and cheaper structures, soon to be done in a day or an hour.

At last, then, it may be possible to understand the functional behavior of a wide range of molecules, perhaps even to deduce the "rules" behind their folding. "We would like to know those rules," says Rose. "Just as Kepler's laws predict the motion of the planets, so you know when the next eclipse will come..." he gazes into space for a minute, uncharacteristically leaving the sentence incomplete. "If we understood the rules," he says at last, "that would give potentially real insight into evolution: how the system that does all this came into existence."

With greater understanding will also come the ability to manipulate, to engineer proteins that precisely fit the functional sites of hormones, enzymes, neurotransmitters, and more. Painkillers and antidepressants will be more powerful and safer, because they will be designed to do exactly and only what is wished, with no side effects. New drugs might bind and incapacitate undesirable proteins (one produced by p53, for example). Or, says Rose, they might block an HIV protease so that the AIDS virus can't mature.

Ham Smith judges that LINUS pictures can already be used to develop detailed atomic structures, such as would be needed to engineer custom proteins. "If you sat down with this and your X-ray diffraction patterns, you could start refining already," he says. Within a few years, he expects computer programs that are able to generate structures accurate to the atom.

LINUS being very young, its reach is still small. As of mid-April, the program was most accurate for units of 50 amino acids, which then needed to be assembled. Certain complexities were simplified out, and the algorithm only worked on proteins that live in water. It had been tested only on proteins made of a single strand of amino acids, and on globular ones.

Rose says LINUS is "not inherently limited," however. In the weeks before the journal article was to come out, he and Srinivasan were fine-tuning the program, and they expected to go on to larger molecules immediately. "We see no reason that it shouldn't work, though that's not the same as actually doing it." By the time you read this, perhaps it does work.

But even if LINUS only worked on globular proteins, now and forever, the algorithm would still be exciting, because globular proteins are central to metabolism and neurological/muscular function. As opposed to proteins that form structures (like bone and hair), for which the algorithm may never work, Rose calls globular proteins "the ones that do something." The group includes hormones, enzymes, neurotransmitters, receptor proteins, antibodies, and the transport proteins (like hemoglobin, which transports oxygen). Defects in these proteins can cause untold misery.

Rose cautions against blind optimism, however. "Knowing a genetic lesion doesn't mean you can fix it. In fact, it may be there are things for which you can know the problem and never do anything about it. So it's like a lock. Just because you understand the lock doesn't mean you can open it. But at least you might recognize a key if you saw one."

From the ability to predict protein structures will come ways to engineer them. We can expect wonderful new medicines, foods, and materials. On the darker side, someone may attempt eugenics.

Over the very long term, the implications of LINUS are too big to see, because a discovery of this dimension changes the way one sees the world. "Nietzsche says that what we see depends on the perspective from which we look," says Rose. What will it mean to really get it that human beings are big bags of protein? "Our very idea of self is called into question," says Rose. From a new worldview, scientifically and socially, who can say what will arise?

No one can doubt, though, that a much deeper understanding of living things, of all living things, will dramatically affect daily life and technology, for this discovery is not only a vehicle for understanding. It is a tool, an immensely powerful tool. It will have uses precisely as good, bad, ambiguous, and unexpected as the humans who will wield it.

Some genetic diseases may see a cure. We can expect wonderful new medicines, foods, and materials, improved bacteria that clean up industrial sludge, perhaps more nutritious crops for protein-poor parts of the world. Science fiction writers have been right before (about robots, for instance). Who knows, perhaps they were right about gels that rebuild wounded tissue.

On the darker side, someone may attempt eugenics. "After all," says Rose, "if you're going to use technology to cure someone of an inborn error of metabolism, that is eugenics. So why not apply it to make people taller, smarter, prettier? Suppose you thought you knew enough to do that? Suppose you thought you could improve people's memories, without losing any other capacity?"

Some difficult moral and legal choices lie ahead. If we in the United States have found ourselves troubled by the idea of withholding life-saving treatments from those who cannot pay a high price, we'll be much more troubled now, because so much more will be available. Suppose, for instance, that cystic fibrosis could be cured at birth. Would it be right to withhold treatment, with not only lives but lifetimes in the balance? If we give to everyone, in such cases, what will it cost, financially? If we do not, what will become of a society that has openly and deliberately declared that all human beings are not created equal?

To understand how LINUS works, it helps to know something about proteins. Those who already do may wish to skip down past the diagrams.

All proteins are made of amino acids, of which there are only 20 that occur naturally. In a protein these amino acids are strung end to end like a strand of beads, each attached to the next in a linear "backbone." Their order is dictated by DNA (and numbered sequentially by scientists). For example, look at the very beginning of a ribonuclease molecule, its backbone shown in the unfolded state:

[Image not available: the backbone at the beginning of a ribonuclease molecule, unfolded.]

Green = carbon, blue = nitrogen, red = oxygen, and black = hydrogen. The "sidechains" poking out are chemical groups available for chemical interaction, the ISOs of the molecular world. It is these that make one protein differ from another. They are bonded to the backbone in the chemistry that makes Earth the home of carbon-based life forms. And, of course, the sidechains also interact with each other: the folding reaction.

Each string of amino acids has a beginning, a direction, and an end. The beginning is always an amino group, -NH2, from which point natural folding (or unfolding) proceeds toward the end. The end is always a carboxyl group, -COOH.

When the protein is hard at work, as opposed to being denatured (like scrambled eggs), its parts fold themselves in any of four basic motifs:

[Image not available: helix, sheet, and turn.]

and "coils." (Think of a coil of rope dropped loosely till it's needed.) By coils, researchers mean a disordered loop of protein that probably does have an order, once some other molecule comes along with a good chemical "fit." Coils never appear in diagrams.

Any or all of these motifs can be found in the protein composites made from them, like this one:

[Image not available: a protein built up from these motifs.]

LINUS is built around the four basic motifs, and around a discovery George Rose made in 1979 and published in the Journal of Molecular Biology: that protein folding is both local and hierarchical.

Local meaning that each amino acid can "feel" the others nearby in sequence, being chemically/electrically attracted or repelled. The closer the attractor/repeller, the stronger the influence.

Hierarchical meaning that the structure develops from the smallest units, as a house may begin with bricks, and builds in an architectural hierarchy. For example, a small neighborhood of amino acids, jiggling around for comfort as they "feel" each other push and pull, might fold into two sheets joined by a loop. The two sheets might then form hydrogen bonds to lie side by side, as actually happens in silk. (That's why the material of kings is so sensuously flexible.) And so on, by iteration: as their neighboring parts subtly push and pull one another, a protein's sheets, helices, and turns organize themselves, starting at the smallest level and working up.

"George discovered [hierarchical folding]," says Lattman. "Not everyone believed it. And of those who did, not everyone agreed it was important." Adherents of the (large) opposing school of thought were combing sequences to find the equivalent of blueprints, big overarching plans with a Concept. What they have found is a lot of complication.

Rose's theory has the beauty of being simple, as nature tends to be at base. "It's so simple," declares Lattman, "that people are going to be hitting themselves on the head and saying, 'Why didn't I think of that!'"

Rose decided to start with units of three residues, just three. ("Why three?" someone demanded in the lecture. "Because three can nucleate a helix.") As for finding out which three worked together locally, he let the protein itself declare that.

The protein speaks through the computer algorithm, LINUS, an acronym for Local Independently Nucleated Units of Structure, a conglomerate of words that was carefully crafted to honor the late Linus Pauling. Rose explains, "Linus Pauling essentially founded this field."

While Rose understands computers, he is primarily a theorist, and LINUS didn't take off until Rajgopal Srinivasan joined him, almost two years ago. At the time, Rose's lab was full, but Srinivasan's professor at the University of Illinois, a famous organic chemist named John Katzenellenbogen, wrote such a strong recommendation that, says Rose, "we thought we'd better look at him anyway. We invited him to come give a talk, and this talk was so thoughtful, we were blown away and decided that we would do anything to have him join our lab."

When Rose came to Hopkins last fall, Srinivasan came with him. "He is the Glenn Gould of the computer keyboard," says Rose now. "He is an authentic collaborator; it's not that I say do this and then come back. It's hard to articulate how it works. We talk a lot, think a lot, come up with a large number of ideas. Then Raj disappears and sort of makes things happen... He is a fully mature and absolutely brilliant scientist, who just happens to be a postdoc in the lab." As well as contributing ideas, Srinivasan wrote the program from start to finish.

LINUS reflects the gifts of both men. It is based in local, hierarchical folding, plus selected facts about amino acid behavior, embodied in a complex program that now takes days to simulate a small protein on a high-performance workstation. In the next version ("or maybe the one after that") Rose expects a leaner, sleeker LINUS that will run in minutes on any Macintosh or PC.

As for what LINUS does, it begins at the beginning and works through the protein's sequence, systematically perturbing the amino acids in groups of three: amino acids 2-3-4 for instance. "We toss a four-sided coin," says Rose. "Helix, sheet, turn, or coil. Say it's a helix."

Then, using criteria from another part of the algorithm, LINUS evaluates whether making those three amino acids helical seems to improve their fit with the six amino acids on either side. (Technically, this kind of trial-and-evaluate procedure is called a Monte Carlo simulation.) For example, possible hydrophobic contact would be good. But creating a situation where two atoms were in the same place at the same time: obviously, that's a no-go.
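A single toss-and-evaluate step of this kind can be sketched roughly as follows. Everything here is an assumption made for illustration: the article does not give LINUS's actual criteria, so `toy_score` is a made-up stand-in that merely rewards neighboring residues for being in the same state, and the check for colliding atoms is left out entirely.

```python
import random

STATES = ("helix", "sheet", "turn", "coil")  # the "four-sided coin"

def toy_score(conf, start, reach=6):
    """Hypothetical score: count adjacent pairs in the same state inside the
    window stretching `reach` positions to either side of the triple."""
    lo = max(0, start - reach)
    hi = min(len(conf), start + 3 + reach)
    return sum(conf[i] == conf[i + 1] for i in range(lo, hi - 1))

def trial_move(conf, start, rng, reach=6):
    """Toss the coin for residues start..start+2; keep the change unless the
    score gets worse. Returns True if the move was kept."""
    if start < 0 or start + 3 > len(conf):
        raise ValueError("triple must lie inside the chain")
    before = toy_score(conf, start, reach)
    saved = conf[start:start + 3]
    conf[start:start + 3] = [rng.choice(STATES)] * 3
    if toy_score(conf, start, reach) < before:
        conf[start:start + 3] = saved   # worse fit: undo the move
        return False
    return True

rng = random.Random(0)                          # fixed seed so runs repeat
conf = [rng.choice(STATES) for _ in range(12)]  # random starting chain
before = toy_score(conf, 1)
kept = trial_move(conf, 1, rng)   # list index 1 is residue 2, so this is group 2-3-4
after = toy_score(conf, 1)
```

Because a rejected move is undone, the score in the window can only hold steady or improve from one step to the next. That is the one property this sketch is meant to show.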

Whatever the outcome, LINUS records it and moves on to 3-4-5, to flip the coin and evaluate. Perhaps chance called this group a turn, but no known turn includes those three residues: another no-go. LINUS records, then goes on to 4-5-6, 5-6-7 and so on to the end, 5,000 times.

Finally, the program evaluates all 5,000 trials, looking for any local group that, "feeling" six amino acids in either direction, finds itself in the same conformation more than 70 percent of the time. Perhaps amino acids 23 through 28 want to be a loop.
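The 70 percent test itself is easy to picture in code. The sketch below is a simplification and an assumption: it tallies each residue on its own, whereas the article speaks of local groups, and the function name and the ten made-up trials are purely illustrative.

```python
from collections import Counter

def consensus(trials, threshold=0.70):
    """Given one recorded conformation per trial (a list of per-residue
    states), return {residue_index: state} for every residue found in the
    same state in more than `threshold` of the trials."""
    n_trials = len(trials)
    stable = {}
    for i in range(len(trials[0])):
        state, count = Counter(t[i] for t in trials).most_common(1)[0]
        if count / n_trials > threshold:
            stable[i] = state
    return stable

# Ten made-up trials over a three-residue chain.
trials = (
    [["helix", "helix", "coil"]] * 6
    + [["helix", "helix", "turn"]] * 2
    + [["helix", "sheet", "turn"]] * 2
)
stable = consensus(trials)   # {0: "helix", 1: "helix"}
```

Residue 0 is helical in all ten trials and residue 1 in eight of ten, so both pass. Residue 2 is a coil only six times in ten, so it stays free to move.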

Fine. Any such element gets frozen, held in whatever its preferred structure turned out to be, as LINUS begins again. Now it moves up the hierarchy to perturb the amino acid groups all over again. But this time each group can feel as far as 12 places away, and larger structures start to emerge.

LINUS does that 5,000 times, evaluates, freezes any emergent structure, then moves on to 18 places, 24 places, and so on. Simple as that. (Ha.)
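The climb up the hierarchy amounts to an outer loop around everything else. In the sketch below, only that loop is shown. `run_stage` is a hypothetical callable standing in for a full round of trials at one level, `demo_stage` is a dummy used to exercise the loop, and the stopping point of 24 places is an assumption, since the article says only "and so on."

```python
def fold_hierarchically(n_residues, run_stage, step=6, max_reach=24):
    """Widen the reach level by level, freezing whatever each level settles.
    `run_stage(frozen, reach)` must return {index: state} for residues that
    came out stable at that reach."""
    frozen = {}
    for reach in range(step, max_reach + 1, step):   # 6, 12, 18, 24
        newly = run_stage(dict(frozen), reach)
        for i, state in newly.items():
            frozen.setdefault(i, state)   # earlier levels stay frozen
        if len(frozen) >= n_residues:
            break
    return frozen

reaches_seen = []

def demo_stage(frozen, reach):
    """Dummy stage: pretends the first `reach` residues of a 20-residue
    chain settle into a helix. Not a simulation of anything."""
    reaches_seen.append(reach)
    return {i: "helix" for i in range(min(reach, 20)) if i not in frozen}

result = fold_hierarchically(20, demo_stage)
```

With the dummy stage, the loop visits reaches of 6, 12, 18, and 24 and ends with all 20 residues frozen.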

In introducing Rose to the audience before the March 29 talk, Jeremy Berg, director of biophysics at the School of Medicine, had said, "When George came here, I said to him jokingly, 'Okay, I give you a year to solve the folding problem.' He said fine.

"Then I ran into him in January in the stairway and I said, 'How's that problem coming along?' He said, 'Oh, we solved it.' I laughed and said, 'No, I meant the folding problem.'

"He said, 'So did I.'"

Of course, Rose and Srinivasan don't really know how nature folds proteins. "We're not trying to be biological, we're trying to create something," Rose says cheerfully. It's revealing, then, since their predictions are so accurate, that the approximations in the algorithm are often quite crude.

Rose got to those approximations by making a radically simplifying assumption: that stability of the molecule was irrelevant. Not unnecessary, simply something he didn't need to think about, on grounds the protein would hold together well enough, as it does in nature, without including all the atomic detail. "That leaves a lot out," says Lattman.

It leaves out, for instance, the precise details of hydrogen bonding and hydrophobic interactions, the two strongest forces that hold proteins together. Sidechains are also simplified (from highly complex to 1, 2, or 3 atoms). Most heretical of all, so are the molecule's energetics, typically measured out to several decimal places if it can be done. In LINUS, energies are crudely estimated in units of 1, 2, or 3.

Yet it works. Rose thinks that fact says a lot about the power of local interactions, "which are dominant in folding proteins. Proteins do have strong forces," he says, "but protein conformation is determined, and probably overdetermined, by a very large number of very weak forces. But a large number." He looks amused. "Like me. I'm bound here with contractual covalent bonds. But day to day, what's really important is shoulder-rubbing."

The power of weak bonds shows that the folding mechanism is robust, he contends. "The difference in free energy between the folded and unfolded forms is very small. Consequently people tend to think of these as delicately balanced systems. I say, any system that can be predicted by the crude approximations that we made must be robust.

"Biological robustness is based on a kind of stability that's different from the Pyramids," he says. "The Pyramids have lasted thousands of years, but they're eroding now. Unless we take special measures, in a few thousand years they'll be gone. Whereas, more than 3 billion years ago, information on how to make bacteria was encoded in DNA. Today, all that information survives."

In biology, it seems, as in nations, a bottom-upwards order works best. Proteins have no czar dictating their order. In fact, George Rose thinks the system is driven by entropy.

But that's another story.
