Johns Hopkins Magazine -- November 1997




Driven by their need to transmit tremendous chunks of data, Hopkins researchers are helping to build and test-drive the next generation of the Internet.

S C I E N C E    &    T E C H N O L O G Y

Taming the Terabyte
By Melissa Hendricks
Illustration by Digital Art

Within a year, astronomers on a remote hilltop in southern New Mexico will begin creating the largest archive of the universe. Using a new telescope equipped with sensitive light-sensor chips, they will survey half the northern sky and record the three-dimensional position of the brightest 1 million galaxies. In all, they will amass 40 trillion bytes, or 40 terabytes, of raw data. That is enough to fill 26 million floppy disks, and four times the collection of the Library of Congress. It will dwarf the Human Genome Project, the endeavor to map and sequence the human genome.

The Sloan Digital Sky Survey, as the project is called, will help astronomers understand how and where galaxies formed, and whether there is a structure to the universe, says Alexander Szalay, Hopkins professor of physics and astronomy, and the survey's archive director. But it also presents a new problem: how to transmit enormous chunks of data among the teams of collaborating astronomers, who are scattered throughout the United States and Japan.

Many scientists use the Internet for such tasks. But with e-mail to Grandma, jello recipes, and manifestos on Elvis sharing the information superhighway, the road is becoming snarled with traffic. According to some experts, 5 to 10 million people become users of the network each year. And researchers such as Szalay are generating larger and larger sets of data. A typical chunk of Sloan data will contain 10 gigabytes (10 billion bytes), says Szalay. Try sending 10 gigabytes across the country through the Internet. Go get a cup of coffee while you wait-- better yet, put in a new kitchen floor. It will take a few days. "The Internet's own success is bringing it down," says Szalay. "It's become so saturated that it is very hard to use." If he and his collaborators had to rely on the Internet to complete the Sloan project, he says, "it would be a nightmare."

They are not alone. Many other scientists--from particle physicists who are defining the components of the atom, to earth scientists modeling the interior of our planet--are poised to enter the era of the tera. It is named appropriately-- teras is Greek for monster, and these scientists are concerned that the Internet can no longer accommodate such a beast.

What's a Tera?

Take it from the top. Bits and bytes are the currency of computing. Computers use a binary system--zeroes and ones--to store and transmit information. Each character on the keyboard is represented by a unique combination of zeroes and ones, or "bits" of information.

A "byte" is equivalent to eight bits, which is also the number of bits required to represent one character of type. From there, it's simple:

1 kilobyte equals 10^3 bytes

1 megabyte equals 10^6 bytes

1 gigabyte equals 10^9 bytes

1 terabyte equals 10^12 bytes
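The character-to-bits encoding and the unit ladder above can be sketched in a few lines of Python (the floppy-disk capacity is an assumption for illustration; the magazine's "26 million" figure depends on the exact capacity assumed per disk):

```python
# Each keyboard character maps to a unique pattern of eight bits (one byte).
# ord() gives the character's numeric code; format(..., '08b') shows it in binary.
char = 'A'
bits = format(ord(char), '08b')
print(char, '->', bits)  # A -> 01000001

# Decimal unit prefixes, as in the table above.
KILOBYTE = 10**3
MEGABYTE = 10**6
GIGABYTE = 10**9
TERABYTE = 10**12

# 40 terabytes of raw Sloan data, expressed as 1.44-megabyte floppy disks --
# roughly the "26 million floppies" cited above.
floppies = 40 * TERABYTE / (1.44 * MEGABYTE)
print(round(floppies / 1e6, 1), 'million floppies')
```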

FORTUNATELY, A NATIONAL NETWORK called vBNS promises to provide a new, wide-open road, at least for researchers and educators. vBNS, or very high speed Backbone Network Service, connects a select group of universities, national laboratories, and supercomputing centers through a super-fast fiber-optic network above the fray of the pedestrian Net. Still being expanded, it will transmit information at 622 megabits per second--more than 10 times faster than the best speed of the Internet. A terabyte, which would take two years to slog through today's Internet, will zip through vBNS in just a day, says Szalay.
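As a back-of-envelope check on those figures (a sketch only: real transfers see protocol overhead, shared links, and contention, which is why Szalay's day-long estimate is larger than the raw line rate alone would suggest):

```python
# Ideal transfer time = size in bits / line rate in bits per second.
def transfer_time_seconds(size_bytes, rate_bits_per_sec):
    return size_bytes * 8 / rate_bits_per_sec

TERABYTE = 10**12
VBNS_RATE = 622e6  # vBNS line rate: 622 megabits per second

hours = transfer_time_seconds(TERABYTE, VBNS_RATE) / 3600
print(f'{hours:.1f} hours')  # about 3.6 hours at the full line rate
```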

Hopkins recently received a $350,000 two-year grant from the National Science Foundation to connect to vBNS, a figure the university is matching with its own funds. Through similar grants, 64 universities have to date signed on to the new network, which is supported by NSF and built by MCI. Several researchers at Homewood will be connected within the next few months. "This is literally at the bleeding edge," says Dave Binko, Homewood's director of Academic Computing.

Not only will vBNS provide a faster means for transmitting huge databases, the new network will also enable users to do long-distance work in real time. An astronomer at the Homewood campus will be able to manipulate a telescope in New Mexico. A surgeon at Hopkins will be able to consult with a colleague performing a procedure on the other side of the country. Students scattered around the globe will "attend" a lecture at the School of Public Health as it takes place.

In addition, vBNS is a laboratory for developing and exploring new network technologies and management strategies. Eventually, the tools created through vBNS will be used to create an improved version of the Internet, says Binko.

Of course, since vBNS is an experimental platform, anything could happen. For example, the network's handlers might take down the system for a few hours to upgrade a router or piece of software-- something that happens on today's Internet, but not all that frequently. Given the alternative, however, scientists whose research would benefit from a speedier, more reliable network are willing to take such risks.

MANY OF HOPKINS'S INTERNET CHALLENGES rest on the shoulders of Academic Computing's Binko. Like many veteran computer experts, Binko learned his trade through osmosis. He began working at Hopkins 21 years ago, as a research associate doing computer programming at the School of Public Health. "I spent my first five years complaining about computer problems," Binko jokes during an interview in his office in Krieger Hall. So administrators suggested that perhaps Binko could solve some of the problems himself, and appointed him director of the university computer center, now Academic Computing. Since then, no technology under his purview has remained the same for very long.

A coffee mug labeled "Carpe diem" on Binko's desk (next to a collection of Pez dispensers) says it all. "Seize the day"--or get left behind, as communications technology changes at an exponential pace.

It was scientists (as well as the military) who created the Internet in the first place, explains Binko. (The Defense Department's network, known as ARPANET, was formed in 1969.) In 1986, the National Science Foundation began NSFNET, mainly for universities. Hopkins joined in 1987. For several years, says Binko, "the Internet was almost the exclusive province of researchers."

Then America Online began to give customers Internet access, and other providers followed suit. The World Wide Web was created, and wham! Suddenly, says Binko, "the whole thing grew like Topsy." Today, anyone can and does use the Internet for anything.

("I'm convinced that 50 percent of the traffic on the Internet is baby pictures," notes Hopkins astronomer Alan Uomoto, whose wife recently gave birth to the couple's first child.)

When they saw that the Internet was becoming swamped, a group of universities formed a consortium called Internet 2, dedicated to planning and building a more efficient and more reliable national network. Hopkins joined the 109-member group last January. Internet 2 examines networking problems such as how to create better Internet Protocols (IPs), the rules of the road, so to speak, that are used to steer information through the network. Members also include long-distance companies such as AT&T and MCI, which will lay down any new fiber optic cable connections to complete the new network.

Yet another project is the Next Generation Internet Initiative (NGII), which was launched last year by President Clinton, who requested $100 million for developing new network technology. vBNS is seen as a testbed for the technologies that are being proposed by Internet 2 and NGII.

Collaborating to Map the Universe
IN SUPPORTING vBNS, the NSF is, in a sense, re-creating the way the Internet used to be, before the whole world logged on. At one time, a scientist might have sent a two-megabyte file through the network, and safely assumed it would arrive at its destination within a few minutes, says Binko. Now, the researcher might find that the same transmission takes two hours.

Computer scientists describe the network of their dreams just as you might expect them to: a low-latency, high-bandwidth, globally distributed information infrastructure. Translated, it means a network that transmits bits and bytes with little delay ("low latency"), through a big pipe ("high bandwidth") to locations throughout the world ("globally distributed").

If you feel lost in a murky sea of jargon, you're not alone. By its nature and underlying philosophy, the Internet is a rather abstract concept. The designers of the Internet intended it to be decentralized and egalitarian. There is no Internet building, no Internet czar holding the reins, and everyone should have access. Then where, exactly, is cyberspace?

About the closest you can come to a physical manifestation of the Internet at Homewood is the campus's router, which lives in the basement of Garland Hall, a short walk across the quad from Binko's office.

The Academic Computing center contains row upon row of blue cabinets that look like gym lockers, each containing several pieces of Homewood's computing system. Computers, fans, and air conditioning combine to create a hypnotic white noise in the underground room. An old mainframe computer stands off to one side, a relic of those "ancient" years, the late 1980s. It still functions, and is used by the library.

Don't expect bells and whistles, Binko warns, as he opens the blue cabinet housing the router. Indeed, the inglorious metal box with some cables attached is a far cry from the Starship Enterprise. But it has a big job, Binko explains.

Making Star Gazing Remotely Possible
Routers (and their cousins, switches) are the Internet's docking points. They take in electronic transmissions from many different sources and route them to their destinations, somewhat like the way a post office routes mail.

Suppose from my computer at Johns Hopkins Magazine, at Homewood, I send an e-mail message to my mother, in northern California: "Dear Mom, I am sending this message at 4:50 p.m. on August 20. Please let me know when you receive it. Love, Melissa."

Here's what happens. Electrons carrying the message stream out of my computer through a rubber-covered copper wire, which winds through my office, finally connecting to a "hub" in a closet. The hub transfers the message to fiber optics, which carry it to Homewood's backbone network, a pathway that snakes around the campus and forms a large figure eight.

The message is snagged off the network at Garland Hall, where the router reads its address and sends it out to the larger network.

That's just the beginning.

The message then hops to a router in Baltimore, which transfers it to one in Washington, which transfers it to another in Washington, and on and on through 12 more before it arrives at my mom's computer. The trip took only milliseconds. My mom e-mails back, "Received your message at 4:51 EST."

I can't complain about that speed. But short e-mail messages are not what is giving grief to terabyte-crunching scientists like Szalay.

The modus operandi of the Internet was not designed for large numbers of transmissions sent at unpredictable intervals, explains Binko. It's not like the phone system.

"A phone network is very predictable," says Binko. "You know a phone call is going to require 64 kilobits/sec constantly. You pick up the phone, dial, and hear 'beep, beep.' That's the switching station saying, 'Give me 64 kilobits and give it to me now.'" If 64 kilobits are not available, you get a busy signal.

"Well, the Internet doesn't work that way," says Binko. "There's a wide variety of stuff that gets thrown on the Internet"-- e-mail, voice files, video files--each requiring a different amount of bandwidth. "We have no way of saying, 'Give me 5 kilobits per second and guarantee it to me.'" Instead, as at the post office, "You dump your packet in a pile, and it gets delivered"--eventually.

Making this even more complicated is the diversity of connections to the Internet. For example, Homewood's backbone network carries information around campus at 100 megabits per second. But Homewood's leased line to the Internet transmits at only 4 megabits per second. So it's hurry up and wait. Transmissions race to Garland Hall at 100 megabits per second, only to wait to travel out to the greater Internet on a relatively slow pathway. "We create a funnel," says Binko.
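The funnel effect is simple arithmetic: end-to-end speed is set by the slowest link on the path. A sketch, using the two link rates above and a hypothetical 10-gigabyte chunk of data:

```python
GIGABYTE = 10**9
MEGABIT = 10**6

# The 100 Mb/s campus backbone feeds a 4 Mb/s leased line off campus.
# End-to-end throughput is limited by the slowest link -- the funnel.
backbone_mbps = 100
leased_mbps = 4
bottleneck_mbps = min(backbone_mbps, leased_mbps)

# Time to move a 10-gigabyte chunk through each stage.
size_bits = 10 * GIGABYTE * 8
minutes_on_backbone = size_bits / (backbone_mbps * MEGABIT) / 60
hours_through_funnel = size_bits / (bottleneck_mbps * MEGABIT) / 3600

print(f'{minutes_on_backbone:.0f} min on campus, '
      f'{hours_through_funnel:.1f} h through the leased line')
```

Hurry up and wait: the data crosses campus in minutes, then spends hours dribbling out the leased line.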

If a packet of information--say, my e-mail message--cannot get through, it gets stored in a memory system built into the router called a "buffer," a sort of electronic limbo. The buffer can get so crammed full that my e-mail message "falls on the floor," which means that it gets erased. Fortunately, if this happens, my computer simply resends the e-mail, but such delays are fatal for video and audio transmissions. They can result in a movie that arrives with a flickering image, or a crackly sounding audio interrupted by gaps of silence.
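The buffer behavior can be sketched as a toy first-in/first-out queue (the capacity here is illustrative; real router buffers hold far more packets):

```python
from collections import deque

BUFFER_CAPACITY = 4  # toy router buffer

buffer = deque()
dropped = []

def enqueue(packet):
    # When the buffer is full, the packet "falls on the floor."
    if len(buffer) >= BUFFER_CAPACITY:
        dropped.append(packet)
    else:
        buffer.append(packet)

for i in range(6):
    enqueue(f'packet-{i}')

print(list(buffer))  # packets 0-3 queued for delivery
print(dropped)       # packets 4-5 lost; e-mail software resends them,
                     # but a video stream just stutters past the gap
```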

So network experts are looking toward several innovations to speed and smooth the flow of network traffic.

The simplest way to boost the Internet is simply to build a bigger pipe, that is, to provide connections with greater bandwidth, as has been done with vBNS. Such large pipes are available, though not necessarily affordable.

But larger highways are not enough, and could even add to Internet congestion, says Mark Luker, director of the NSFNet program, which coordinates vBNS. "You can make highways faster and faster by adding more and more lanes," says Luker, "but in the long-term, you'll have suburbs," which again create congestion.

So, networking experts are also looking for new traffic management strategies, the equivalent, says Luker, "of adding light rail and airplane and HOV lanes and interchanges."

vBNS and Internet planners aspire to an ideal known as "Quality of Service." No matter what form of information is being pumped through the network--data, voice, video--QoS means that the package is guaranteed to get to its destination when it needs to be there. Rather than the present Internet's first come/first served system, says Binko, the next Internet will be like Federal Express: "When it absolutely, positively has to be there," the network will get it there.

One potential way to achieve QoS is through a bandwidth reservation system, something like the reservation protocol used by the telephone system. "So if a physicist starts an application here and transfers to a cyclotron at Berkeley, there will be a mechanical guarantee of, say, 100 megabits per second for two hours," explains Binko. "And if somebody else needs to get on, [the system would] say, 'no can do'--the equivalent of a busy signal." Achieving QoS may also entail protocols that give priority to particular transmissions such as video or audio files, says Luker. That would allow a movie to be transmitted as one rapid, continuous stream, rather than in a slow, interrupted dribble, which results in a jittery final image.
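Binko's bandwidth-reservation idea can be sketched as a simple admission-control loop: the network grants a request only while committed bandwidth stays under the link's capacity, and otherwise returns the equivalent of a busy signal. (A sketch of the concept, not any actual vBNS protocol; the numbers are illustrative.)

```python
LINK_CAPACITY_MBPS = 622  # e.g. one vBNS link

committed = 0  # bandwidth already promised to earlier reservations

def reserve(mbps):
    """Grant the request if capacity remains; otherwise, a 'busy signal'."""
    global committed
    if committed + mbps <= LINK_CAPACITY_MBPS:
        committed += mbps
        return True
    return False

print(reserve(100))  # True: the physicist's 100 Mb/s transfer is guaranteed
print(reserve(400))  # True: still within capacity
print(reserve(200))  # False: "no can do" -- the busy signal
```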

Still to be worked out, notes Szalay, is how these techniques will be scaled up so that they work not just for vBNS users, but throughout the Internet.

In terms of hardware, engineers are also building better (read: faster) routers, says Joseph Pistritto (BSEE '79), a Silicon Valley consultant who developed protocols for PointCast and Oracle. "It used to be that a router was mostly software. Now, more and more it is hardware." In other words, more silicon rather than programming. A popular saying in Silicon Valley, says Pistritto, is that every 18 months, silicon gets faster.

Finally, in addition to the technological fixes, say the experts, the Internet sorely needs new management strategies. The Internet has grown too large and complex for decentralization and anarchy. Exactly what those management strategies should be, however, is still "nebulous," says Binko. "The Internet is an incredible tool in a way we don't fully understand."

Before we do fully understand it, the Internet will certainly increasingly permeate our lives. Cellular phones, cable-modems, and satellites will be connected to the Internet, predicts Pistritto. "This kind of thing will become pervasive. Large satellite networks will provide high-speed Internet connectivity anywhere you can see the sky."

In five to 10 years, says Pistritto, "Internet connectivity won't be something we think about, in the way that electricity is not something we think about. Your thermostat, for example, might be controlled through the Internet. So you'll be able to turn it up from your desk at work, or the electric company could turn it down and save you money." (A San Diego power company has already started.) Likewise, TV, telephones, and "the vast majority of traffic that moves" will be on the Net. "Everything is going to be connected to everything else."

Multicasting to Far-Flung Locales
HOPKINS'S CONNECTION TO vBNS will be activated in early 1998, according to Binko. Szalay and computer scientists Yair Amir and Michael Goodrich will be the first guinea pigs. Others will follow. "In two years, we'll have electrons flowing consistently," Binko predicts.

The Medical Institutions and Peabody Conservatory will also soon log on. Eventually, more and more people at Hopkins will have access, although it's not clear whether access will be universal. "It would be nice if everyone could send e-mail over vBNS, but we'll struggle with the issue of who gets what, who gets to drive the Porsche and who gets the Volkswagen," says Binko. "For the time being, vBNS will be an exclusive club."

The system will not be cheap. "It may cost $1 million to complete the infrastructure," says Binko. "But when our business is research and collaborative research, it pales by comparison."

Academics' need for a system like vBNS will only grow, says Binko. Already, Hopkins particle physicist Dave Gerdes and collaborators around the world are engaged in a project that will involve transmitting data on the order of a petabyte--1,000 times a terabyte.

Melissa Hendricks is the magazine's senior science writer.