A computer facility that could eventually handle
enough data to fill 1 billion diskettes has won
the Storage Challenge at SC08, the eighth annual
International Conference for High Performance
Computing, Networking, Storage and Analysis, held recently
in Austin, Texas.
Designed by a team led by computer scientist and
theoretical astrophysicist Alexander Szalay
of Johns Hopkins, the GrayWulf System (named in honor of
Szalay's friend and collaborator, the late
Jim Gray of Microsoft Research) combines inexpensive
hardware and software into a single innovative
platform that can analyze and process petabyte-scale data
sets. (A petabyte is equal to 1,000
terabytes, or 1 quadrillion bytes. Facebook users, for
instance, have stored 1 petabyte, or about 10
billion photos, on the social networking site.)
According to Szalay, GrayWulf will enable scientists
to quickly and efficiently search through
massive amounts of data to locate and identify patterns
that will lead to new discoveries.
"GrayWulf, built from simple, inexpensive components,
was consciously designed to sift
discoveries at a rate much higher — and a cost much
lower — than anyone ever thought possible," said
Szalay, Alumni Centennial Professor in the
Henry A. Rowland Department of Physics and Astronomy.
"It will help researchers do science directly in the
database, teasing out relationships within areas
such as astrophysics, hydrodynamic turbulence,
environmental sensor networking and even, potentially,
global climate change."
The winning team included several other scientists and
staff members from Johns Hopkins, as
well as experts from Microsoft; the University of Illinois,
Chicago; the University of Hawaii; and Dell.
In the competition, GrayWulf was able to sift through
information gathered as part of the
Sloan Digital Sky Survey to locate quasars (distant
astronomical objects characterized by changing
brightness) in 12 minutes, a search that took other
computing systems 13 days to handle.
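To put those two figures on the same scale, here is a minimal back-of-the-envelope sketch. It uses only the 13 days and 12 minutes quoted above. The function name `speedup` is purely illustrative, and the result is a rough ratio rather than a measured benchmark.

```python
# Figures quoted in the article.
OTHER_SYSTEMS_DAYS = 13   # time reported for other computing systems
GRAYWULF_MINUTES = 12     # time reported for GrayWulf

MINUTES_PER_DAY = 24 * 60


def speedup(baseline_days: int, new_minutes: int) -> float:
    """Return how many times faster the new run is than the baseline."""
    baseline_minutes = baseline_days * MINUTES_PER_DAY
    return baseline_minutes / new_minutes


print(speedup(OTHER_SYSTEMS_DAYS, GRAYWULF_MINUTES))  # 1560.0
```

Taken at face value, the quoted times imply the quasar search ran roughly 1,560 times faster on GrayWulf.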
"GrayWulf is significant because the archetypal
scalable design supports the new paradigm of
data-intensive computing," said Tony Hey, corporate vice
president for Microsoft External Research.
"Built on the pioneering database work of Microsoft
researcher Jim Gray, GrayWulf is a tool that will
drive scientific discovery and innovation by giving
scientists the power to efficiently process and
analyze massive amounts of data."
Szalay said that events such as the SC08 competition
do more than simply bestow bragging
rights on the winners; they also have an impact on how
future science and engineering research — both
of which are generating tremendous data sets — will
be done. Szalay said that the successful
development of tools for data-intensive science in one
field — astrophysics, for instance — can be
generalized and applied to other fields, and will result in
cross-disciplinary pollination.
According to team member Alainna Wonders, a JHU
information technology system
administrator, the prize also serves as recognition of the
enormous amount of work that the team has
done in designing GrayWulf.
"We've proven that databases are an effective way to
manage large amounts of data for
scientific research," Wonders said.
Funding for GrayWulf was provided by the Gordon and
Betty Moore Foundation, Microsoft
Research and the Panoramic Survey Telescope and Rapid
Response System, known as Pan-STARRS.