Campus News

UGA, Penn get $14.6 million to expand pathogen database

Jessica Kissinger

Databases that allow scientists to rapidly analyze the genes, proteins and enzymes of pathogens have revolutionized the search for potential vaccine and drug targets. One such database, created by a team of scientists at UGA and the University of Pennsylvania, will expand thanks to a $14.6 million contract from the National Institute of Allergy and Infectious Diseases.

The database is known as EuPathDB (http://EuPathDb.org), short for Eukaryotic Pathogen Database Resources. It has its roots in a database created in 2000 for the genome of the Plasmodium parasite, which causes malaria, but has since expanded to include several other disease-causing organisms that disproportionately impact the developing world. Of the $14.6 million total, $4.4 million is subcontracted to UGA.

“We take several different kinds of molecular data and put it all in one place so that everything we know about an organism is in one useful resource,” said Jessica Kissinger, associate professor of genetics at UGA and a researcher in the university’s Center for Tropical and Emerging Diseases and Institute of Bioinformatics. “It’s kind of a one-stop shopping approach that can save researchers years by quickly giving them access to the data they need.”

Kissinger’s co-principal investigator at UGA is Eileen Kramer, professor and head of the department of computer science.

Advances in genome sequencing have dramatically increased the amount of data available on pathogens, Kissinger said.

The genome sequence of the Plasmodium parasite was completed in 2002 after six years of work at a cost of $35 million. Now, scientists can sequence strains of the parasite in just two weeks for $3,000.

The ability to sequence genomes rapidly has created a sea of data for researchers to navigate.

“For example, 300 genome sequences have been generated from clinical isolates of the most deadly form of Plasmodium and another 200 are in the pipeline,” Kissinger said. “This organism is but one of 27 that we manage.”

The raw data for a single genome can total five terabytes, or 5,000 gigabytes.
“Some of these data files are so big that researchers can’t open them on any of the computers that they own,” Kissinger said.

The database and related sites managed by the team are used by scientists in more than 100 countries and have received more than 42,000 unique visitors in the past six months alone.