The University of Georgia has signed a five-year, $3 million subcontract to develop a database that will contain comprehensive information about some pathogens on a biodefense priority list established by the National Institute of Allergy and Infectious Diseases.
The subcontract teams UGA with the University of Pennsylvania to develop a “virtual database” that serves as a single access point to genomic and related information about parasites in the phylum Apicomplexa, which includes organisms that cause malaria and toxoplasmosis.
NIAID, part of the National Institutes of Health, awarded a total of eight contracts in 2004 to establish national Bioinformatics Resource Centers, including the Penn-UGA award.
Jessica Kissinger, assistant professor of genetics and a member of UGA’s Center for Tropical and Emerging Global Diseases, is the principal investigator for UGA; co-principal investigators are Eileen Kraemer, associate professor of computer science, and John A. Miller, professor of computer science.
NIH has funded many genome-sequencing projects over the past decade, including projects for more than 50 organisms that either are considered biothreats or are related to emerging or re-emerging infectious diseases. Once a genome is sequenced, a database must be developed that provides access to the data and tools to analyze it.
“You have to be able to read the sequenced genome, use it, learn it and study it,” Kissinger says. “Few of the genome projects had a database project built into the original sequence proposal.”
Tools already have been developed to facilitate database construction for single organisms. However, as more genomes are sequenced, additional information can be gathered by comparing one genome to another.
Existing apicomplexan databases do not currently provide access to information about multiple organisms, which makes comparisons difficult. Simultaneous access to information about multiple pathogens may accelerate the development of new vaccines, diagnostics and therapeutics.
Also, scientists want immediate access to as much information as possible about a pathogen in the event of a sudden disease outbreak, Kissinger says.
The UGA-Penn team plans to develop a database that links existing databases for Plasmodium species, the causative agents of malaria; Toxoplasma gondii, a widespread parasite that is dangerous for pregnant women and immunosuppressed individuals; and Cryptosporidium parvum, a common intestinal parasite that is also dangerous for the immunosuppressed.
“We could make the database for these organisms by collecting all of the data together in one location,” Kissinger says. “I call that the vacuum-cleaner approach. But there’s too much data to suck it all up, so new approaches are needed.”
Instead, the UGA team will use a relatively new technology called “Web services” to link the multiple databases. Web services technology allows one database to communicate with another over the Internet.
“In fact there will be multiple separate databases,” Kraemer says. “But with this Web service layer that will go on top of them, users will have the illusion that there is one database. They’ll be able to ask a single question that applies to multiple databases and get a response.”
The database for Plasmodium is already well advanced; for the other parasites, sequence data is just becoming available.
“Penn is largely producing the infrastructure and tools to store and analyze all the different data types,” Kraemer says. “We are working to produce the Web services infrastructure that will sit on top of that and allow the multiple separate databases to be linked.”
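The layered design Kraemer describes, with separate databases underneath and a single query layer on top, can be illustrated with a toy Python sketch. It is not the project's implementation: the class names OrganismDatabase and FederatedLayer, the gene identifiers, and the keyword search are all hypothetical, and in-memory objects stand in for real Web service calls made over a network.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class OrganismDatabase:
    """Hypothetical stand-in for one separate organism database."""
    organism: str
    # gene identifier -> annotated product description (made-up data)
    genes: Dict[str, str] = field(default_factory=dict)

    def search(self, keyword: str) -> List[Dict[str, str]]:
        """Answer a query locally, as one database's own service might."""
        return [
            {"organism": self.organism, "gene": gene_id, "product": product}
            for gene_id, product in self.genes.items()
            if keyword.lower() in product.lower()
        ]


class FederatedLayer:
    """Hypothetical layer that forwards one question to every registered
    database and merges the answers, so callers see what looks like a
    single database even though the data stay in separate stores."""

    def __init__(self, databases: List[OrganismDatabase]) -> None:
        self.databases = databases

    def search(self, keyword: str) -> List[Dict[str, str]]:
        results: List[Dict[str, str]] = []
        for db in self.databases:
            # In a real system this would be a remote Web service call.
            results.extend(db.search(keyword))
        return results


# Illustrative data only; identifiers and annotations are invented.
layer = FederatedLayer([
    OrganismDatabase("Plasmodium falciparum",
                     {"PF_0001": "surface antigen", "PF_0002": "protease"}),
    OrganismDatabase("Toxoplasma gondii",
                     {"TG_0001": "surface antigen", "TG_0002": "kinase"}),
    OrganismDatabase("Cryptosporidium parvum",
                     {"CP_0001": "transporter"}),
])

if __name__ == "__main__":
    # One question, answered by several databases at once.
    for hit in layer.search("surface antigen"):
        print(hit["organism"], hit["gene"])
```

The point of the design is that no data are copied into a central store (the “vacuum-cleaner approach” Kissinger describes); the layer only routes the question and combines the replies.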
Each UGA collaborator contributes special skills to the project: Miller has expertise in Web services, Kraemer has a background in user interfaces and visualization, and Kissinger is an expert in the use of molecular and computational tools to study parasite genomes.
“There’s a tremendous amount of research involved in how to integrate the data: how do you link this to that? It’s actually very hard,” Kissinger says. “But that said, once you figure out how to solve the problem, you have to put it on the Web, make it public, make it work and keep it all going. That’s a huge effort.”
The other NIAID awardees are developing databases that provide access to information about pathogens and vectors, including bacteria that cause anthrax, plague, and water- and food-borne diseases; viruses that cause rabies, Ebola and influenza; and vectors such as mosquitoes.
“We hope some of the technologies that we develop to link our databases will be useful for linking to the other databases created by the eight Bioinformatics Resource Centers,” Kissinger says.