Microbial ecosystems – the relationships between microorganisms and their environments, including within the human body – play an important role in human health. Through diagnostic testing and genetic analysis, researchers can track how disruptions of these ecosystems can cause problems ranging from asthma and allergies to obesity and diabetes. The processes currently used to analyze this data are labor-intensive and often inefficient. The sheer volume of data can be difficult to manage, and some bacteria are fragile and die outside of the microbiome, reducing the information researchers can obtain.
Now, a new project by University of Georgia researchers, funded by the National Science Foundation, could lead to new computational tools to better understand and identify the complicated makeup of the microbiome.
“Metagenomic data – material recovered directly from environmental samples – and metabolomic data – the unique chemical fingerprints that specific cellular processes leave behind – are now typically analyzed separately, and each provides a different type of information on metabolites,” said Wenxuan Zhong, professor of statistics and director of the UGA Big Data Analytics Lab. “We want to develop computational methods to combine the data to study the link between microbial species and phenotype. By linking this data, we can study how the microbial system affects human health.”
These new computational methods will utilize MetaGen, a tool developed by Zhong’s research group that can simultaneously identify microbial species and estimate their abundance in multiple samples.
Zhong and her team will establish a set of novel statistical frameworks and computational strategies to effectively integrate metagenomic and metabolomic data. To explain the relationship between microbial ecosystems and some human diseases, they will develop a new statistical test to identify the microbial species and strains related to certain diseases. The innovative joint-analytical method will detect interactions within microbial species as well as interactions between the microbiome and its human host.
“There are many potential fields where our new method can help improve both the biological and statistical analysis process,” said Zhong. “For example, we can help design new biomarkers for accurate disease diagnostics, provide potential opportunities for probiotic supplements, and develop medical intervention strategies.”
The project represents a new direction in the field of big data with promising possibilities as well as new challenges.
“It is like solving a jigsaw puzzle. The difficulty is not solving one puzzle, but solving thousands of puzzles all mixed together at the same time. We have to pull the pieces apart and reassemble them in the right groups,” said Zhong. “We must establish a solid foundation in order to benefit other researchers in the same field.”
“The NSF support opens many opportunities for us. Specifically, for this project, our research process could lead to fundamental advances in statistics and machine learning research,” said Zhong. “We plan to not only combine different data science tools, but also build new and innovative data science research processes.”
Zhong’s research team includes statistics doctoral candidates Ye Wang and Mengrui Zhang. The work is supported by a grant from the Division of Mathematical Sciences of the National Science Foundation.