Scientists at the University of Georgia have used machine learning to predict the reemergence of existing infectious diseases. The algorithm monitors public health data to detect statistical patterns associated with impending outbreaks. The study, by Tobias Brett and Pejman Rohani, was recently published in PLOS Biology.
While the method cannot predict the onset of a new disease like COVID-19, it nevertheless has implications for public health during the pandemic.
“The World Health Organization is talking about declining vaccine coverage for diseases that we already know about and for which we have vaccines, things like measles, things like pertussis, things like mumps,” said Rohani, Regents’ Professor and UGA Athletic Association Professor of Ecology and Infectious Diseases. “As a result of COVID, this algorithm might be really useful for identifying those populations that might be about to undergo a resurgence in these vaccine-preventable infections.”
The method is based on the theory that certain telltale patterns appear in surveillance reports as underlying conditions become favorable for an outbreak. Those changing conditions can include waning vaccine effectiveness, decreasing vaccination rates or environmental factors such as a changing climate. Regardless of the underlying cause of the change, the case reports exhibit the same statistical patterns.
Brett and Rohani developed an algorithm that monitors case reports over time to look for those patterns and calculate the level of risk that a disease will reemerge.
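To make the idea of monitoring case reports for statistical patterns concrete, here is a minimal Python sketch. The article does not say which statistics the study tracks, so the choices here (mean, variance and lag-1 autocorrelation over a sliding window) are assumptions based on commonly used early-warning indicators. The case counts and the function names `lag1_autocorrelation` and `rolling_features` are hypothetical.

```python
from statistics import mean, pvariance


def lag1_autocorrelation(window):
    """Lag-1 autocorrelation of a sequence; 0.0 when the window is constant."""
    m = mean(window)
    denom = sum((x - m) ** 2 for x in window)
    if denom == 0:
        return 0.0
    num = sum((a - m) * (b - m) for a, b in zip(window, window[1:]))
    return num / denom


def rolling_features(cases, window_size):
    """Return one (mean, variance, lag-1 autocorrelation) tuple per window."""
    feats = []
    for end in range(window_size, len(cases) + 1):
        w = cases[end - window_size:end]
        feats.append((mean(w), pvariance(w), lag1_autocorrelation(w)))
    return feats


# Hypothetical weekly case counts: quiet at first, then larger clusters.
weekly_cases = [0, 1, 0, 0, 1, 0, 2, 1, 3, 2, 4, 6]
features = rolling_features(weekly_cases, window_size=6)
for m, v, ac in features:
    print(f"mean={m:.2f} variance={v:.2f} lag1_autocorr={ac:.2f}")
```

In this made-up series the variance in the last window is larger than in the first, which is the kind of drift such a monitor would watch for.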
Because the algorithm needed to be applicable to many diseases, Brett and Rohani trained it to recognize the patterns characteristic of an impending outbreak using 10,000 sets of simulated case reports covering a period of 10 years. The simulated data included a wide range of parameter combinations and different mechanisms of resurgence.
Half of these time series data sets were designed to lead up to an emergent disease outbreak, and half were not.
“We tell the algorithm which 5,000 time series are emerging [diseases] and which 5,000 are not,” said Brett, a postdoctoral associate in the Odum School of Ecology. “So the algorithm learns what are the statistical features, and combinations of statistical features, that successfully predict if a population is about to exhibit an outbreak or not.”
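The training setup Brett describes, labeled simulated time series split evenly between emerging and non-emerging, can be sketched on a much smaller scale. Everything in the block below is an assumption made for illustration and is not the study's method: the toy transmission model, the three summary features, the logistic regression classifier and the use of 200 series instead of 10,000.

```python
import math
import random


def poisson(rng, lam):
    """Draw a Poisson variate with Knuth's method (adequate for modest lam)."""
    threshold, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


def simulate_series(rng, emerging, steps=120, importation=1.0):
    """Toy case counts; the reproduction number rises from 0.2 toward 0.9
    only in emerging series and stays at 0.2 otherwise."""
    cases, series = 0, []
    for t in range(steps):
        r = 0.2 + (0.7 * t / steps if emerging else 0.0)
        cases = poisson(rng, r * cases + importation)
        series.append(cases)
    return series


def features(series):
    """Summary statistics: log mean, log variance, lag-1 autocorrelation."""
    n = len(series)
    m = sum(series) / n
    ss = sum((x - m) ** 2 for x in series)
    ac = 0.0 if ss == 0 else sum(
        (a - m) * (b - m) for a, b in zip(series, series[1:])) / ss
    return [math.log1p(m), math.log1p(ss / n), ac]


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic(X, y, epochs=300, lr=0.1):
    """Fit logistic regression by batch gradient descent."""
    w, b, n = [0.0] * len(X[0]), 0.0, len(X)
    for _ in range(epochs):
        gw, gb = [0.0] * len(w), 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            gw = [g + err * xj for g, xj in zip(gw, xi)]
            gb += err
        w = [wj - lr * g / n for wj, g in zip(w, gw)]
        b -= lr * gb / n
    return w, b


rng = random.Random(0)
labels = [i % 2 for i in range(200)]  # 1 = emerging, 0 = not; half each
X = [features(simulate_series(rng, emerging=bool(y))) for y in labels]
train_X, test_X = X[:160], X[160:]
train_y, test_y = labels[:160], labels[160:]

# Center features using training data only, then fit and score held-out series.
centre = [sum(col) / len(col) for col in zip(*train_X)]
centred = lambda rows: [[x - c for x, c in zip(row, centre)] for row in rows]
w, b = train_logistic(centred(train_X), train_y)
risk = [sigmoid(sum(wj * xj for wj, xj in zip(w, row)) + b)
        for row in centred(test_X)]
accuracy = sum((r >= 0.5) == bool(y) for r, y in zip(risk, test_y)) / len(test_y)
print(f"held-out accuracy: {accuracy:.2f}")
```

The classifier is never told how a resurgence comes about. It sees only the labels and the summary statistics, which mirrors the supervised setup described in the quote.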
Once the algorithm learned to identify the general patterns indicative of disease emergence, Brett and Rohani tested it on time series data for cases leading up to four historical disease outbreaks.
In 2004-2005, an outbreak of mumps, a viral disease, occurred in England after a roughly 15-year period of very low transmission following the institution of routine infant vaccination. Analyzing case reports from Public Health England from 1990 through 2005, the algorithm identified the pattern signaling an impending outbreak four years before it began.
Near-perfect identification
Pertussis, a bacterial disease that was drastically reduced by vaccination programs, has recently seen an uneven resurgence in the U.S. Outbreaks have occurred in some, but not all, states at different times beginning in the late 1970s. In this case, Brett and Rohani wanted to know if the algorithm could identify in advance which states would experience outbreaks. Applied to data from state public health agencies for the period from 1980 to 2000, the algorithm correctly identified those states nearly 100% of the time.
Mumps and pertussis are both transmitted directly, but many infectious diseases of public health concern are spread by vectors such as mosquitoes, ticks or fleas. To determine whether the algorithm would work for those diseases as well, Brett and Rohani tested it on data from a brief 2017 outbreak of bubonic plague in Madagascar and a series of outbreaks of dengue fever in Puerto Rico that occurred between 1995 and 2009. In both cases, they found that the algorithm was able to identify the impending outbreaks before they happened.
“I think both Toby and I were astonished by how well the algorithm worked in these different systems with different transmission modes and resurgence that acts over very different timescales, ranging from weeks to many years,” said Rohani. “That increases our confidence that this approach is identifying something that’s very generic in these systems and it’s not system-specific or detail-specific.”
Health officials still make the call
Brett and Rohani emphasized that while the algorithm can calculate the risk of a disease reemerging, public health officials would need to decide what level of risk should trigger an alert and outbreak prevention measures, taking into account the larger societal and economic context.
“We view our algorithm as a potential tool in the public health toolkit. The kinds of questions of when to sound the alarm and when not to sound the alarm cannot be made by scientists alone,” said Brett. “They have to be made with an understanding of the broader costs associated with either mistakenly sounding the alarm or failing to prepare for an outbreak. Presently, this is something public health authorities are best positioned to do.”
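The trade-off between false alarms and missed outbreaks can be illustrated with a small sketch. The risk scores, the outcomes and the cost figures below are all hypothetical, and the functions `total_cost` and `best_threshold` are illustrative names, not part of the study. The point is only that the alert threshold that looks best shifts as the relative costs change, and weighing those costs is the judgment the researchers leave to public health authorities.

```python
def total_cost(threshold, risks, outbreaks, false_alarm_cost, missed_cost):
    """Sum the cost of alerting at or above `threshold` on past cases."""
    cost = 0.0
    for risk, outbreak in zip(risks, outbreaks):
        alert = risk >= threshold
        if alert and not outbreak:
            cost += false_alarm_cost  # alarm sounded, no outbreak followed
        elif outbreak and not alert:
            cost += missed_cost       # outbreak occurred without warning
    return cost


def best_threshold(risks, outbreaks, false_alarm_cost, missed_cost):
    """Pick the candidate threshold with the lowest total cost."""
    # Candidates: every observed risk score, plus "never alert".
    candidates = sorted(set(risks)) + [float("inf")]
    return min(
        candidates,
        key=lambda t: total_cost(t, risks, outbreaks,
                                 false_alarm_cost, missed_cost),
    )


# Hypothetical risk scores and whether an outbreak actually followed.
risks = [0.05, 0.20, 0.35, 0.60, 0.80, 0.95]
outbreaks = [False, False, True, False, True, True]

# Missed outbreaks five times as costly as false alarms: alert early.
print(best_threshold(risks, outbreaks, false_alarm_cost=1, missed_cost=5))  # 0.35
# False alarms five times as costly as missed outbreaks: alert late.
print(best_threshold(risks, outbreaks, false_alarm_cost=5, missed_cost=1))  # 0.8
```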