The Google Flu Trends algorithm, as it is known, performed poorly. For example, it continually overestimated doctor visits, later evaluations found, because of limitations in the data and the influence of outside factors such as media attention, which can drive searches that are unrelated to actual illness.
Since then, researchers have made multiple adjustments to this approach, combining Google searches with other types of data. Teams at Carnegie Mellon University, University College London and the University of Texas, among others, have models that incorporate some real-time data analysis.
“We know that no single data stream is useful in isolation,” said Madhav Marathe, a computer scientist at the University of Virginia. “The contribution of this new paper is that they have a good, wide variety of streams.”
In the new paper, the team analyzed real-time data from four sources, in addition to Google: Covid-related Twitter posts, geotagged by location; doctors' searches on a physician platform called UpToDate; anonymous mobility data from smartphones; and readings from the Kinsa smart thermometer, which uploads to an app. The team integrated those data streams with a sophisticated prediction model developed at Northeastern University, based on how people move and interact in communities.
The team tested the predictive value of trends in each data stream by looking at how each one correlated with case counts and deaths during March and April, in each state.
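One simple way to picture that kind of test is a lagged correlation: shift a data stream forward by a number of days and see how closely it then tracks case counts. The sketch below is only an illustration of that general idea, not the researchers' method; the function names, the lead-time search and the choice of Pearson correlation are all assumptions made for the example.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length lists (0.0 if either is constant)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def best_lead(signal, cases, max_lead):
    """Return (lead_days, correlation) for the shift at which the signal
    best matches later case counts. Hypothetical helper for illustration."""
    best = (0, pearson(signal, cases))
    for lead in range(1, max_lead + 1):
        # Compare the signal on day t with cases on day t + lead.
        r = pearson(signal[:-lead], cases[lead:])
        if r > best[1]:
            best = (lead, r)
    return best
```

Given a daily series for one stream and a daily case series of the same length, `best_lead(stream, cases, 14)` would report how many days ahead, up to two weeks, the stream lines up best with cases.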
In New York, for example, a sharp uptrend in Covid-related Twitter posts began more than a week before case counts exploded in mid-March; relevant Google searches and Kinsa readings spiked several days before the surge.
The team combined all its data sources, in effect weighting each according to how strongly it was correlated with a rise in cases. The researchers found that this “harmonized” algorithm anticipated outbreaks by 21 days, on average.
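As a rough illustration of what weighting by correlation can look like, the sketch below gives each stream a weight proportional to its correlation with case counts, ignores streams that are negatively correlated, and blends the standardized streams into a single index. This is an assumption-laden toy, not the published algorithm: the function names, the proportional weighting rule and the z-score scaling are all choices made for the example.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length lists (0.0 if either is constant)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def zscore(x):
    """Rescale a series to mean 0 and unit variance so streams are comparable."""
    n = len(x)
    mean = sum(x) / n
    std = sqrt(sum((a - mean) ** 2 for a in x) / n)
    if std == 0:
        return [0.0] * n
    return [(a - mean) / std for a in x]


def combine(signals, cases):
    """Blend several streams into one index (hypothetical weighting scheme).

    signals: dict mapping a stream name to a list of daily values.
    Returns (weights, combined_series).
    """
    # Assumption: negative correlations are clipped to zero weight.
    raw = {name: max(pearson(series, cases), 0.0)
           for name, series in signals.items()}
    total = sum(raw.values())
    if total == 0:
        raise ValueError("no stream is positively correlated with cases")
    weights = {name: r / total for name, r in raw.items()}
    scaled = {name: zscore(series) for name, series in signals.items()}
    combined = [sum(weights[name] * scaled[name][t] for name in signals)
                for t in range(len(cases))]
    return weights, combined
```

In this toy version, a stream that tracks cases closely dominates the index, while one that moves against cases drops out entirely.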