Systematic analysis of the effectiveness of adding human mobility data to covid-19 case prediction linear models
Saad Mohammad Abrar, Naman Awasthi, Daniel Smolyak, Vanessa Frias-Martinez
TL;DR
This paper examines whether incorporating human mobility data improves regression-based covid-19 case forecasts. It takes a systematic approach across datasets and lookaheads, using elastic net regression to compare baseline models (past cases only) with mobility-augmented models across five prediction horizons and ten mobility datasets, with performance measured by the Spearman correlation between predicted and actual cases. The main finding is that mobility data provides meaningful gains only for about two months near the onset of the testing period, with correlation improvements over the baseline of at most 0.3 and the strongest effects for Apple and Google data. Beyond this window, the benefits are small or negative, which raises questions about cost-effectiveness given the effort required to access such data. These results suggest that mobility data has limited long-term utility for covid-19 forecasting, and they point to the need for alternative modeling approaches and careful decisions about data use.
Abstract
Human mobility data has been extensively used in covid-19 case prediction models. Nevertheless, related work has questioned how much mobility data actually helps. We present a systematic analysis across mobility datasets and prediction lookaheads and show that adding mobility data to predictive models improves model performance only for about two months at the onset of the testing period, and that performance improvements -- measured as the improvement in correlation between predicted and actual cases over non-mobility baselines -- are at most 0.3.
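To make the evaluation metric concrete, the following is a minimal sketch of how a correlation improvement over a non-mobility baseline could be computed, assuming Spearman rank correlation between predicted and actual cases. The function names (`average_ranks`, `spearman`, `correlation_improvement`) and the toy case counts are hypothetical and chosen purely for illustration; they are not taken from the paper's data or code, and the model-fitting step is not shown.

```python
from statistics import mean


def average_ranks(values):
    """Rank values from 1..n; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        tied_rank = (i + j) / 2 + 1  # mean of the 1-based positions i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = tied_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank vectors.

    Undefined (division by zero) if either input is constant.
    """
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


def correlation_improvement(actual, baseline_pred, mobility_pred):
    """Correlation of the mobility-augmented model minus that of the baseline."""
    return spearman(actual, mobility_pred) - spearman(actual, baseline_pred)


# Hypothetical toy values, not data from the paper.
actual = [10, 20, 30, 40, 50]
baseline_pred = [12, 35, 25, 38, 60]   # past cases only
mobility_pred = [11, 19, 33, 41, 52]   # past cases plus mobility features

print(correlation_improvement(actual, baseline_pred, mobility_pred))
```

With these toy values the baseline swaps the order of two locations while the mobility-augmented predictions preserve the full ordering, so the improvement comes out to about 0.1. A positive value means the mobility-augmented model ranks cases more faithfully than the baseline; a negative value means adding mobility data hurt.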
