Not All Accuracy Is Equal: Prioritizing Independence in Infectious Disease Forecasting
Carson Dudley, Marisa Eisenberg
TL;DR
We present a toy example illustrating the theoretical cost of correlated errors, analyze correlations among COVID-19 forecasting models, and propose improvements to model fitting and ensemble construction that foster genuine diversity.
Abstract
Ensemble forecasts have become a cornerstone of large-scale disease response, underpinning decision making at agencies such as the US Centers for Disease Control and Prevention (CDC). Their growing use reflects the goal of combining multiple models to improve accuracy and stability versus relying on any single model. However, while ensembles regularly demonstrate stability against individual model failures, improved accuracy is not guaranteed. During the COVID-19 pandemic, the CDC's multi-model ensemble outperformed the best single model by only 1%, and CDC flu ensembles have often ranked below individual models. Prior work has established that ensemble performance depends critically on diversity: when models make independent errors, combining them yields substantial gains. In practice, however, this diversity is often lacking. Here, we propose that this is due in part to how models are developed and selected: both modelers and ensemble builders optimize for stand-alone accuracy rather than ensemble contribution, and most epidemic forecasts are built from a small set of approaches trained on the same surveillance data. The result is highly correlated errors, limiting the benefit of ensembling. This suggests that in developing models and ensembles, we should prioritize models that contribute complementary information rather than replicating existing approaches. We present a toy example illustrating the theoretical cost of correlated errors, analyze correlations among COVID-19 forecasting models, and propose improvements to model fitting and ensemble construction that foster genuine diversity. Ensembles built with this principle in mind produce forecasts that are more robust and more valuable for epidemic preparedness and response.
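The cost of correlated errors can be made concrete with a standard identity. This is a minimal sketch under simplifying assumptions, not necessarily the toy example used in the paper: it assumes an equal-weight mean of N unbiased models whose errors share a common variance sigma2 and a common pairwise correlation rho. Under those assumptions the ensemble error variance is sigma2 * (rho + (1 - rho) / N), so it can never fall below rho * sigma2 no matter how many models are added. The function names and parameters below are hypothetical, chosen for illustration.

```python
import numpy as np


def ensemble_error_variance(n_models, sigma2, rho):
    """Closed-form error variance of an equal-weight ensemble mean.

    Assumes every model error has variance sigma2 and every pair of
    model errors has correlation rho (an illustrative simplification).
    """
    return sigma2 * (rho + (1.0 - rho) / n_models)


def simulate_ensemble_error_variance(n_models, sigma2, rho,
                                     n_draws=100_000, seed=0):
    """Monte Carlo check of the closed form using Gaussian errors."""
    rng = np.random.default_rng(seed)
    # Equicorrelated covariance: sigma2 on the diagonal, rho * sigma2 elsewhere.
    cov = sigma2 * (rho * np.ones((n_models, n_models))
                    + (1.0 - rho) * np.eye(n_models))
    errors = rng.multivariate_normal(np.zeros(n_models), cov, size=n_draws)
    # Error of the ensemble mean on each draw, then its sample variance.
    return errors.mean(axis=1).var()


if __name__ == "__main__":
    for rho in (0.0, 0.5, 0.9):
        exact = ensemble_error_variance(10, 1.0, rho)
        approx = simulate_ensemble_error_variance(10, 1.0, rho)
        print(f"rho={rho:.1f}  closed form={exact:.3f}  simulated={approx:.3f}")
```

With ten models of unit error variance, independent errors (rho = 0) give an ensemble variance of 0.1, a tenfold reduction, whereas rho = 0.9 gives 0.91, a reduction of under 10 percent. Under these assumptions, adding more highly correlated models yields little additional benefit.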
