Accurately Estimating Unreported Infections using Information Theory
Jiaming Cui, Bijaya Adhikari, Arash Haddadan, A S M Ahsan-Ul Haque, Jilles Vreeken, Anil Vullikanti, B. Aditya Prakash
TL;DR
This work estimates unreported infections in epidemics with MdlInfer, an information-theoretic approach that runs on top of traditional ODE-based epidemiological models. Using the Minimum Description Length (MDL) principle, it seeks the model $\text{Model}=(D,\Theta',\hat{\Theta})$ that minimizes the total description length $L(D,\Theta',\hat{\Theta})+L(D_{\mathrm{reported}}\mid D,\Theta',\hat{\Theta})$, thereby jointly estimating the total infections $D$ and a candidate reported rate $\alpha_{\mathrm{reported}}'$. A two-step optimization first estimates the reported rate $\alpha_{\mathrm{reported}}^*$ and then solves for the total infections $D^*$. On the SAPHIRE and SEIR+HD models, MdlInfer yields total-infection estimates closer to serological benchmarks and improves forecasts of reported infections and symptomatic-rate trends. The method also supports counterfactual analysis of non-pharmaceutical interventions (NPIs), and the results indicate that NPIs targeting asymptomatic and presymptomatic transmission are essential for effective epidemic control. Overall, MdlInfer provides a principled, general framework for epidemic modeling that may apply beyond COVID-19.
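To make the two-part objective concrete, the following toy sketch scores candidate reported rates by "bits for the model plus bits for the reported data given the model" and keeps the cheapest. It is an illustration of the MDL idea only, not the paper's encoding: the Elias-gamma residual code, the fixed-precision encoding of the rate, the function names, and the `total` and `reported` series are all assumptions introduced here, and the total-infection series is taken as given rather than estimated.

```python
def elias_gamma_bits(m: int) -> int:
    """Length in bits of the Elias gamma code for an integer m >= 1."""
    return 2 * (m.bit_length() - 1) + 1


def residual_bits(r: int) -> int:
    """Bits for a signed residual: gamma code of |r| + 1, plus a sign bit if r != 0."""
    return elias_gamma_bits(abs(r) + 1) + (1 if r != 0 else 0)


def description_length(numerator, precision_bits, total, reported):
    """Two-part cost (in bits) of the rate alpha = numerator / 2**precision_bits."""
    alpha = numerator / 2 ** precision_bits
    # Model cost: state the precision, then the numerator at that precision.
    model_cost = elias_gamma_bits(precision_bits) + precision_bits
    # Data cost: encode what the model fails to explain about the reported counts.
    data_cost = sum(
        residual_bits(rep - round(alpha * tot))
        for tot, rep in zip(total, reported)
    )
    return model_cost + data_cost


def best_reported_rate(total, reported, max_precision=8):
    """Exhaustive search over rates j / 2**k; returns (cost_in_bits, alpha, k)."""
    best = None
    for k in range(1, max_precision + 1):
        for j in range(1, 2 ** k + 1):
            cost = description_length(j, k, total, reported)
            if best is None or cost < best[0]:
                best = (cost, j / 2 ** k, k)
    return best


# Hypothetical data: reported counts are exactly a quarter of total infections.
total = [40, 80, 160, 320, 640]
reported = [10, 20, 40, 80, 160]
cost, alpha, k = best_reported_rate(total, reported)
print(f"alpha = {alpha}, precision = {k} bits, total cost = {cost} bits")
```

The search settles on the coarsest precision that explains the data: a finer grid could represent the same rate, but it would spend extra model bits without saving any data bits.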
Abstract
One of the most significant challenges in combating the spread of infectious diseases is the difficulty of estimating the true magnitude of infections. Unreported infections can drive up disease spread and make it very hard to accurately estimate the infectivity of the pathogen, which hampers our ability to react effectively. Despite the use of surveillance-based methods such as serological studies, identifying the true magnitude remains challenging. This paper proposes an information-theoretic approach for accurately estimating the number of total infections. Our approach is built on top of Ordinary Differential Equation (ODE) based models, which are commonly used in epidemiology and for estimating such infections. We show how we can help such models better compute the number of total infections and identify the parametrization that requires the fewest bits to describe the observed dynamics of reported infections. Our experiments on COVID-19 spread show that our approach leads not only to substantially better estimates of the number of total infections but also to better forecasts of infections than standard calibration-based methods. We additionally show how our learned parametrization helps model what-if scenarios with non-pharmaceutical interventions more accurately. Our approach provides a general, broadly applicable method for improving epidemic modeling.
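As a rough illustration of the kind of ODE-based model the approach builds on, the sketch below integrates a basic SEIR model with forward Euler and reports only a fixed fraction of new infections. It is a minimal sketch under stated assumptions, not the SAPHIRE or SEIR+HD model used in the paper: the function name, the parameter values, and the constant `alpha_reported` are hypothetical choices made for this example.

```python
def simulate_seir(beta, sigma, gamma, alpha_reported,
                  population, initial_infected, days, dt=0.1):
    """Forward-Euler SEIR; returns daily (total, reported) new infections."""
    s = float(population - initial_infected)  # susceptible
    e = 0.0                                   # exposed, not yet infectious
    i = float(initial_infected)               # infectious
    r = 0.0                                   # recovered
    steps_per_day = round(1 / dt)
    daily_total, daily_reported = [], []
    for _ in range(days):
        new_today = 0.0
        for _ in range(steps_per_day):
            new_exposed = beta * s * i / population * dt
            new_infectious = sigma * e * dt
            new_recovered = gamma * i * dt
            s -= new_exposed
            e += new_exposed - new_infectious
            i += new_infectious - new_recovered
            r += new_recovered
            new_today += new_infectious
        daily_total.append(new_today)
        # Assumption: only a constant fraction of new infections is ever reported.
        daily_reported.append(alpha_reported * new_today)
    return daily_total, daily_reported


# Hypothetical parameters, chosen only to produce a visible outbreak.
daily_total, daily_reported = simulate_seir(
    beta=0.5, sigma=0.2, gamma=0.1, alpha_reported=0.25,
    population=10_000, initial_infected=10, days=60,
)
print(f"total infections:    {sum(daily_total):.0f}")
print(f"reported infections: {sum(daily_reported):.0f}")
```

Calibrating such a model against the reported series alone leaves the reported rate and the total infections poorly identified, which is the gap the description-length criterion is meant to address.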
