A data augmentation strategy for deep neural networks with application to epidemic modelling
Muhammad Awais, Abu Safyan Ali, Giacomo Dimarco, Federica Ferrarese, Lorenzo Pareschi
TL;DR
The paper tackles the need for scalable and accurate epidemic forecasting by combining mechanistic SIR-type dynamics with data-driven surrogates. It augments real COVID-19 data with synthetic trajectories from a social-SIR model with saturated incidence $H(t,I)$ to train Feed-Forward Neural Networks (FFNs) and Nonlinear Autoregressive Networks (NARs), offering a practical alternative to Physics-Informed Neural Networks (PINNs). The key contributions are a two-phase parameter estimation regime for pre- and post-lockdown dynamics and a data-augmentation framework that improves temporal forecasting via NARs, demonstrated on the lockdown and post-lockdown phases of the COVID-19 epidemic in Italy and Spain. This hybrid approach enables fast, data-informed predictions while retaining interpretability from the mechanistic backbone, with potential to inform public health decisions under data scarcity or uncertainty.
Abstract
In this work, we integrate the predictive capabilities of compartmental disease dynamics models with the ability of machine learning to analyze complex, high-dimensional data and uncover patterns that conventional models may overlook. Specifically, we present a proof of concept demonstrating the application of data-driven methods and deep neural networks to a recently introduced Susceptible-Infected-Recovered type model with social features, including a saturated incidence rate, to improve epidemic prediction and forecasting. Our results show that a robust data augmentation strategy through suitable data-driven models can improve the reliability of Feed-Forward Neural Networks and Nonlinear Autoregressive Networks, providing a complementary strategy to Physics-Informed Neural Networks, particularly in settings where data augmentation from mechanistic models can enhance learning. This approach enhances the ability to handle nonlinear dynamics and offers scalable, data-driven solutions for epidemic forecasting, prioritizing predictive accuracy over the constraints of physics-based models. Numerical simulations of the lockdown and post-lockdown phases of the COVID-19 epidemic in Italy and Spain validate our methodology.
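
To make the augmentation idea concrete, the following is a minimal sketch of how synthetic trajectories might be generated from an SIR-type model with a saturated incidence and stacked into a training set. It is not the authors' implementation. The incidence form $\beta S I / (1 + \alpha I)$ is a standard textbook choice assumed here in place of the paper's $H(t,I)$, and the function names (`saturated_sir_rhs`, `simulate`, `augment`), the parameter ranges, and the initial condition are all hypothetical.

```python
import numpy as np


def saturated_sir_rhs(y, beta, gamma, alpha):
    """Right-hand side of an SIR model with an ASSUMED saturated incidence."""
    s, i, r = y
    # Assumption: incidence beta*S*I / (1 + alpha*I), standing in for H(t, I).
    new_infections = beta * s * i / (1.0 + alpha * i)
    return np.array([-new_infections, new_infections - gamma * i, gamma * i])


def simulate(beta, gamma, alpha, i0=1e-3, days=60, steps_per_day=10):
    """Integrate with classical RK4; return daily (S, I, R) population fractions."""
    h = 1.0 / steps_per_day
    y = np.array([1.0 - i0, i0, 0.0])
    daily = [y.copy()]
    for _ in range(days):
        for _ in range(steps_per_day):
            k1 = saturated_sir_rhs(y, beta, gamma, alpha)
            k2 = saturated_sir_rhs(y + 0.5 * h * k1, beta, gamma, alpha)
            k3 = saturated_sir_rhs(y + 0.5 * h * k2, beta, gamma, alpha)
            k4 = saturated_sir_rhs(y + h * k3, beta, gamma, alpha)
            y = y + (h / 6.0) * (k1 + 2.0 * k2 + 2.0 * k3 + k4)
        daily.append(y.copy())
    return np.array(daily)  # shape: (days + 1, 3)


def augment(n_samples, rng, days=60):
    """Sample parameters from hypothetical ranges and stack the trajectories."""
    params = np.column_stack([
        rng.uniform(0.2, 0.5, n_samples),    # beta: transmission rate (illustrative)
        rng.uniform(0.05, 0.15, n_samples),  # gamma: recovery rate (illustrative)
        rng.uniform(0.0, 5.0, n_samples),    # alpha: saturation strength (illustrative)
    ])
    trajs = np.stack([simulate(b, g, a, days=days) for b, g, a in params])
    return params, trajs


rng = np.random.default_rng(0)
params, trajs = augment(8, rng)  # trajs has shape (8, 61, 3)
```

In a pipeline of the kind the abstract describes, such synthetic trajectories would be combined with the observed data before training a feed-forward or autoregressive network; how the parameter ranges are chosen (for example, around values estimated from real data) is left open in this sketch.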
