Augmented data and neural networks for robust epidemic forecasting: application to COVID-19 in Italy

Giacomo Dimarco, Federica Ferrarese, Lorenzo Pareschi

TL;DR

The paper tackles epidemic forecasting under data scarcity by augmenting the training data with synthetic trajectories from a social SIAR compartmental model that incorporates age structure and parameter uncertainty. It evaluates two neural network paradigms, Physics-Informed Neural Networks (PINNs) and Nonlinear Autoregressive (NAR) networks, for forecasting COVID-19 dynamics in Lombardy, and finds that NAR networks give more accurate short-term quantitative predictions while PINNs capture the long-term qualitative behavior. The key contribution is a data augmentation framework that improves predictive accuracy and robustness to uncertainty for both architectures when they are trained on real and synthetic data. The approach is intended to support uncertainty-aware public health decision-making during evolving outbreaks.

Abstract

In this work, we propose a data augmentation strategy aimed at improving the training phase of neural networks and, consequently, the accuracy of their predictions. Our approach relies on generating synthetic data through a suitable compartmental model combined with the incorporation of uncertainty. The available data are then used to calibrate the model, which is further integrated with deep learning techniques to produce additional synthetic data for training. The results show that neural networks trained on these augmented datasets exhibit significantly improved predictive performance. We focus in particular on two different neural network architectures: Physics-Informed Neural Networks (PINNs) and Nonlinear Autoregressive (NAR) models. The NAR approach proves especially effective for short-term forecasting, providing accurate quantitative estimates by directly learning the dynamics from data and avoiding the additional computational cost of embedding physical constraints into the training. In contrast, PINNs yield less accurate quantitative predictions but capture the qualitative long-term behavior of the system, making them more suitable for exploring broader dynamical trends. Numerical simulations of the second phase of the COVID-19 pandemic in the Lombardy region (Italy) validate the effectiveness of the proposed approach.
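
As a rough illustration of the augmentation idea described in the abstract, the sketch below generates an ensemble of synthetic epidemic trajectories by sampling uncertain parameters of a compartmental model, then summarizes the ensemble with a mean curve and an empirical 95% band. It is not the authors' method: a plain normalized SIR model stands in for the social SIAR model, whose equations are not reproduced here, and the parameter ranges, the initial infected fraction, and the function names `simulate_sir` and `synthetic_ensemble` are all hypothetical choices made for the example.

```python
import numpy as np

def simulate_sir(beta, gamma, i0, days, dt=0.1):
    """Integrate a normalized SIR model with forward Euler.

    Stand-in for the paper's social SIAR model (assumption).
    Returns the infected fraction sampled once per day, length days + 1.
    """
    steps_per_day = int(round(1.0 / dt))
    s, i = 1.0 - i0, i0
    infected = [i]
    for _ in range(days):
        for _ in range(steps_per_day):
            new_infections = beta * s * i * dt
            new_recoveries = gamma * i * dt
            s -= new_infections
            i += new_infections - new_recoveries
        infected.append(i)
    return np.array(infected)

def synthetic_ensemble(n_samples, days, seed=0):
    """Sample uncertain parameters and simulate one trajectory per sample.

    The uniform ranges below are illustrative, not calibrated values.
    """
    rng = np.random.default_rng(seed)
    betas = rng.uniform(0.25, 0.35, size=n_samples)   # transmission rate (hypothetical range)
    gammas = rng.uniform(0.08, 0.12, size=n_samples)  # recovery rate (hypothetical range)
    return np.stack([simulate_sir(b, g, 1e-3, days) for b, g in zip(betas, gammas)])

# Build the synthetic training set and summarize its spread.
ensemble = synthetic_ensemble(n_samples=200, days=60)
mean_trajectory = ensemble.mean(axis=0)
# Empirical 2.5th and 97.5th percentiles across samples, per day.
lower, upper = np.percentile(ensemble, [2.5, 97.5], axis=0)
```

In a workflow of this kind, the rows of `ensemble` would be appended to the observed series before training a forecasting network, and the band between `lower` and `upper` gives a simple picture of how parameter uncertainty spreads across trajectories.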

Paper Structure

This paper contains 14 sections, 30 equations, 11 figures, 3 tables.

Figures (11)

  • Figure 1: Dynamics of the infected population obtained by solving the calibrated social SIAR model \ref{eq:social_SIAR}, compared with experimental data. The figure shows the mean epidemic trajectory with the 95% confidence interval (shaded area), alongside the observed data. The black dashed line separates the two epidemic phases.
  • Figure 2: Dynamics of the infected population obtained by solving the age-structured social SIAR model \ref{eq:social_SIAR_ages}, compared with experimental data. The plots show the mean epidemic trajectory with the 95% confidence interval (shaded area), alongside the observed data. The black dashed line separates the two epidemic phases. Each panel corresponds to a different age group.
  • Figure 3: Physics-informed neural network for the social SIAR model \ref{eq:social_SIAR}. Solution obtained by training a PINN on both real and synthetic data, compared with the available data. Left: the solution computed on the training set. Right: the solution computed on the test set.
  • Figure 4: Physics-informed neural network (training set) for the age-structured social SIAR model \ref{eq:social_SIAR_ages}. Solution obtained by training a PINN on both real and synthetic data, compared with the available data. Each plot corresponds to a different age class.
  • Figure 5: Physics-informed neural network (test set) for the age-structured social SIAR model \ref{eq:social_SIAR_ages}. Solution obtained by training a PINN on both real and synthetic data, compared with the available data. Each plot corresponds to a different age class.
  • ...and 6 more figures

Theorems & Definitions (1)

  • Remark 1