On a Reinforcement Learning Methodology for Epidemic Control, with application to COVID-19

Giacomo Iannucci, Petros Barmpounakis, Alexandros Beskos, Nikolaos Demiris

TL;DR

The paper introduces a real-time decision-support framework for epidemic control that integrates a SEIR–VU compartmental model with sequential Bayesian inference via SMC2 and reinforcement learning (RL) to balance ICU burden against socio-economic costs. It offers two policy options, an interpretable ICU-threshold rule and a posterior-averaged Q-learning agent, both evaluated on England's COVID-19 ICU data over a 300-day horizon with decisions every 10 days. Key contributions include a 14-compartment SEIR–VU model with vaccination and waning immunity, a Bayesian sequential learning loop that updates posteriors in real time, and two RL planning engines that produce counterfactual policy evaluations against historical interventions. Both RL controllers substantially reduce ICU burden relative to the historical interventions, with Q-learning performing more robustly at higher socio-economic costs; together they offer practical, scalable decision support for epidemic management.
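The phrase "posterior-averaged Q-learning agent" admits more than one reading. One plausible reading, sketched below under heavy simplifying assumptions, trains a tabular Q-learner separately for each posterior draw of a model parameter, averages the resulting Q-tables, and then acts greedily on the average. The toy environment (a discretised log ICU occupancy driven by a hypothetical daily growth rate `growth`), the reward, and every name and constant here (`step`, `q_learning`, `posterior_averaged_policy`, `npi_effect`, `cost_npi`) are illustrative assumptions, not the paper's SEIR–VU model or cost function. Only the 10-day decision interval and the 300-day horizon come from the text.

```python
import numpy as np

# Hypothetical toy environment; only the 10-day step and 30 decisions (300 days) come from the text.
N_BINS, N_ACTIONS = 20, 2          # discretised log ICU occupancy; actions {0: no NPI, 1: NPI}
LOG_U_MIN, LOG_U_MAX = 0.0, 8.0    # assumed range of log ICU occupancy
N_DECISIONS = 30                   # 300 days / 10-day decision interval


def to_bin(log_u):
    """Map a log ICU occupancy to one of N_BINS equal-width state bins."""
    frac = (log_u - LOG_U_MIN) / (LOG_U_MAX - LOG_U_MIN)
    return int(np.clip(frac * N_BINS, 0, N_BINS - 1))


def step(log_u, action, growth, rng, npi_effect=0.06, cost_npi=50.0):
    """One 10-day transition: log occupancy grows at rate `growth` per day, reduced by
    `npi_effect` under the intervention. Reward is minus (ICU occupancy + NPI cost)."""
    log_u = log_u + 10 * (growth - npi_effect * action) + rng.normal(0.0, 0.1)
    log_u = float(np.clip(log_u, LOG_U_MIN, LOG_U_MAX))
    return log_u, -(np.exp(log_u) + cost_npi * action)


def q_learning(growth, rng, episodes=2000, alpha=0.1, gamma=0.95, eps=0.1):
    """Tabular epsilon-greedy Q-learning for a single posterior draw of `growth`."""
    q = np.zeros((N_BINS, N_ACTIONS))
    for _ in range(episodes):
        log_u = rng.uniform(2.0, 5.0)                 # random initial ICU level
        for t in range(N_DECISIONS):
            s = to_bin(log_u)
            if rng.random() < eps:
                a = int(rng.integers(N_ACTIONS))      # explore
            else:
                a = int(np.argmax(q[s]))              # exploit
            log_u, r = step(log_u, a, growth, rng)
            # No bootstrapping past the final decision of the episode.
            target = r if t == N_DECISIONS - 1 else r + gamma * q[to_bin(log_u)].max()
            q[s, a] += alpha * (target - q[s, a])
    return q


def posterior_averaged_policy(growth_samples, episodes=2000, seed=0):
    """Average per-draw Q-tables over posterior draws, then act greedily on the average."""
    rng = np.random.default_rng(seed)
    q_bar = np.mean([q_learning(g, rng, episodes) for g in growth_samples], axis=0)
    return np.argmax(q_bar, axis=1), q_bar


if __name__ == "__main__":
    growth_draws = np.random.default_rng(2).normal(0.03, 0.005, size=4)  # stand-in posterior draws
    policy, _ = posterior_averaged_policy(growth_draws, episodes=500)
    print("action per ICU bin (1 = intervene):", policy)
```

Averaging the Q-tables roughly corresponds to choosing the action with the highest expected return across the posterior draws; a majority vote over each draw's greedy action would instead ignore how strongly each draw prefers one action over the other.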

Abstract

This paper presents a real-time, data-driven decision-support framework for epidemic control. We combine a compartmental epidemic model with sequential Bayesian inference and reinforcement learning (RL) controllers that adaptively choose intervention levels to balance disease burden, such as intensive care unit (ICU) load, against socio-economic costs. We construct a context-specific cost function using empirical experiments and expert feedback. We study two RL policies: an ICU-threshold rule computed via Monte Carlo grid search, and a policy based on a posterior-averaged Q-learning agent. We validate the framework by fitting the epidemic model to publicly available ICU occupancy data from the COVID-19 pandemic in England and then generating counterfactual roll-out scenarios under each RL controller, which allows us to compare the RL policies with the historical government strategy. Over a 300-day period and for a range of cost parameters, both controllers substantially reduce ICU burden relative to the observed interventions, illustrating how Bayesian sequential learning combined with RL can support the design of epidemic control policies.
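The ICU-threshold rule can be illustrated with a toy Monte Carlo grid search: for each candidate threshold, simulate the epidemic repeatedly over draws of an uncertain transmission rate (standing in for posterior samples), apply the rule "intervene whenever ICU occupancy exceeds the threshold" at each 10-day decision point, and keep the threshold with the lowest average cost. This is a minimal sketch, assuming a stochastic chain-binomial SEIR model with a single ICU compartment, a binary intervention that scales transmission by a hypothetical factor `npi_effect`, and a cost equal to daily ICU occupancy plus `cost_npi` per intervention day. The paper's 14-compartment SEIR–VU model and its context-specific cost function are not reproduced, and all names and values (`simulate`, `grid_search`, rates, population size) are hypothetical.

```python
import numpy as np

# Hypothetical settings; only the horizon and decision interval come from the text.
HORIZON_DAYS = 300     # length of the study window
DECISION_EVERY = 10    # days between decisions
N_POP = 1_000_000      # toy population size


def simulate(beta, threshold, cost_npi, rng, npi_effect=0.6):
    """Simulate a toy stochastic SEIR model with one ICU compartment under the rule
    'impose the intervention whenever ICU occupancy exceeds `threshold`'.
    Returns the total cost: summed daily ICU occupancy plus `cost_npi` per intervention day."""
    sigma, gamma = 1 / 4, 1 / 6     # assumed daily E->I and I->removed rates
    p_icu, icu_stay = 0.01, 10.0    # assumed ICU admission probability and mean stay (days)
    S, E, I, R, U = N_POP - 100, 50, 50, 0, 0   # U = current ICU occupancy
    cost, action = 0.0, 0
    for day in range(HORIZON_DAYS):
        if day % DECISION_EVERY == 0:
            action = int(U > threshold)            # the threshold rule, applied every 10 days
        beta_t = beta * (1 - npi_effect * action)  # intervention scales transmission
        new_E = rng.binomial(S, 1 - np.exp(-beta_t * I / N_POP))
        new_I = rng.binomial(E, 1 - np.exp(-sigma))
        leave_I = rng.binomial(I, 1 - np.exp(-gamma))
        to_icu = rng.binomial(leave_I, p_icu)      # share of removals admitted to ICU
        leave_U = rng.binomial(U, 1 - np.exp(-1 / icu_stay))
        S, E = S - new_E, E + new_E - new_I
        I = I + new_I - leave_I
        R = R + (leave_I - to_icu) + leave_U
        U = U + to_icu - leave_U
        cost += U + cost_npi * action              # daily ICU burden plus socio-economic cost
    return cost


def grid_search(thresholds, beta_samples, cost_npi, n_rep=3, seed=0):
    """Monte Carlo grid search: mean cost of each threshold over `beta_samples`
    (stand-ins for posterior draws) and `n_rep` stochastic replicates per draw."""
    mean_costs = []
    for thr in thresholds:
        rng = np.random.default_rng(seed)          # same seed for every candidate threshold
        costs = [simulate(b, thr, cost_npi, rng) for b in beta_samples for _ in range(n_rep)]
        mean_costs.append(float(np.mean(costs)))
    best = int(np.argmin(mean_costs))
    return thresholds[best], mean_costs


if __name__ == "__main__":
    thresholds = [-1, 50, 100, 200, 400, float("inf")]   # -1: always intervene; inf: never
    beta_draws = np.random.default_rng(1).normal(0.35, 0.03, size=8)
    best, costs = grid_search(thresholds, beta_draws, cost_npi=100.0)
    print("best threshold:", best)
```

Reseeding the generator for each threshold makes the comparison reproducible and shares randomness across candidates, at least until their trajectories diverge, which reduces simulation noise when ranking thresholds.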

Paper Structure

This paper contains 30 sections, 53 equations, 10 figures, 1 table.

Figures (10)

  • Figure 1: Flow diagram of the SEIR model.
  • Figure 2: State-space model over three time steps. Latent states $X_{t-1},X_t,X_{t+1}$ evolve according to the Markov dynamics of the process equation, and each generates its own noisy observation ($Y_{t-1}$, $Y_t$, $Y_{t+1}$ respectively) via the observation equation.
  • Figure 3: SEIR--VU model dynamics.
  • Figure 4: One-block decision loop. At decision-time $t_0$, ICU data update the SMC2 posterior; state and parameter $(s_{t_0},\theta)$ feed into a planner (horizon $H = 100$) yielding policy ${\pi}_{t_0}$ and action $a_{t_0}$; the model generator runs to $t_0 + \Delta$, producing 'counterfactual' ICU data for the next update.
  • Figure 5: Observed ICU occupancy versus average values of simulated roll-outs obtained via the SEIR–VU model, under the real government NPIs.

Theorems & Definitions (2)

  • Remark 1
  • Remark 2
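The state-space structure of Figure 2 is what the SMC2 update in Figure 4 relies on: SMC2 carries a population of parameter particles and, for each one, runs a particle filter over the latent states to estimate the likelihood of the observed data. A minimal sketch of that inner layer is a bootstrap particle filter returning an estimate of $\log p(y_{1:T} \mid \theta)$. It assumes a toy latent log ICU intensity following a Gaussian random walk with drift, observed through Poisson counts, rather than the paper's SEIR–VU dynamics and observation model; `bootstrap_pf` and its parameters are hypothetical.

```python
import math
import numpy as np


def bootstrap_pf(y, drift, sigma, n_particles=500, x0_mean=3.0, x0_sd=1.0, seed=0):
    """Bootstrap particle filter for a toy (hypothetical) state-space model:
        X_t = X_{t-1} + drift + sigma * eps_t,  eps_t ~ N(0, 1)   (latent log ICU intensity)
        Y_t | X_t ~ Poisson(exp(X_t))                            (observed ICU count)
    Returns an estimate of log p(y_{1:T} | drift, sigma)."""
    rng = np.random.default_rng(seed)
    x = rng.normal(x0_mean, x0_sd, size=n_particles)      # initial particles
    log_lik = 0.0
    for y_t in y:
        # Propagate each particle through the Markov transition (process equation).
        x = x + drift + sigma * rng.normal(size=n_particles)
        # Weight by the Poisson observation density, computed in log space for stability.
        log_w = y_t * x - np.exp(x) - math.lgamma(y_t + 1)
        m = log_w.max()
        w = np.exp(log_w - m)
        # The mean weight estimates the incremental likelihood p(y_t | y_{1:t-1}).
        log_lik += m + math.log(w.mean())
        # Multinomial resampling in proportion to the weights.
        idx = rng.choice(n_particles, size=n_particles, p=w / w.sum())
        x = x[idx]
    return log_lik


if __name__ == "__main__":
    rng = np.random.default_rng(42)
    x_true = 3.0 + np.cumsum(0.02 + 0.05 * rng.normal(size=50))
    y = rng.poisson(np.exp(x_true))
    print(bootstrap_pf(y, drift=0.02, sigma=0.05))   # data-generating values
    print(bootstrap_pf(y, drift=0.30, sigma=0.05))   # badly misspecified drift
```

In a full SMC2 scheme this estimate would be computed for each parameter particle and used to reweight and move those particles as new ICU data arrive; that outer layer is not shown here.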