On a Reinforcement Learning Methodology for Epidemic Control, with application to COVID-19

Giacomo Iannucci; Petros Barmpounakis; Alexandros Beskos; Nikolaos Demiris

TL;DR

The paper introduces a real-time decision-support framework for epidemic control that integrates a SEIR–VU compartmental model with sequential Bayesian inference via SMC² and reinforcement learning to balance ICU burden against socio-economic costs. It offers two policy options: an interpretable ICU-threshold rule and a posterior-averaged Q-learning agent, both evaluated on England's COVID-19 ICU data over a 300-day horizon with decisions every 10 days. Key contributions include a 14-compartment SEIR–VU model with vaccination and waning immunity, a Bayesian sequential learning loop that updates posteriors in real time, and two RL planning engines that produce counterfactual policy evaluations against historical interventions. The findings demonstrate substantial ICU burden reduction under the RL controllers, with Q-learning providing more robust performance under higher socio-economic costs and offering practical, scalable decision support for epidemic management.

Abstract

This paper presents a real-time, data-driven decision-support framework for epidemic control. We combine a compartmental epidemic model with sequential Bayesian inference and reinforcement learning (RL) controllers that adaptively choose intervention levels to balance disease burden, such as intensive care unit (ICU) load, against socio-economic costs. We construct a context-specific cost function using empirical experiments and expert feedback. We study two RL policies: an ICU-threshold rule computed via Monte Carlo grid search, and a policy based on a posterior-averaged Q-learning agent. We validate the framework by fitting the epidemic model to publicly available ICU occupancy data from the COVID-19 pandemic in England and then generating counterfactual rollout scenarios under each RL controller, which allows us to compare the RL policies to the historical government strategy. Over a 300-day period and for a range of cost parameters, both controllers substantially reduce ICU burden relative to the observed interventions, illustrating how Bayesian sequential learning combined with RL can support the design of epidemic control policies.
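
To make the ICU-threshold rule concrete, the following minimal sketch shows how a threshold could be selected by Monte Carlo grid search. It is an illustration under strong assumptions and not the paper's implementation: a four-compartment toy SEIR model stands in for the 14-compartment SEIR–VU model, uniform draws of the transmission rate stand in for SMC² posterior samples, and the names `simulate` and `grid_search`, the rate constants, the 60% contact reduction, and the form of the cost are all hypothetical.

```python
import random


def simulate(threshold, beta, horizon=300, step=10, cost_weight=0.5):
    """Toy daily SEIR rollout under an ICU-threshold rule (illustrative only).

    Returns ICU burden (per 100k, summed over days) plus
    cost_weight times the number of days spent under intervention.
    """
    s, e, i, r = 0.999, 0.0, 0.001, 0.0
    sigma, gamma = 1 / 5.0, 1 / 7.0  # assumed incubation and recovery rates
    icu_frac = 0.01                  # assumed share of infectious cases in ICU
    intervene = False
    icu_burden, days_on = 0.0, 0
    for day in range(horizon):
        icu = icu_frac * i
        if day % step == 0:          # the decision is revisited every `step` days
            intervene = icu > threshold
        b = beta * (0.4 if intervene else 1.0)  # assumed 60% contact reduction
        new_e, new_i, new_r = b * s * i, sigma * e, gamma * i
        s, e, i, r = s - new_e, e + new_e - new_i, i + new_i - new_r, r + new_r
        icu_burden += 1e5 * icu
        days_on += int(intervene)
    return icu_burden + cost_weight * days_on


def grid_search(thresholds, n_draws=200, cost_weight=0.5, seed=0):
    """Pick the threshold with the lowest cost averaged over parameter draws."""
    rng = random.Random(seed)
    # Stand-in for posterior samples; the same draws are reused for every
    # threshold so that candidates are compared on common random numbers.
    betas = [rng.uniform(0.3, 0.5) for _ in range(n_draws)]
    avg_cost = {
        th: sum(simulate(th, b, cost_weight=cost_weight) for b in betas) / n_draws
        for th in thresholds
    }
    best = min(avg_cost, key=avg_cost.get)
    return best, avg_cost


if __name__ == "__main__":
    grid = [k * 1e-4 for k in range(1, 11)]  # ICU occupancy as a population fraction
    best, costs = grid_search(grid)
    print("selected threshold:", best)
```

Raising `cost_weight` makes intervention days more expensive relative to ICU load, which is the trade-off the cost parameters in the abstract control. In the framework described above, the expectation would be taken over the current posterior and recomputed as new ICU data arrive.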
