Learning Pareto-Optimal Pandemic Intervention Policies with MORL
Marian Chen, Miri Zilka
TL;DR
The paper tackles the challenge of designing pandemic intervention policies that balance health outcomes with economic stability, using a simulator-driven multi-objective reinforcement learning (MORL) approach. It introduces a high-fidelity pandemic simulator based on stochastic differential equations (SDEs), calibrated to global COVID-19 data, and trains a Pareto-Conditioned Network (PCN) to learn Pareto-optimal intervention policies across diverse pathogens and outbreak severities. The framework supports policy exploration through a multi-objective reward structure and demonstrates its generality with COVID-19, polio, influenza, and measles case studies, including the effect of vaccination coverage. The results show how trade-offs shift with disease characteristics and urban density, offering policymakers an interpretable decision-support tool that can be adapted to future public health crises.
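To make "Pareto-optimal" concrete, the following is a minimal sketch of non-dominated filtering over two-objective returns. It is not the PCN training procedure; the function name `pareto_front` and the (health return, economic return) pairs are hypothetical and chosen only for illustration, with both objectives treated as quantities to maximize.

```python
from typing import List, Tuple

Point = Tuple[float, ...]


def dominates(q: Point, p: Point) -> bool:
    """True if q is at least as good as p everywhere and strictly better somewhere."""
    return all(a >= b for a, b in zip(q, p)) and any(a > b for a, b in zip(q, p))


def pareto_front(points: List[Point]) -> List[Point]:
    """Return the non-dominated points (all objectives maximized), in input order."""
    return [
        p
        for i, p in enumerate(points)
        if not any(dominates(q, p) for j, q in enumerate(points) if j != i)
    ]


# Hypothetical (health_return, economic_return) pairs; less negative is better.
candidates = [
    (-100.0, -5.0),   # light intervention: high health cost, low economic cost
    (-50.0, -20.0),
    (-60.0, -25.0),   # dominated by (-50.0, -20.0)
    (-10.0, -80.0),   # strict intervention: low health cost, high economic cost
    (-100.0, -6.0),   # dominated by (-100.0, -5.0)
]
front = pareto_front(candidates)
```

Each point left in `front` represents a trade-off that cannot be improved on one objective without giving up something on the other, which is the set of policies a decision maker would choose among.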
Abstract
The COVID-19 pandemic underscored a critical need for intervention strategies that balance disease containment with socioeconomic stability. We approach this challenge by designing a framework for modeling and evaluating disease-spread prevention strategies. Our framework combines multi-objective reinforcement learning (MORL), a formulation necessitated by the competing objectives, with a new stochastic differential equation (SDE) pandemic simulator, calibrated and validated against global COVID-19 data. Our simulator reproduces national-scale pandemic dynamics with orders of magnitude higher fidelity than other models commonly used in reinforcement learning (RL) approaches to pandemic intervention. Training a Pareto-Conditioned Network (PCN) agent on this simulator, we illustrate the direct policy trade-offs between epidemiological control and economic stability for COVID-19. Furthermore, we demonstrate the framework's generality by extending it to pathogens with different epidemiological profiles, such as polio and influenza, and show how these profiles lead the agent to discover fundamentally different intervention policies. To ground our work in contemporary policymaking challenges, we apply the model to measles outbreaks, quantifying how a modest 5% drop in vaccination coverage necessitates significantly more stringent and costly interventions to curb disease spread. This work provides a robust and adaptable framework to support transparent, evidence-based policymaking for mitigating public health crises.
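As a rough illustration of what an SDE-based epidemic simulator involves, the sketch below integrates a stochastic SIR model with the Euler-Maruyama scheme. It is a toy under stated assumptions, not the paper's calibrated simulator: the SIR structure, the noise on the transmission term, the `stringency` parameter that scales transmission, and all parameter values are hypothetical.

```python
import math
import random


def simulate_sir_sde(beta=0.3, gamma=0.1, stringency=0.0, s0=0.999, i0=0.001,
                     days=200, dt=0.1, sigma=0.02, seed=0):
    """Euler-Maruyama integration of a toy stochastic SIR model.

    State variables are population fractions. `stringency` in [0, 1] is a
    hypothetical intervention level that scales the transmission rate.
    """
    rng = random.Random(seed)
    beta_eff = beta * (1.0 - stringency)
    s, i, r = s0, i0, 1.0 - s0 - i0
    trajectory = [(s, i, r)]
    for _ in range(int(round(days / dt))):
        dw = rng.gauss(0.0, math.sqrt(dt))  # Brownian increment over dt
        # Drift plus multiplicative noise on new infections.
        infection = beta_eff * s * i * dt + sigma * s * i * dw
        infection = min(max(infection, 0.0), s)         # keep S non-negative
        recovery = min(gamma * i * dt, i + infection)   # keep I non-negative
        s -= infection
        i += infection - recovery
        r += recovery
        trajectory.append((s, i, r))
    return trajectory


# Compare peak infected fraction with and without a (hypothetical) intervention.
open_peak = max(i for _, i, _ in simulate_sir_sde(stringency=0.0))
strict_peak = max(i for _, i, _ in simulate_sir_sde(stringency=0.5))
```

In an RL setting, an agent would choose something like `stringency` at each decision step and receive separate health and economic rewards; here it is fixed for the whole run to keep the example short.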
