H2-MARL: Multi-Agent Reinforcement Learning for Pareto Optimality in Hospital Capacity Strain and Human Mobility during Epidemic

Xueting Luo, Hao Deng, Jihong Yang, Yao Shen, Huanhuan Guo, Zhiyuan Sun, Mingqing Liu, Jiming Wei, Shengjie Zhao

TL;DR

This work introduces a township-level, data-driven spatiotemporal simulator (D-SIHR) and a multi-agent reinforcement learning framework (H2-MARL) to achieve Pareto optimality between hospital capacity strain and human mobility during epidemics. D-SIHR updates the parameters of its infection dynamics online across administrative divisions (ADs), while H2-MARL uses dual-objective rewards with entropy-based adaptive weights and expert replay to balance healthcare impact and mobility restrictions. The approach is validated on a large multi-city origin-destination mobility dataset and four city scenarios, showing superior dual-objective trade-offs, faster restriction success, and strong generalization. This methodology offers a practical, scalable tool for epidemic management and urban disaster planning in smart cities.
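
The summary above mentions entropy-based adaptive weights but does not give their formula. As a rough illustration only, the following sketch uses the classical entropy weight method, which is an assumption here and not necessarily the formulation used in H2-MARL. The function name `entropy_weights` and the two-column layout (hospital strain, mobility loss) are hypothetical.

```python
import math

def entropy_weights(samples):
    """Classical entropy weight method (assumed here, not taken from the paper).

    samples: list of rows, each row holding one non-negative value per objective.
    Returns one weight per objective; the weights sum to 1.
    """
    n = len(samples)
    m = len(samples[0])
    divergences = []
    for j in range(m):
        column = [row[j] for row in samples]
        total = sum(column)
        if total == 0 or n < 2:
            # No information in this objective: treat it as fully uniform.
            divergences.append(0.0)
            continue
        probs = [v / total for v in column]
        # Shannon entropy normalised to [0, 1] by dividing by log(n).
        entropy = -sum(p * math.log(p) for p in probs if p > 0) / math.log(n)
        # Clamp to guard against tiny negative values from rounding.
        divergences.append(max(0.0, 1.0 - entropy))
    spread = sum(divergences)
    if spread < 1e-12:
        # All objectives look uniform: fall back to equal weights.
        return [1.0 / m] * m
    return [d / spread for d in divergences]

# Hypothetical recent observations: (hospital strain, mobility loss) per step.
history = [[1.0, 0.0], [1.0, 0.0], [1.0, 3.0]]
weights = entropy_weights(history)
```

In this scheme an objective whose recent values vary more receives a larger weight, so the weights adapt as the epidemic evolves. A combined reward could then be formed as a weighted sum of the two objective terms, though how H2-MARL actually combines them is not stated in this summary.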

Abstract

Balancing the losses caused by restricting human mobility against the need to protect hospital capacity has gained significant attention in the aftermath of COVID-19. Reinforcement learning (RL)-based strategies for human mobility management have recently advanced in addressing the dynamic evolution of cities and epidemics; however, they still face challenges in achieving coordinated control at the township level and adapting to cities of varying scales. To address these issues, we propose a multi-agent RL approach that achieves Pareto optimality in managing hospital capacity and human mobility (H2-MARL), applicable across cities of different scales. We first develop a township-level infection model with online-updatable parameters to simulate disease transmission and construct a city-wide dynamic spatiotemporal epidemic simulator. On this basis, H2-MARL treats each division as an agent, formulates a dual-objective reward function that trades off the two goals, and builds an experience replay buffer enriched with expert knowledge. To evaluate the effectiveness of the model, we construct a township-level human mobility dataset containing over one billion records from four representative cities of varying scales. Extensive experiments demonstrate that H2-MARL achieves the best dual-objective trade-off, minimizing hospital capacity strain while also minimizing the loss from human mobility restrictions. The experiments also verify the applicability of the proposed model to epidemic control in cities of varying scales, which showcases its feasibility and versatility in practical applications.

Paper Structure

This paper contains 29 sections, 27 equations, 7 figures, 9 tables.

Figures (7)

  • Figure 1: Block diagram of the proposed environment simulator and interaction with mobility restriction strategy.
  • Figure 2: The D-SIHR model for AD $i$ on day $t$.
  • Figure 3: The explanation of $p_{uv}$. Black icons represent healthy individuals, while red icons signify infected individuals. In the top panel, individual $u$ can infect individual $v$ during the maximum serial interval $t_{SI}^{\max}$, with the serial interval following a Gamma distribution $\Gamma(\alpha, \beta)$. In the bottom panel, the set of infectors $\mathbf{I}_{[t_v - t_{SI}^{\max}, t_v]}$ has the ability to infect $v$, $u \in \mathbf{I}_{[t_v - t_{SI}^{\max}, t_v]}$.
  • Figure 4: Overview of H2-MARL model.
  • Figure 5: Simulation curves of the effective reproduction number ($R_t$) for COVID-19 and the 95% confidence intervals (CI): Guangzhou (a), Chongqing (b), Jiangsu (c), and Hubei (d). The blue shaded area represents the 95% confidence interval, and the green dashed line highlights the scenario where the epidemic is under control ($R_t = 1$).
  • ...and 2 more figures
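
The Figure 3 caption describes $p_{uv}$ in terms of a Gamma-distributed serial interval and a window of candidate infectors, but the formula itself is not reproduced here. The sketch below shows one standard way such a probability can be computed: each candidate infector is weighted by the Gamma density of the time gap and the weights are normalised over the window. This is an assumption for illustration, not the paper's definition. The function names, the rate parameterisation of $\beta$, and the example values are all hypothetical.

```python
import math

def gamma_pdf(x, alpha, beta):
    """Gamma density with shape alpha and rate beta.

    Treating beta as a rate (not a scale) is an assumption of this sketch.
    """
    if x <= 0:
        return 0.0
    return (beta ** alpha) * (x ** (alpha - 1)) * math.exp(-beta * x) / math.gamma(alpha)

def infector_probabilities(t_v, infector_times, alpha, beta, t_si_max):
    """Relative probability that each candidate u infected v (illustrative only).

    t_v: infection time of v.
    infector_times: dict mapping a candidate infector id to its infection time.
    Only candidates infected within t_si_max before t_v are considered;
    same-time pairs (gap 0) are skipped here for numerical simplicity.
    """
    weights = {}
    for u, t_u in infector_times.items():
        gap = t_v - t_u
        if 0 < gap <= t_si_max:
            weights[u] = gamma_pdf(gap, alpha, beta)
    total = sum(weights.values())
    if total == 0:
        return {}
    # Normalise so the probabilities over the candidate set sum to 1.
    return {u: w / total for u, w in weights.items()}

# Hypothetical example: v infected on day 10, three earlier cases.
p = infector_probabilities(
    t_v=10,
    infector_times={"a": 8, "b": 5, "c": -20},
    alpha=2.0,
    beta=0.5,
    t_si_max=14,
)
```

With these example values, case "c" falls outside the maximum serial interval and is excluded, while "a" and "b" share the probability mass according to the Gamma density of their gaps.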