Optimizing Heat Alert Issuance with Reinforcement Learning
Ellen M. Considine, Rachel C. Nethery, Gregory A. Wellenius, Francesca Dominici, Mauricio Tec
TL;DR
This work frames heat alert issuance as a sequential decision problem under a strict alert budget and spatially varying health effects. It introduces BROACH (Bayesian Rewards Over Actual Climate History), an RL environment that uses real weather trajectories and a hierarchical reward model to estimate hospitalization impacts. The study shows that standard RL methods need pragmatic constraints (e.g., restricting alerts to very hot days) and per-county specialization to outperform the National Weather Service policy, and its post-hoc analyses identify when and where gains are attainable. The findings offer practical insights for deploying data-driven heat alert strategies and point to future work in safe policy learning, regional generalization, and multi-objective public-health optimization.
Abstract
A key strategy in societal adaptation to climate change is using alert systems to prompt preventative action and reduce the adverse health impacts of extreme heat events. This paper implements and evaluates reinforcement learning (RL) as a tool to optimize the effectiveness of such systems. Our contributions are threefold. First, we introduce a new publicly available RL environment for evaluating how effectively heat alert policies reduce heat-related hospitalizations. The reward model is trained on a comprehensive dataset of historical weather, Medicare health records, and socioeconomic/geographic features. We use scalable Bayesian techniques tailored to the low-signal effects and spatial heterogeneity present in the data. The transition model uses real historical weather patterns enriched by a data augmentation mechanism based on climate region similarity. Second, we use this environment to evaluate standard RL algorithms in the context of heat alert issuance. Our analysis shows that policy constraints are needed to improve RL's initially poor performance. Third, a post-hoc contrastive analysis provides insight into scenarios where our modified heat alert-RL policies yield significant gains or losses relative to the current National Weather Service alert policy in the United States.
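To make the problem framing concrete, the following is a minimal toy sketch of a budget-constrained alert environment in which alerts are only permitted on very hot days. It is not the BROACH environment or its API: the class name ToyHeatAlertEnv, the threshold and baseline values, the fixed 20% harm reduction, and the harm formula are all hypothetical placeholders standing in for the learned Bayesian reward model and the real weather trajectories described above.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class ToyHeatAlertEnv:
    """Toy stand-in for one county-season (hypothetical, not the paper's code)."""
    heat_index: List[float]        # daily heat index for one season
    budget: int                    # maximum number of alerts allowed this season
    hot_threshold: float = 90.0    # alerts only permitted at or above this value (assumed)
    baseline: float = 80.0         # placeholder: harm grows with heat above this value
    alert_effect: float = 0.8      # placeholder: an alert scales harm by this factor
    day: int = 0
    alerts_left: int = field(init=False)

    def __post_init__(self) -> None:
        self.alerts_left = self.budget

    def alert_allowed(self) -> bool:
        """An alert needs remaining budget and a very hot day."""
        return self.alerts_left > 0 and self.heat_index[self.day] >= self.hot_threshold

    def step(self, action: int) -> Tuple[float, bool, bool]:
        """Apply action (1 = alert, 0 = none); return (reward, issued, done)."""
        issued = bool(action) and self.alert_allowed()
        harm = max(0.0, self.heat_index[self.day] - self.baseline)
        if issued:
            harm *= self.alert_effect
            self.alerts_left -= 1
        self.day += 1
        done = self.day >= len(self.heat_index)
        return -harm, issued, done


def run_episode(env: ToyHeatAlertEnv,
                policy: Callable[[ToyHeatAlertEnv], int]) -> Tuple[float, int]:
    """Roll a policy through the season; return (total reward, alerts issued)."""
    total, n_alerts, done = 0.0, 0, False
    while not done:
        reward, issued, done = env.step(policy(env))
        total += reward
        n_alerts += int(issued)
    return total, n_alerts


def always_alert(env: ToyHeatAlertEnv) -> int:
    """Naive policy: request an alert every day; the env enforces the constraints."""
    return 1


season = [85.0, 95.0, 100.0, 90.0, 105.0]
total_reward, alerts_issued = run_episode(ToyHeatAlertEnv(season, budget=2), always_alert)
```

In this toy season the naive policy spends its two alerts on the first two eligible days and has none left for the hottest day at the end, which illustrates why the timing of alerts under a budget is a sequential decision rather than a per-day one.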
