Optimizing Heat Alert Issuance with Reinforcement Learning

Ellen M. Considine; Rachel C. Nethery; Gregory A. Wellenius; Francesca Dominici; Mauricio Tec

TL;DR

This work frames heat alert issuance as a sequential decision problem under a strict budget and spatially varying health effects. It introduces BROACH (Bayesian Rewards Over Actual Climate History), an RL environment that uses real weather trajectories and a hierarchical reward model to estimate hospitalization impacts. The study shows that standard RL methods require pragmatic constraints (e.g., restricting alerts to very hot days) and per-county specialization to outperform the National Weather Service policy, and post-hoc analyses identify when and where gains are attainable. The findings offer practical insights for deploying data-driven heat alert strategies and point to future work in safe policy learning, regional generalization, and multi-objective public-health optimization.

Abstract

A key strategy in societal adaptation to climate change is using alert systems to prompt preventative action and reduce the adverse health impacts of extreme heat events. This paper implements and evaluates reinforcement learning (RL) as a tool to optimize the effectiveness of such systems. Our contributions are threefold. First, we introduce a new publicly available RL environment enabling the evaluation of the effectiveness of heat alert policies to reduce heat-related hospitalizations. The rewards model is trained from a comprehensive dataset of historical weather, Medicare health records, and socioeconomic/geographic features. We use scalable Bayesian techniques tailored to the low-signal effects and spatial heterogeneity present in the data. The transition model uses real historical weather patterns enriched by a data augmentation mechanism based on climate region similarity. Second, we use this environment to evaluate standard RL algorithms in the context of heat alert issuance. Our analysis shows that policy constraints are needed to improve RL's initially poor performance. Third, a post-hoc contrastive analysis provides insight into scenarios where our modified heat alert-RL policies yield significant gains/losses over the current National Weather Service alert policy in the United States.
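
To make the constrained decision problem concrete, the following is a minimal sketch of a fixed-threshold alert rule operating under a seasonal alert budget. It is an illustration only and is not the BROACH environment or its API: the function name `threshold_alert_policy`, the threshold of 0.9, the budget of 3, and the sample quantile-of-heat-index (QHI) values are all hypothetical.

```python
from typing import List, Sequence


def threshold_alert_policy(
    qhi: Sequence[float], threshold: float, budget: int
) -> List[int]:
    """Issue an alert (1) on each day whose QHI meets the threshold,
    until the alert budget for the season is exhausted.

    Hypothetical illustration; not the policy or API used in the paper.
    """
    alerts: List[int] = []
    remaining = budget
    for q in qhi:
        # Alert only if budget remains and the day is hot enough.
        issue = 1 if (remaining > 0 and q >= threshold) else 0
        remaining -= issue
        alerts.append(issue)
    return alerts


# Made-up daily QHI values for a short stretch of summer.
season = [0.55, 0.92, 0.97, 0.60, 0.95, 0.99]
print(threshold_alert_policy(season, threshold=0.9, budget=3))
# -> [0, 1, 1, 0, 1, 0]
```

In this toy run the budget is spent before the hottest day (QHI 0.99), which is the kind of trade-off that motivates treating alert issuance as a sequential decision problem rather than a fixed rule.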

Paper Structure (43 sections, 8 equations, 15 figures, 6 tables)

Introduction
Related Work
Heat alert optimization
RL with exogenous states
Statistical modeling for RL environments
Constrained learning
Contrastive policy explanations
Problem Setup
RL Preliminaries
Issuing Heat Alerts as a Constrained MDP
BROACH: An RL Environment for Optimizing Heat Alert Issuance
Data Sources
Heat alerts and heat index
Heat-related hospitalizations
Counties
...and 28 more sections

Figures (15)

Figure 1: Overview of the heat alerts RL framework.
Figure 2: Map of the counties considered and their regional climate zone classifications. All the colored counties are used in the Bayesian rewards model and RL environment; the 30 counties with annotated FIPS codes are used in the RL experiments.
Figure 3: An example of observed and counterfactual heat alert policies for a single summer (2015 is the most recent year in our evaluation set), with estimates of the number of NOHR hospitalizations saved (compared to nws) per 10,000 Medicare enrollees under each policy. The dashed lines indicate the optimized QHI threshold of the policy in the same color. The multiple horizontal lines of pink and green dots indicate five different samples from trpo.qhi and a2c.qhi respectively. For these two policies, the number of NOHR hospitalizations saved is the average of all their evaluations over 2015.
Figure S4: Unobserved mediation of the effect of heat alerts on hospitalizations by heightened awareness.
Figure S5: Smoothed trajectories of observed quantile of heat index, modeled baseline NOHR hospitalization rate, and modeled alert effectiveness (assuming no past alerts) across days of summer; colored by climate region. Note that $\lambda$ and $\tau$ are normalized as they are in the RL reward function.
...and 10 more figures
