Optimizing Vital Sign Monitoring in Resource-Constrained Maternal Care: An RL-Based Restless Bandit Approach

Niclas Boehmer, Yunfan Zhao, Guojun Xiong, Paula Rodriguez-Diaz, Paola Del Cueto Cibrian, Joseph Ngonzi, Adeline Boatin, Milind Tambe

TL;DR

This work adopts the popular Proximal Policy Optimization algorithm from reinforcement learning to learn an allocation policy by training a policy network and a value function network. Simulations show that this approach outperforms the best heuristic baseline by up to a factor of 4.

Abstract

Maternal mortality remains a significant global public health challenge. One promising approach to reducing maternal deaths occurring during facility-based childbirth is through early warning systems, which require the consistent monitoring of mothers' vital signs after giving birth. Wireless vital sign monitoring devices offer a labor-efficient solution for continuous monitoring, but their scarcity raises the critical question of how to allocate them most effectively. We devise an allocation algorithm for this problem by modeling it as a variant of the popular Restless Multi-Armed Bandit (RMAB) paradigm. In doing so, we identify and address novel, previously unstudied constraints unique to this domain, which render previous approaches for RMABs unsuitable and significantly increase the complexity of the learning and planning problem. To overcome these challenges, we adopt the popular Proximal Policy Optimization (PPO) algorithm from reinforcement learning to learn an allocation policy by training a policy and value function network. We demonstrate in simulations that our approach outperforms the best heuristic baseline by up to a factor of $4$.
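
For orientation, the clipped surrogate objective of standard PPO is reproduced below. This is the generic textbook form, not necessarily the exact loss used in this work; the abstract does not specify how the objective is adapted to the RMAB setting. Here $r_t(\theta)$ is the probability ratio between the new and old policy, $\hat{A}_t$ is an advantage estimate (typically computed with the help of the value function network), and $\epsilon$ is the clipping parameter.

```latex
\[
L^{\mathrm{CLIP}}(\theta)
  = \hat{\mathbb{E}}_t\!\left[
      \min\!\left(
        r_t(\theta)\,\hat{A}_t,\;
        \operatorname{clip}\!\left(r_t(\theta),\, 1-\epsilon,\, 1+\epsilon\right)\hat{A}_t
      \right)
    \right],
\qquad
r_t(\theta) = \frac{\pi_\theta(a_t \mid s_t)}{\pi_{\theta_{\mathrm{old}}}(a_t \mid s_t)}
\]
```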

Paper Structure

This paper contains 27 sections, 8 figures, 2 tables, 1 algorithm.

Figures (8)

  • Figure 1: Wireless vital sign monitoring device on the arm of a mother.
  • Figure 2: Results on MIMIC-III (top) and MIMIC-IV (bottom), averaged over $100$ random seeds for varying budget $B$ and number of patients $N$. The error bars show the standard error of the generated reward, which is normalized by subtracting the reward of the $\mathtt{No \ Action}$ baseline and then dividing by $N$. See Table \ref{table:appendix_main_res} in Appendix \ref{sec:appendix_exps} for additional experimental results.
  • Figure 3: Initial results on data from the Mbarara Hospital, averaged over $100$ random seeds. The error bars show the standard error of rewards, which are normalized by subtracting the reward of the $\mathtt{No \ Action}$ baseline and then dividing by $N$ (see Appendix \ref{sec:appendix_exps} for additional settings).
  • Figure 4: Cumulative Distribution Function (CDF) of the number of arms based on the number of active times (Action 1) in the MIMIC dataset. The plot shows the probability distribution of arms being active a certain number of times.
  • Figure 5: Analysis of the critical state dimensions that influence the decision to remove a device from a patient on the MIMIC dataset. The six state dimensions considered are PULSE_RATE, RESPIRATORY_RATE, COVERED_SKIN_TEMPERATURE, and variations of each vital sign. The histograms depict the distribution of state values just before the transition from the active to the passive action, highlighting which factors might be most influential in triggering the change.
  • ...and 3 more figures
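
The reward normalization described in the captions of Figures 2 and 3 (subtract the reward of the No Action baseline, then divide by $N$) can be sketched as follows. This is a minimal illustration, not the authors' evaluation code: the function names are hypothetical, and it assumes the baseline is subtracted per seed before averaging, which the captions do not state explicitly.

```python
from math import sqrt
from statistics import mean, stdev


def normalized_reward(policy_reward, no_action_reward, num_patients):
    """Subtract the No Action baseline reward, then divide by N."""
    return (policy_reward - no_action_reward) / num_patients


def summarize(policy_rewards, no_action_rewards, num_patients):
    """Return (mean, standard error) of the normalized reward over seeds.

    Assumption: each seed has its own No Action baseline reward, paired
    by position with the policy reward from the same seed.
    """
    normalized = [
        normalized_reward(p, b, num_patients)
        for p, b in zip(policy_rewards, no_action_rewards)
    ]
    avg = mean(normalized)
    # Standard error of the mean; defined as 0.0 for a single seed.
    sem = stdev(normalized) / sqrt(len(normalized)) if len(normalized) > 1 else 0.0
    return avg, sem
```

For example, with two seeds, policy rewards of 30 and 50, a baseline reward of 10 in both seeds, and $N = 20$, the normalized rewards are 1.0 and 2.0.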