Context in Public Health for Underserved Communities: A Bayesian Approach to Online Restless Bandits
Biyonka Liang, Lily Xu, Aparna Taneja, Milind Tambe, Lucas Janson
TL;DR
This paper tackles the efficient allocation of scarce public health interventions by modeling beneficiary adherence as a contextual, non-stationary restless multi-armed bandit (RMAB) with $N$ arms, budget $B$, and horizon $T$. It introduces BCoR, a Bayesian contextual RMAB method that combines hierarchical Bayesian modeling with Thompson sampling to share information within and across arms, and that handles non-stationarity via spline-based time effects. Key contributions include a Bayesian learning framework for the transition probabilities $P_i^{(t)}(1\mid s,a)$ with within-arm and across-arm information sharing, a Whittle-index policy for online arm selection, and extensive empirical validation on both simulated settings and a scenario built from real ARMMAN data. BCoR shows substantial finite-sample gains over strong baselines (including a $61\%$ increase in engagement in the ARMMAN experiment with $B=10$), supporting its practical potential for deployment in large-scale mHealth programs.
Abstract
Public health programs often provide interventions to encourage program adherence, and allocating these interventions effectively is vital for producing the greatest overall health outcomes, especially in underserved communities where resources are limited. Such resource allocation problems are often modeled as restless multi-armed bandits (RMABs) with unknown underlying transition dynamics, hence requiring online reinforcement learning (RL). We present Bayesian Learning for Contextual RMABs (BCoR), an online RL approach for RMABs that combines techniques in Bayesian modeling with Thompson sampling in a novel way to flexibly model the complex RMAB settings present in public health program adherence problems, namely context and non-stationarity. BCoR's key strength is its ability to leverage shared information within and between arms to learn the unknown RMAB transition dynamics quickly in intervention-scarce settings with relatively short time horizons, which are common in public health applications. Empirically, BCoR achieves substantially higher finite-sample performance across a range of experimental settings, including a setting using real-world adherence data that was developed in collaboration with ARMMAN, an NGO in India that runs a large-scale maternal mHealth program, showcasing BCoR's practical utility and potential for real-world deployment.
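
To make the Thompson sampling idea concrete, the following is a minimal sketch of one round of posterior sampling and budgeted arm selection. It is not BCoR itself, and it simplifies heavily: each transition probability $P_i(1\mid s,a)$ gets its own independent Beta posterior (so there is no hierarchical sharing within or across arms, no context, and no spline-based time effect), and arms are ranked by a myopic sampled gain rather than by a Whittle index. All function and variable names are hypothetical and chosen for illustration only.

```python
import random


def sample_transition_probs(successes, failures, rng):
    """Draw one posterior sample of P_i(1 | s, a) for every arm i.

    successes[i][s][a] and failures[i][s][a] count observed transitions
    from state s under action a into state 1 and state 0, respectively.
    Assumption: independent Beta(1, 1) priors per (arm, state, action).
    """
    return [
        [
            [
                rng.betavariate(1 + successes[i][s][a], 1 + failures[i][s][a])
                for a in (0, 1)
            ]
            for s in (0, 1)
        ]
        for i in range(len(successes))
    ]


def select_arms(sampled, states, budget):
    """Pick `budget` arms with the largest sampled benefit of intervening.

    Assumption: a myopic stand-in index, the sampled increase in
    P(next state = 1) from acting versus not acting in the current state.
    """
    gains = [sampled[i][s][1] - sampled[i][s][0] for i, s in enumerate(states)]
    ranked = sorted(range(len(states)), key=lambda i: gains[i], reverse=True)
    chosen = set(ranked[:budget])
    return [1 if i in chosen else 0 for i in range(len(states))]


def update_counts(successes, failures, states, actions, next_states):
    """Record the observed transition of every arm in place."""
    for i, (s, a, s_next) in enumerate(zip(states, actions, next_states)):
        if s_next == 1:
            successes[i][s][a] += 1
        else:
            failures[i][s][a] += 1
```

In an online loop, one would call `sample_transition_probs`, then `select_arms` with budget $B$, observe the next states, and call `update_counts`, repeating for each of the $T$ time steps. Under these assumptions every arm learns only from its own data, which is exactly the limitation that sharing information within and between arms is meant to address when interventions are scarce and horizons are short.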
