RoME: A Robust Mixed-Effects Bandit Algorithm for Optimizing Mobile Health Interventions
Easton K. Huch, Jieru Shi, Madeline R. Abbott, Jessica R. Golbus, Alexander Moreno, Walter H. Dempsey
TL;DR
RoME is a robust mixed-effects contextual bandit algorithm for mobile health. It combines debiased machine learning, a partially linear reward model, and network cohesion penalties to handle nonlinear baseline rewards and heterogeneity across users and time. Its high-probability regret bound scales with the dimension $d$ of the differential-reward model and remains robust to misspecification of the baseline model. Empirically, RoME outperforms competing approaches in heterogeneous and nonlinear settings, shows strong off-policy gains on the Valentine and Intern Health Study datasets, and is computationally efficient. The framework supports scalable, personalized, context-aware mHealth interventions with both theoretical guarantees and practical performance improvements.
Abstract
Mobile health leverages personalized and contextually tailored interventions optimized through bandit and reinforcement learning algorithms. In practice, however, challenges such as participant heterogeneity, nonstationarity, and nonlinear relationships hinder algorithm performance. We propose RoME, a Robust Mixed-Effects contextual bandit algorithm that simultaneously addresses these challenges via (1) modeling the differential reward with user- and time-specific random effects, (2) network cohesion penalties, and (3) debiased machine learning for flexible estimation of baseline rewards. We establish a high-probability regret bound that depends solely on the dimension of the differential-reward model, enabling us to achieve robust regret bounds even when the baseline reward is highly complex. We demonstrate the superior performance of the RoME algorithm in a simulation and two off-policy evaluation studies.
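To make the role of debiasing concrete, the following is a minimal sketch, not the authors' implementation. It assumes a binary action with a known randomization probability `pi`, a partially linear reward of the form baseline plus action times a linear differential reward, and the standard doubly robust pseudo-outcome. All names (`pseudo_reward`, `simulate`, `theta`) and the simulated data are hypothetical. The sketch illustrates why the differential-reward parameters can be recovered even when the baseline estimate is badly misspecified (here, identically zero).

```python
import numpy as np

def pseudo_reward(r, a, pi, f_hat0, f_hat1):
    """Doubly robust pseudo-outcome for the differential reward.

    r: observed rewards; a: binary actions; pi: P(a = 1), assumed known;
    f_hat0, f_hat1: baseline-model predictions of the reward under a = 0 and a = 1.
    """
    f_hat_a = np.where(a == 1, f_hat1, f_hat0)
    return (a - pi) / (pi * (1 - pi)) * (r - f_hat_a) + (f_hat1 - f_hat0)

def simulate(n=20000, seed=0):
    """Hypothetical data: nonlinear baseline, linear differential reward (d = 2)."""
    rng = np.random.default_rng(seed)
    s = rng.normal(size=(n, 2))                   # context
    x = np.column_stack([np.ones(n), s[:, 0]])    # differential-reward features
    theta = np.array([0.5, -1.0])                 # assumed true parameters
    g = np.sin(3 * s[:, 0]) + s[:, 1] ** 2        # nonlinear baseline reward
    pi = np.full(n, 0.5)                          # assumed randomization probability
    a = rng.binomial(1, pi)
    r = g + a * (x @ theta) + 0.1 * rng.normal(size=n)
    return x, a, pi, r, theta

x, a, pi, r, theta = simulate()

# Deliberately poor baseline estimate: predict zero for both actions.
zeros = np.zeros_like(r)
r_tilde = pseudo_reward(r, a, pi, zeros, zeros)

# Least squares on the pseudo-outcome targets only the d-dimensional parameter.
theta_hat, *_ = np.linalg.lstsq(x, r_tilde, rcond=None)
```

In this sketch the regression involves only the two differential-reward features, which mirrors the abstract's point that the complexity of the baseline need not enter the dimension of the model being learned. A better baseline estimate would leave the estimator unbiased and reduce its variance.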
