reBandit: Random Effects based Online RL algorithm for Reducing Cannabis Use
Susobhan Ghosh, Yongyi Guo, Pei-Yao Hung, Lara Coughlin, Erin Bonar, Inbal Nahum-Shani, Maureen Walton, Susan Murphy
TL;DR
This work addresses the goal of reducing cannabis use among emerging adults by introducing reBandit, an online RL algorithm that combines random effects with informative Bayesian priors to learn rapidly and robustly from noisy mobile-health data. It uses a Bayesian linear mixed model with action-centered rewards, online Empirical Bayes updates for its hyperparameters, and a smooth posterior sampling-based action selector that balances exploration and exploitation while supporting reproducibility. In a simulation testbed derived from data from the prior SARA study, reBandit matches or surpasses full-pooling Bayesian linear regression (BLR) baselines and random strategies, and its gains increase as population heterogeneity grows, underscoring its value for personalized, scalable mHealth interventions. The method is designed for deployment in MiWaves, a just-in-time adaptive intervention (JITAI), and is accompanied by public code and a thorough experimental design to facilitate replication and adaptation in similar behavioral-health contexts.
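To make the idea of a smooth posterior sampling-based action selector concrete, here is a minimal sketch of one common form of it. It rests on assumptions not stated in this summary: the posterior over the advantage coefficients is Gaussian, the probability of delivering an intervention is the posterior expectation of a generalized logistic function of the estimated advantage, and that function is bounded in [0.2, 0.8]. The function names, the bounds, and the constants `c` and `b` are hypothetical illustrations, not reBandit's published settings.

```python
import math
import random


def generalized_logistic(x, lo=0.2, hi=0.8, c=1.0, b=1.0):
    # Hypothetical smooth map from an advantage value to a probability in [lo, hi].
    z = min(-b * x, 700.0)  # clamp the exponent to avoid math.exp overflow
    return lo + (hi - lo) / (1.0 + c * math.exp(z))


def action_probability(features, post_mean, post_cov, n_draws=5000, seed=0):
    # Assumption: beta ~ N(post_mean, post_cov), so the advantage
    # features^T beta is univariate normal with the mean and variance below.
    d = len(features)
    mean = sum(f * m for f, m in zip(features, post_mean))
    var = sum(features[i] * post_cov[i][j] * features[j]
              for i in range(d) for j in range(d))
    sd = math.sqrt(max(var, 0.0))
    # Monte Carlo estimate of E[rho(features^T beta)] over the posterior.
    rng = random.Random(seed)
    draws = (generalized_logistic(rng.gauss(mean, sd)) for _ in range(n_draws))
    return sum(draws) / n_draws


# Hypothetical state features and posterior for one decision time.
p = action_probability([1.0, 0.5], [0.3, -0.1], [[0.05, 0.0], [0.0, 0.05]])
send_intervention = random.Random(1).random() < p
```

Because the probability is a smooth function of the whole posterior rather than a hard argmax over a single posterior draw, small changes in the posterior move it only slightly. The bounds keep every action probability away from 0 and 1, which preserves some exploration at every decision time.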
Abstract
The escalating prevalence of cannabis use, and of associated cannabis use disorder (CUD), poses a significant global public health challenge. With a notably wide treatment gap, especially among emerging adults (EAs; ages 18-25), addressing cannabis use and CUD remains a pivotal objective within the United Nations 2030 Agenda for Sustainable Development Goals (SDGs). In this work, we develop an online reinforcement learning (RL) algorithm called reBandit, which will be used in a mobile health study to deliver personalized mobile health interventions aimed at reducing cannabis use among EAs. reBandit uses random effects and informative Bayesian priors to learn quickly and efficiently in noisy mobile health environments. Moreover, reBandit employs Empirical Bayes and optimization techniques to update its hyperparameters autonomously online. To evaluate the performance of our algorithm, we construct a simulation testbed using data from a prior study and compare reBandit against algorithms commonly used in mobile health studies. We show that reBandit performs as well as or better than all the baseline algorithms, and that the performance gap widens as population heterogeneity in the simulation environment increases, demonstrating its ability to adapt to a diverse population of study participants.
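Why random effects help more as heterogeneity grows can be illustrated with the simplest random-effects model: each participant i has an effect θ_i drawn from N(μ, τ²), and each observation of that participant carries noise with variance σ². The sketch below is a deliberately simplified stand-in, not reBandit's Bayesian linear mixed model. It uses scalar effects and swaps in a method-of-moments Empirical Bayes estimate of (μ, τ²) in place of the optimization-based hyperparameter updates described above. All function and variable names are hypothetical.

```python
from statistics import mean, variance


def empirical_bayes_hyperparams(user_means, user_counts, sigma2):
    # Method-of-moments EB: each user mean is marginally N(mu, tau2 + sigma2 / n_i).
    mu = mean(user_means)
    avg_noise = mean(sigma2 / n for n in user_counts)
    tau2 = max(0.0, variance(user_means) - avg_noise)  # truncate at zero
    return mu, tau2


def shrink(user_mean, n, mu, tau2, sigma2):
    # Posterior mean of theta_i: a precision-weighted blend of the user's
    # own mean and the population mean mu.
    w = tau2 / (tau2 + sigma2 / n)
    return w * user_mean + (1.0 - w) * mu


# Hypothetical outcomes for three participants, noise variance sigma2 = 1.
data = {"u1": [1.0, 3.0, 2.0], "u2": [5.0, 7.0, 6.0], "u3": [4.0]}
means = [mean(v) for v in data.values()]
counts = [len(v) for v in data.values()]
mu, tau2 = empirical_bayes_hyperparams(means, counts, sigma2=1.0)
estimates = {u: shrink(mean(v), len(v), mu, tau2, 1.0) for u, v in data.items()}
```

When the estimated τ² is near zero, every participant's estimate collapses to μ, which corresponds to full pooling. As τ² grows relative to σ²/n_i, each estimate moves toward that participant's own mean. This offers one intuition for why the gap over a full-pooling baseline widens as population heterogeneity increases.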
