reBandit: Random Effects based Online RL algorithm for Reducing Cannabis Use

Susobhan Ghosh; Yongyi Guo; Pei-Yao Hung; Lara Coughlin; Erin Bonar; Inbal Nahum-Shani; Maureen Walton; Susan Murphy

TL;DR

This work tackles cannabis use among emerging adults by introducing reBandit, an online RL algorithm that combines random effects with informative Bayesian priors to learn rapidly and robustly from noisy mobile health data. It employs a Bayesian linear mixed model with action-centered rewards, online Empirical Bayes updates of its hyperparameters, and a smooth posterior sampling-based action selector that balances exploration and exploitation while supporting reproducibility. In a simulation testbed built from SARA study data, reBandit matches or surpasses a full-pooling baseline (BLR) and random baselines, and its advantage grows as population heterogeneity increases, underscoring its value for personalized, scalable mHealth interventions. The method is designed for deployment in the MiWaves just-in-time adaptive intervention (JITAI) study and is accompanied by public code and a detailed description of the experimental design, to support replication and adaptation in similar behavioral-health settings.
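The smooth posterior sampling-based action selector mentioned above can be sketched as follows. This is a minimal illustration under our own assumptions, not the authors' implementation: we assume the probability of sending a message is the posterior expectation of a clipped generalized logistic function of the advantage (treatment-effect) term, estimated by Monte Carlo. The constants `L_MIN`, `L_MAX`, `C`, `B`, the functions `send_probability` and `select_action`, and the two-feature toy posteriors are all hypothetical.

```python
import numpy as np

# Hypothetical shape parameters for the clipped generalized logistic map.
# Illustrative choices only, NOT the MiWaves study's settings.
L_MIN, L_MAX, C, B = 0.2, 0.8, 1.0, 10.0


def clipped_logistic(x):
    """Map an advantage value to a probability that stays inside [L_MIN, L_MAX]."""
    z = np.clip(-B * np.asarray(x, dtype=float), -500.0, 500.0)  # avoid exp overflow
    return L_MIN + (L_MAX - L_MIN) / (1.0 + C * np.exp(z))


def send_probability(mu, Sigma, f, n_draws=10_000, seed=None):
    """Monte Carlo estimate of E_{beta ~ N(mu, Sigma)}[rho(f^T beta)].

    mu, Sigma: posterior mean and covariance of the advantage parameters.
    f: advantage features for the current decision point.
    """
    rng = np.random.default_rng(seed)
    betas = rng.multivariate_normal(mu, Sigma, size=n_draws)  # shape (n_draws, d)
    return float(np.mean(clipped_logistic(betas @ f)))


def select_action(prob, seed=None):
    """Send (1) with probability `prob`, otherwise do not send (0)."""
    return int(np.random.default_rng(seed).random() < prob)


# Toy posteriors: a near-point-mass posterior and an uncertain one.
tiny = 1e-12 * np.eye(2)
f_example = np.array([1.0, 0.0])
p_neutral = send_probability(np.array([1.0, 0.5]), tiny, np.zeros(2), seed=0)
p_helpful = send_probability(np.array([5.0, 0.0]), tiny, f_example, seed=0)
p_harmful = send_probability(np.array([-5.0, 0.0]), tiny, f_example, seed=0)
p_uncertain = send_probability(np.array([0.5, 0.0]), np.eye(2), f_example, seed=0)
action = select_action(p_uncertain, seed=1)
```

Because the clipping keeps every probability between `L_MIN` and `L_MAX`, this selector never stops exploring entirely, and a smooth map (rather than a hard argmax over a single posterior draw) makes the probabilities vary continuously with the posterior; seeding the generator makes a given probability and action reproducible.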

Abstract

The escalating prevalence of cannabis use, and of associated cannabis use disorder (CUD), poses a significant public health challenge globally. With a notably wide treatment gap, especially among emerging adults (EAs; ages 18-25), addressing cannabis use and CUD remains a pivotal objective within the 2030 United Nations Agenda for Sustainable Development Goals (SDG). In this work, we develop an online reinforcement learning (RL) algorithm called reBandit, which will be used in a mobile health study to deliver personalized mobile health interventions aimed at reducing cannabis use among EAs. reBandit uses random effects and informative Bayesian priors to learn quickly and efficiently in noisy mobile health environments. Moreover, reBandit employs Empirical Bayes and optimization techniques to update its hyperparameters autonomously online. To evaluate the performance of our algorithm, we construct a simulation testbed using data from a prior study and compare reBandit against algorithms commonly used in mobile health studies. We show that reBandit performs as well as or better than all the baseline algorithms, and that the performance gap widens as population heterogeneity increases in the simulation environment, demonstrating its ability to adapt to a diverse population of study participants.
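To make "random effects" and "Empirical Bayes" concrete, here is a simplified batch sketch of partial pooling in a Bayesian linear mixed model, not reBandit's actual model or update schedule. Assumptions (ours): each user's parameters are θ_i = μ + u_i with u_i ~ N(0, τ²I) and μ ~ N(μ0, Σ0); the noise variance σ² is known; τ² is chosen by a grid search over the log marginal likelihood, standing in for the optimization the abstract mentions; action-centering and the real feature set are omitted. All function names and the toy data are hypothetical.

```python
import numpy as np


def design_matrix(X, user_ids, n_users):
    """Row for an observation from user i: [x | 0 ... 0 | x (block i) | 0 ... 0]."""
    n, d = X.shape
    Phi = np.zeros((n, d * (n_users + 1)))
    Phi[:, :d] = X  # population-level (fixed-effect) block
    for row, i in enumerate(user_ids):
        start = d * (i + 1)
        Phi[row, start:start + d] = X[row]  # user i's random-effect block
    return Phi


def prior(d, n_users, mu0, Sigma0, tau2):
    """Block-diagonal Gaussian prior over z = [mu; u_1; ...; u_m], u_i ~ N(0, tau2 I)."""
    mean = np.concatenate([mu0, np.zeros(d * n_users)])
    cov = np.zeros((d * (n_users + 1), d * (n_users + 1)))
    cov[:d, :d] = Sigma0
    cov[d:, d:] = tau2 * np.eye(d * n_users)
    return mean, cov


def posterior(Phi, y, mean0, cov0, sigma2):
    """Conjugate Gaussian update for y = Phi z + noise, noise ~ N(0, sigma2 I)."""
    prec0 = np.linalg.inv(cov0)
    cov = np.linalg.inv(prec0 + Phi.T @ Phi / sigma2)
    mean = cov @ (prec0 @ mean0 + Phi.T @ y / sigma2)
    return mean, cov


def log_marginal_likelihood(Phi, y, mean0, cov0, sigma2):
    """log N(y; Phi mean0, Phi cov0 Phi^T + sigma2 I), with z integrated out."""
    resid = y - Phi @ mean0
    K = Phi @ cov0 @ Phi.T + sigma2 * np.eye(len(y))
    _, logdet = np.linalg.slogdet(K)
    quad = resid @ np.linalg.solve(K, resid)
    return -0.5 * (quad + logdet + len(y) * np.log(2.0 * np.pi))


def empirical_bayes_tau2(Phi, y, d, n_users, mu0, Sigma0, sigma2, grid):
    """Empirical Bayes: keep the tau2 on the grid with the highest marginal likelihood."""
    scores = []
    for tau2 in grid:
        mean0, cov0 = prior(d, n_users, mu0, Sigma0, tau2)
        scores.append(log_marginal_likelihood(Phi, y, mean0, cov0, sigma2))
    return grid[int(np.argmax(scores))]


# Toy data: 5 users, 2 features, true tau2 = 1, known noise variance 0.25.
rng = np.random.default_rng(0)
d, m, n_per_user, sigma2 = 2, 5, 40, 0.25
mu_true = np.array([1.0, -0.5])
theta_true = mu_true + rng.normal(0.0, 1.0, size=(m, d))
user_ids = np.repeat(np.arange(m), n_per_user)
X = rng.normal(size=(m * n_per_user, d))
noise = rng.normal(0.0, np.sqrt(sigma2), size=len(user_ids))
y = (X * theta_true[user_ids]).sum(axis=1) + noise

mu0, Sigma0 = np.zeros(d), 10.0 * np.eye(d)
Phi = design_matrix(X, user_ids, m)
tau2_hat = empirical_bayes_tau2(Phi, y, d, m, mu0, Sigma0, sigma2, [0.01, 0.1, 1.0, 10.0])
mean0, cov0 = prior(d, m, mu0, Sigma0, tau2_hat)
post_mean, post_cov = posterior(Phi, y, mean0, cov0, sigma2)
theta_hat = post_mean[:d] + post_mean[d:].reshape(m, d)  # per-user mu + u_i
```

The learned τ² controls how strongly each user's estimate is pulled toward the population mean: a small τ² behaves like full pooling (one shared model, as in a BLR baseline), while a large τ² lets each user's data dominate. An online variant would repeat the Empirical Bayes step and the posterior update whenever new data arrive.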

Paper Structure (40 sections, 59 equations, 6 figures, 5 tables, 1 algorithm)

Introduction & Motivation
MiWaves pilot study
Challenges, Contributions and Overview
RL Framework and Notation
Related Work
Bandit algorithm: reBandit
Online Learning Algorithm
Reward Approximating Function
Online model update procedure
Action selection procedure
Reward Engineering
Experimental Results
Simulation Testbed Design
Simulation Results
Conclusion
...and 25 more sections

Figures (6)

Figure 1: Summary of the MiWaves pilot study. $m=120$ EAs are expected to be recruited through social media ads. Each EA will be in the trial for 30 days and will be asked to self-report twice daily, once in the morning and once in the evening. When the self-report is completed or its time window expires, the RL algorithm will decide whether or not to send an intervention message.
Figure 4: Bar plot of coefficients of features in the MLR user models relative to coefficients of class 0, across all $N=42$ users.
Figure 5: Comparison of log loss between the two models across all users
Figure 7: GEE Results
Figure 8: Average posterior means and variances in the minimal treatment effect environment with no habituation
...and 1 more figure