
reBandit: Random Effects based Online RL algorithm for Reducing Cannabis Use

Susobhan Ghosh, Yongyi Guo, Pei-Yao Hung, Lara Coughlin, Erin Bonar, Inbal Nahum-Shani, Maureen Walton, Susan Murphy

TL;DR

This work tackles the problem of reducing cannabis use among emerging adults by introducing reBandit, an online RL algorithm that combines random effects with informative Bayesian priors to learn rapidly and robustly from noisy mobile-health data. It employs a Bayesian linear mixed model with action-centered rewards, online Empirical Bayes updates for its hyperparameters, and a smooth, posterior sampling-based action selector that balances exploration and exploitation while enabling reproducibility. In a simulation testbed built from data of a prior study (SARA), reBandit matches or surpasses a full-pooling baseline (Bayesian linear regression, BLR) and random baselines, and its advantage grows as population heterogeneity increases, underscoring its value for personalized, scalable mHealth interventions. The method is designed for deployment in the MiWaves just-in-time adaptive intervention (JITAI) study and is accompanied by public code and a detailed experimental design to support replication and adaptation in similar behavioral-health contexts.
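To make the action-selection step more concrete, here is a minimal Python sketch of smooth posterior sampling. It assumes a Gaussian posterior over the advantage (treatment-effect) coefficients beta and a generalized logistic allocation function whose output is clipped to [L_MIN, L_MAX]; the send probability is the posterior expectation of that function, estimated by Monte Carlo. The function names, the clipping bounds, and the steepness/shift values B and C are hypothetical illustrations, not values taken from the paper. Fixing the random seed is one simple way to make the computed probabilities reproducible.

```python
import numpy as np

# Hypothetical allocation parameters (illustrative only, not from the paper):
L_MIN, L_MAX = 0.2, 0.8   # clip the send probability so the algorithm keeps exploring
B, C = 5.0, 1.0           # steepness and shift of the generalized logistic


def allocation(advantage: np.ndarray) -> np.ndarray:
    """Generalized logistic: smooth, increasing map from an advantage value
    to a probability in [L_MIN, L_MAX]."""
    return L_MIN + (L_MAX - L_MIN) / (1.0 + C * np.exp(-B * advantage))


def send_probability(post_mean, post_cov, features, n_draws=5000, seed=0) -> float:
    """Monte Carlo estimate of E[allocation(features @ beta)], with beta drawn
    from an assumed Gaussian posterior over the advantage coefficients."""
    rng = np.random.default_rng(seed)  # fixed seed -> reproducible probability
    betas = rng.multivariate_normal(post_mean, post_cov, size=n_draws)
    return float(np.mean(allocation(betas @ features)))


def select_action(prob: float, seed: int = 0) -> int:
    """Draw the action: 1 = send an intervention message, 0 = do not send."""
    rng = np.random.default_rng(seed)
    return int(rng.random() < prob)
```

Because the allocation function is smooth rather than a hard threshold, small changes in the posterior produce small changes in the send probability, and the clipping bounds keep every decision point at least partly exploratory.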

Abstract

The escalating prevalence of cannabis use, and associated cannabis-use disorder (CUD), poses a significant public health challenge globally. With a notably wide treatment gap, especially among emerging adults (EAs; ages 18-25), addressing cannabis use and CUD remains a pivotal objective within the 2030 United Nations Agenda for Sustainable Development Goals (SDG). In this work, we develop an online reinforcement learning (RL) algorithm called reBandit, which will be used in a mobile health study to deliver personalized mobile health interventions aimed at reducing cannabis use among EAs. reBandit utilizes random effects and informative Bayesian priors to learn quickly and efficiently in noisy mobile health environments. Moreover, reBandit employs Empirical Bayes and optimization techniques to autonomously update its hyper-parameters online. To evaluate the performance of our algorithm, we construct a simulation testbed using data from a prior study, and compare against commonly used algorithms in mobile health studies. We show that reBandit performs as well as or better than all the baseline algorithms, and the performance gap widens as population heterogeneity increases in the simulation environment, demonstrating its ability to adapt to a diverse population of study participants.
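The random-effects idea and the Empirical Bayes hyperparameter update can be illustrated with a small sketch. It assumes each user's parameter is theta_i = theta_pop + u_i, with a shared population parameter theta_pop ~ N(mu0, Sigma0), independent user deviations u_i ~ N(0, tau2 I), and Gaussian reward noise with variance sigma2; the random-effects variance tau2 is then chosen by maximizing the marginal likelihood of all users' data. The isotropic random-effects covariance and the grid search are simplifications assumed here, standing in for whatever covariance structure and optimizer the paper actually uses, and all function names are hypothetical.

```python
import numpy as np
from scipy.linalg import block_diag
from scipy.stats import multivariate_normal


def marginal_cov(X_list, prior_cov, re_var, noise_var):
    """Covariance of the stacked rewards of all users under the assumed model.

    theta_pop ~ N(mu0, prior_cov) is shared, so it couples all users;
    u_i ~ N(0, re_var * I) is per-user, so it only adds within-user blocks;
    eps ~ N(0, noise_var) is independent across observations."""
    X = np.vstack(X_list)
    d = X.shape[1]
    shared = X @ prior_cov @ X.T
    per_user = block_diag(*[Xi @ (re_var * np.eye(d)) @ Xi.T for Xi in X_list])
    return shared + per_user + noise_var * np.eye(X.shape[0])


def log_marginal_likelihood(X_list, y_list, prior_mean, prior_cov, re_var, noise_var):
    """log p(y | hyperparameters) with theta_pop and all u_i integrated out."""
    X = np.vstack(X_list)
    y = np.concatenate(y_list)
    cov = marginal_cov(X_list, prior_cov, re_var, noise_var)
    return multivariate_normal.logpdf(y, mean=X @ prior_mean, cov=cov)


def empirical_bayes_re_var(X_list, y_list, prior_mean, prior_cov, noise_var, grid):
    """Type-II maximum likelihood for the random-effects variance over a grid."""
    scores = [
        log_marginal_likelihood(X_list, y_list, prior_mean, prior_cov, v, noise_var)
        for v in grid
    ]
    return grid[int(np.argmax(scores))]
```

Setting tau2 = 0 makes every user share theta_pop, which is full pooling (as in a BLR baseline); a larger fitted tau2 lets each user's estimate move further from the population estimate, which is one way a random-effects model can adapt as heterogeneity grows.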

Paper Structure

This paper contains 40 sections, 59 equations, 6 figures, 5 tables, and 1 algorithm.

Figures (6)

  • Figure 1: Summary of the MiWaves pilot study. $m=120$ EAs are expected to be recruited through social media ads. Each EA will be in the trial for 30 days, and will be asked to self-report twice daily - once in the morning and once in the evening. Upon completion or time expiration of the self-reporting, the RL algorithm will decide whether to send or not send an intervention message.
  • Figure 4: Bar plot of coefficients of features in the MLR user models relative to coefficients of class 0, across all $N=42$ users.
  • Figure 5: Comparison of log loss between the two models across all users
  • Figure 7: GEE Results
  • Figure 8: Average posterior means and variances in the minimal treatment effect environment with no habituation
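The protocol summarized in Figure 1 implies a fixed grid of decision points: one per self-report window, reached when the EA completes the report or the window expires. The sketch below enumerates those points and applies a placeholder policy at each one. The policy callable and the function names are hypothetical, and the flat ordering ignores when each EA actually enrolls.

```python
from itertools import product
from typing import Callable, List, Tuple

N_USERS = 120                   # m = 120 EAs expected to be recruited (Figure 1)
N_DAYS = 30                     # each EA is in the trial for 30 days
SLOTS = ("morning", "evening")  # two self-reports per day

DecisionPoint = Tuple[int, int, str]


def decision_points(n_users: int = N_USERS, n_days: int = N_DAYS,
                    slots: Tuple[str, ...] = SLOTS) -> List[DecisionPoint]:
    """Every (user, day, slot) at which the algorithm makes a send/no-send
    decision, i.e. once per self-report window."""
    return list(product(range(n_users), range(n_days), slots))


def simulate(policy: Callable[[int, int, str], int], **kwargs) -> List[int]:
    """Apply a placeholder (hypothetical) policy at each decision point;
    1 = send an intervention message, 0 = do not send."""
    return [policy(u, d, s) for u, d, s in decision_points(**kwargs)]


print(len(decision_points()))  # 120 users * 30 days * 2 slots = 7200
```

At full enrollment this amounts to 60 decisions per EA and 7200 decisions overall, which gives a sense of how little per-user data an online algorithm has to work with and why pooling information across users through random effects can help.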