Using Adaptive Bandit Experiments to Increase and Investigate Engagement in Mental Health

Harsh Kumar; Tong Li; Jiakai Shi; Ilya Musabirov; Rachel Kornfield; Jonah Meyerhoff; Ananya Bhattacharjee; Chris Karr; Theresa Nguyen; David Mohr; Anna Rafferty; Sofia Villar; Nina Deliu; Joseph Jay Williams

TL;DR

The paper aims to improve engagement in digital mental health (DMH) by deploying contextual multi-armed bandit (MAB) algorithms, notably Contextual Thompson Sampling (Contextual TS), within an 8-week text-message DMH intervention. It presents a software platform, developed over two years, that instruments modular DMH content for adaptive assignment and runs side-by-side comparisons with uniform random designs. This supports both optimization of user-level rewards and data collection suitable for social-behavioral analysis. Through simulations and a real-world deployment with 1100+ users, the study shows that Contextual TS can raise rewards and reveal contextual effects (e.g., of Mood), while highlighting the biases and statistical power considerations inherent in adaptively collected data. The work offers a scalable testbed for adaptive experimentation in DMH and points to applicability in other domains, balancing user experience with robust scientific inference.

Abstract

Digital mental health (DMH) interventions, such as text-message-based lessons and activities, offer immense potential for accessible mental health support. While these interventions can be effective, real-world experimental testing can further enhance their design and impact. Adaptive experimentation, utilizing algorithms like Thompson Sampling for (contextual) multi-armed bandit (MAB) problems, can lead to continuous improvement and personalization. However, it remains unclear when these algorithms can simultaneously increase user experience rewards and facilitate appropriate data collection for social-behavioral scientists to analyze with sufficient statistical confidence. Although a growing body of research addresses the practical and statistical aspects of MAB and other adaptive algorithms, further exploration is needed to assess their impact across diverse real-world contexts. This paper presents a software system developed over two years that allows text-messaging intervention components to be adapted using bandit and other algorithms while collecting data for side-by-side comparison with traditional uniform random non-adaptive experiments. We evaluate the system by deploying a text-message-based DMH intervention to 1100 users, recruited through a large mental health non-profit organization, and share the path forward for deploying this system at scale. This system not only enables applications in mental health but could also serve as a model testbed for adaptive experimentation algorithms in other domains.
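To make the contrast between adaptive and uniform random assignment concrete, the following is a minimal sketch rather than the system described in the paper. It assumes a non-contextual Beta-Bernoulli Thompson Sampling policy with binary rewards, whereas the paper uses a contextual variant and 1-to-5 helpfulness ratings. The function names `thompson_sampling` and `uniform_random` and the reward probabilities are hypothetical and chosen only for illustration.

```python
import random


def thompson_sampling(true_probs, n_rounds, seed=0):
    """Simulate Beta-Bernoulli Thompson Sampling (illustrative, non-contextual)."""
    rng = random.Random(seed)
    n_arms = len(true_probs)
    alpha = [1] * n_arms  # Beta(1, 1) prior: pseudo-count of successes
    beta = [1] * n_arms   # Beta(1, 1) prior: pseudo-count of failures
    pulls = [0] * n_arms
    total_reward = 0
    for _ in range(n_rounds):
        # Draw one plausible success rate per arm from its posterior.
        samples = [rng.betavariate(alpha[a], beta[a]) for a in range(n_arms)]
        # Assign the arm whose sampled rate is highest.
        arm = max(range(n_arms), key=lambda a: samples[a])
        # Simulated binary reward (an assumption; the paper uses 1-5 ratings).
        reward = 1 if rng.random() < true_probs[arm] else 0
        alpha[arm] += reward
        beta[arm] += 1 - reward
        pulls[arm] += 1
        total_reward += reward
    return pulls, total_reward


def uniform_random(true_probs, n_rounds, seed=0):
    """Simulate a traditional experiment that assigns arms uniformly at random."""
    rng = random.Random(seed)
    n_arms = len(true_probs)
    pulls = [0] * n_arms
    total_reward = 0
    for _ in range(n_rounds):
        arm = rng.randrange(n_arms)
        reward = 1 if rng.random() < true_probs[arm] else 0
        pulls[arm] += 1
        total_reward += reward
    return pulls, total_reward


if __name__ == "__main__":
    # Hypothetical success probabilities for two arms.
    probs = [0.3, 0.7]
    print("Thompson Sampling:", thompson_sampling(probs, 2000, seed=1))
    print("Uniform Random:   ", uniform_random(probs, 2000, seed=1))
```

In a simulation like this, the adaptive policy concentrates assignments on the better arm, which tends to raise the total reward but leaves fewer observations on the weaker arm. That imbalance is one source of the statistical questions the abstract raises about analyzing adaptively collected data.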

Paper Structure (20 sections, 3 equations, 7 figures, 5 tables)

Introduction
Related Work
Digital Mental Health and Text-message based interventions
Adaptive Experiments and Multi-Armed Bandit Algorithms
Design of Intervention
Adaptive Components in DMH Intervention
Algorithms for Adaptive Experimentation
Problem Formulation
Algorithm
Evaluation of System
Simulation settings
Simulation analysis
Real-world Deployment
Engagement
Efficiency of Policies
...and 5 more sections

Figures (7)

Figure 1: Schematic representation of an example sequence of messages a user could receive during days Di, where Di is any random day within the 8-week-long intervention when the user receives a message. Mood and Energy during each day Di represent a subset of contexts describing the user during that particular day, and Reward indicates a response from the user to the question "How helpful were these messages? Reply with a number 1 (not at all helpful) to 5 (very helpful)" after receiving the messages during the day. The messages are composed of three modular components in a 2 (Rationale: present vs absent) x 2 (Link: present vs absent) x 4 (Interaction type: 4 options) factorial design.
Figure 2: Average rewards using Thompson Sampling for Contextual Bandits (Contextual TS) versus Uniform Random in different cases. The first pair of bars compares the reward among participants with low mood, the second pair compares the reward in the high mood group, and the last pair shows the average across all participants.
Figure 3: Average reward (rating of 1 to 5 scaled) using Contextual TS versus Uniform Random for "Link" rating for different levels of contextual variables. Figure A shows the distribution for the contextual variable Mood (Low vs High). The number of participants (N) from left to right is [322, 316, 83, 87]. Figure B shows the distribution for Activity in the last 48 hours (Yes vs No). N from left to right is [67, 75, 338, 329].
Figure 4: Arm allocation dynamics for Contextual Thompson Sampling (left) vs Uniform Random (right). In both parts of the graph, columns represent arms, and grid rows artificially split the experiments into periods of approximately one month, allowing comparison of arm allocation at different stages of the experiment. Each small square is one reward received, with a fill color representing how helpful the participant found the message. On the right, participants are assigned as in a traditional experiment aiming for a constant 50-50 split. The "No Link" condition tends to yield lower rewards (represented by brownish squares). On the left, adaptivity is displayed. In the first period, there was a marginally better response for "No Link," leading to some allocations to this arm in the second period. However, the algorithm was able to adjust based on responses, consistently allocating more interactions to the "Link" arm in months 3 and 4.
Figure 5: Overall design of the software system. The system comprises two main components: a Dialogue System (to deliver text messages) and a Personalization System.
...and 2 more figures
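
The 2 x 2 x 4 factorial design described in the Figure 1 caption can be sketched as an enumeration of message configurations. This is an illustrative sketch only: the four interaction types are not named in this summary, so the labels `type_1` to `type_4` are placeholders, and `enumerate_configurations` is a hypothetical helper. Whether the deployed system treats each cell as a single arm or adapts each factor separately is not stated here.

```python
from itertools import product

# Factor levels taken from the Figure 1 caption.
RATIONALE = ("present", "absent")
LINK = ("present", "absent")
# Placeholder labels: the actual interaction types are not named in this summary.
INTERACTION_TYPES = ("type_1", "type_2", "type_3", "type_4")


def enumerate_configurations():
    """Return every cell of the 2 x 2 x 4 factorial design as a dict."""
    return [
        {"rationale": r, "link": l, "interaction": i}
        for r, l, i in product(RATIONALE, LINK, INTERACTION_TYPES)
    ]


if __name__ == "__main__":
    configs = enumerate_configurations()
    print(len(configs), "message configurations")
    for config in configs[:3]:
        print(config)
```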
