Adaptive Interventions with User-Defined Goals for Health Behavior Change

Aishwarya Mandyam; Matthew Jörke; William Denton; Barbara E. Engelhardt; Emma Brunskill

TL;DR

The paper addresses adaptive health behavior interventions delivered through mobile health apps. Prior work typically optimizes a single outcome shared by all users and ignores user-specific goals and constraints. The authors introduce a Thompson sampling algorithm for linear contextual bandits that optimizes a personalized reward $r_{i,t}$, defined as a weighted sum of user utilities, while still sharing data across users. They prove a Bayesian regret bound $BR(N) \leq O(L d \sqrt{N \log(NM) \log(N/d)})$, showing that personalization preserves the sample efficiency of data sharing. Empirically, the method outperforms baselines on a synthetic step-count simulator and a semi-synthetic gym-attendance simulator, the latter built from an online preference study and a real gym-attendance dataset, and it aligns recommendations more closely with users' stated preferences.

Abstract

Promoting healthy lifestyle behaviors remains a major public health concern, particularly due to their crucial role in preventing chronic conditions such as cancer, heart disease, and type 2 diabetes. Mobile health applications present a promising avenue for low-cost, scalable health behavior change promotion. Researchers are increasingly exploring adaptive algorithms that personalize interventions to each person's unique context. However, in empirical studies, mobile health applications often suffer from small effect sizes and low adherence rates, particularly in comparison to human coaching. Tailoring advice to a person's unique goals, preferences, and life circumstances is a critical component of health coaching that has been underutilized in adaptive algorithms for mobile health interventions. To address this, we introduce a new Thompson sampling algorithm that can accommodate personalized reward functions (i.e., goals, preferences, and constraints), while also leveraging data sharing across individuals to provide effective recommendations more quickly. We prove that our modification incurs only a constant penalty on cumulative regret while preserving the sample complexity benefits of data sharing. We present empirical results on synthetic and semi-synthetic physical activity simulators. For the latter, we conducted an online survey to solicit preference data relating to physical activity, which we use to construct realistic reward models that leverage historical data from another study. Our algorithm achieves substantial performance improvements compared to baselines that do not share data or do not optimize for individualized rewards.
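To make the idea in the abstract concrete, here is a minimal Python sketch of Thompson sampling that shares one outcome model across users while scoring actions with each user's own reward weights. It is an illustration under simplifying assumptions, not the paper's algorithm: the one-hot action features, the Gaussian prior and noise, the Dirichlet-drawn weights, and the names `run_shared_ts`, `theta_true`, and `w` are all hypothetical.

```python
import numpy as np

def run_shared_ts(P=5, T=200, K=3, M=2, noise=0.1, seed=0):
    """Thompson sampling with a shared outcome model and per-user reward weights.

    Hypothetical setup: each of K actions has a one-hot feature vector, every
    user's M outcomes follow the same model y = theta_true[a] + noise, and
    user i's reward is the weighted sum w[i] @ y.
    """
    rng = np.random.default_rng(seed)
    theta_true = rng.normal(size=(K, M))    # shared outcome model, unknown to the learner
    w = rng.dirichlet(np.ones(M), size=P)   # user-defined utility weights, assumed known

    # Sufficient statistics pooled over all users (this is the data sharing).
    counts = np.zeros(K)
    sums = np.zeros((K, M))
    prior_var = 1.0
    regret = np.zeros(T)

    for t in range(T):
        for i in range(P):
            # Gaussian posterior over each action's outcome means.
            post_var = 1.0 / (1.0 / prior_var + counts / noise**2)
            post_mean = post_var[:, None] * sums / noise**2
            # Draw one posterior sample, then score actions with user i's weights.
            theta_s = post_mean + np.sqrt(post_var)[:, None] * rng.normal(size=(K, M))
            a = int(np.argmax(theta_s @ w[i]))
            # Observe all M outcomes and update the shared statistics.
            y = theta_true[a] + noise * rng.normal(size=M)
            counts[a] += 1
            sums[a] += y
            # Regret is measured against user i's own best action.
            true_r = theta_true @ w[i]
            regret[t] += true_r.max() - true_r[a]
    return np.cumsum(regret)

cum_regret = run_shared_ts()
print(cum_regret[-1])
```

In this sketch the posterior is learned over outcomes, which are common to all users, and personalization enters only when a sampled model is scored with `w[i]`. Under these assumptions, this separation is what lets every user's data inform every other user's recommendations.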

Paper Structure (19 sections, 2 theorems, 21 equations, 9 figures, 2 tables)

Introduction
Related Work
Preliminaries
Methods
Algorithm
Theoretical results
Experiments
Step Count Simulator
Gym Attendance Semi-Synthetic Simulator
Online Preference Study
Gym Attendance Dataset
Gym Attendance Simulator
Experiments
Discussion
Bayesian Regret Proof
...and 4 more sections

Key Result

Theorem 1

After running the proposed algorithm (Algorithm \ref{alg:ts_aug}) for $N=P \cdot T$ samples, where $P$ is the number of participants in the cohort, $T$ is the time horizon, and $M$ is the number of outcomes, we achieve Bayesian cumulative regret on the order of $BR(N) \leq O(L d \sqrt{N \log(NM) \log(N/d)})$.
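As a short worked consequence, the bound quoted in the TL;DR can be divided by $N$ to see how the average regret per sample behaves. This is a sketch that treats $L$, $d$, and $M$ as fixed constants; $L$ and $d$ are not defined in this summary, so their interpretation here is an assumption.

```latex
\[
\frac{BR(N)}{N}
\;\leq\; O\!\left( \frac{L d \sqrt{N \log(NM)\,\log(N/d)}}{N} \right)
\;=\; O\!\left( L d \sqrt{\frac{\log(NM)\,\log(N/d)}{N}} \right)
\;\longrightarrow\; 0
\quad \text{as } N \to \infty .
\]
```

The cumulative regret is therefore sublinear in $N = P \cdot T$. Because $N$ counts samples pooled over all $P$ participants, a larger cohort contributes to $N$ in the same way as a longer horizon.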

Figures (9)

Figure 1: We compare step count $y_{i,t}$ and reward $r_{i,t}$ (Eq. \ref{eq:reward}) for two policies: one policy ($A=1$, brown) always sends notifications and another policy ($A = \mathbbm{1}\{y < \text{goal}\}$, pink) sends notifications only when a user's step count drops below their desired goal (dashed line). A policy that always sends a notification achieves higher step count, but lower reward due to notification burden. Shaded area is one standard error across $P=20$ participants.
Figure 2: We compare our Algorithm \ref{alg:ts_aug} ($\textsf{TS}(r)$, blue) to TS optimizing for step count ($\textsf{TS}(y)$, orange), a random policy (gray), and independent TS policies for each user ($\textsf{TS}(r)$ no data sharing, green). We plot cumulative regret, which measures the sum of differences between the maximum possible reward and the reward achieved by a given policy at each timestep. Our algorithm achieves the best performance because it optimizes for the correct objective that considers notification burden and shares data across users. Shaded area is standard error across 100 trials with $P=40$ participants.
Figure 3: Participant preferences over the five intervention categories in our online survey, which we use to compute a preference vector $\boldsymbol{\alpha}_i$. While financial incentives are most popular and notifications are least popular on average, participants varied widely in their individual preference ratings.
Figure 4: Participant preferences over the degree to which they would listen to an AI's advice, which we use to compute a preference weight $\beta_i$. Participants slightly disagreed with this statement on average.
Figure 5: We compare the weekly number of gym visits (left) and average preference value $\boldsymbol{\alpha}(a)$ for the recommended actions (right) for several policies: our algorithm ($\textsf{TS}(r)$, blue), Thompson sampling optimizing for gym visits ($\textsf{TS}(y)$, orange), and a random policy (gray). We find that all policies achieve a similar number of gym visits, but only $\textsf{TS}(r)$ explicitly considers user preferences and achieves the highest preference alignment. Shaded area is standard error across 10 trials with $P=209$ participants each.
...and 4 more figures
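The cumulative regret plotted in Figure 2 is described as the running sum of the gap between the maximum possible reward and the reward a policy achieved at each timestep. A minimal Python sketch of that computation follows. The function name `cumulative_regret` and the example numbers are illustrative only and are not taken from the paper.

```python
import numpy as np

def cumulative_regret(optimal_rewards, achieved_rewards):
    """Running sum of (best possible reward - achieved reward) per timestep."""
    optimal = np.asarray(optimal_rewards, dtype=float)
    achieved = np.asarray(achieved_rewards, dtype=float)
    return np.cumsum(optimal - achieved)

# Toy example: the policy is suboptimal for two steps, then matches the optimum.
example = cumulative_regret([1.0, 1.0, 1.0, 1.0], [0.2, 0.5, 1.0, 1.0])
print(example)  # approximately [0.8, 1.3, 1.3, 1.3]
```

A curve that flattens, as in the toy example, indicates that the policy has stopped incurring regret. A curve that keeps rising linearly, as a uniformly random policy would typically produce, indicates that it has not learned.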

Theorems & Definitions (2)

theorem 1
Theorem 1
