Adaptive Interventions with User-Defined Goals for Health Behavior Change
Aishwarya Mandyam, Matthew Jörke, William Denton, Barbara E. Engelhardt, Emma Brunskill
TL;DR
The paper addresses adaptive health behavior interventions delivered through mobile health apps, where prior work optimizes a single shared outcome and neglects user-specific goals and constraints. It introduces a Thompson sampling algorithm for linear contextual bandits that optimizes a personalized reward function $r_{i,t}$, defined as a weighted sum of user utilities, while still sharing data across users. The authors prove a Bayesian regret bound $BR(N) \leq O(L d \sqrt{N \log(NM) \log(N/d)})$, showing that the approach remains sample-efficient despite personalization. Empirically, the method outperforms baselines in both synthetic and semi-synthetic gym-attendance simulations. The semi-synthetic simulator is grounded in an online preference study and a real gym dataset, suggesting potential for improved goal alignment and adherence in health behavior change applications.
Abstract
Promoting healthy lifestyle behaviors remains a major public health concern, particularly due to their crucial role in preventing chronic conditions such as cancer, heart disease, and type 2 diabetes. Mobile health applications present a promising avenue for low-cost, scalable health behavior change promotion. Researchers are increasingly exploring adaptive algorithms that personalize interventions to each person's unique context. However, in empirical studies, mobile health applications often suffer from small effect sizes and low adherence rates, particularly in comparison to human coaching. Tailoring advice to a person's unique goals, preferences, and life circumstances is a critical component of health coaching that has been underutilized in adaptive algorithms for mobile health interventions. To address this, we introduce a new Thompson sampling algorithm that can accommodate personalized reward functions (i.e., goals, preferences, and constraints), while also leveraging data sharing across individuals to provide effective recommendations more quickly. We prove that our modification incurs only a constant penalty on cumulative regret while preserving the sample complexity benefits of data sharing. We present empirical results on synthetic and semi-synthetic physical activity simulators. For the latter, we conducted an online survey to solicit preference data relating to physical activity, which we use to construct realistic reward models that leverage historical data from another study. Our algorithm achieves substantial performance improvements compared to baselines that do not share data or do not optimize for individualized rewards.
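To make the idea of personalized rewards with shared data concrete, the following is a minimal illustrative sketch, not the authors' algorithm. It assumes (hypothetically) that each of K utilities is linear in a d-dimensional context-action feature vector with parameters shared by all users, that each user's weights over utilities are known, and that the prior and noise are Gaussian. The class name `SharedUtilityTS`, the function `simulate`, and all default values are invented for illustration.

```python
import numpy as np


class SharedUtilityTS:
    """Thompson sampling sketch: shared Bayesian linear models for K utilities,
    combined per user through known weights (an assumed setup)."""

    def __init__(self, d, k, prior_var=1.0, noise_var=1.0, seed=0):
        self.d, self.k = d, k
        self.noise_var = noise_var
        # Posterior precision is the same for every utility because all
        # utilities are observed on the same feature vectors.
        self.precision = np.eye(d) / prior_var
        self.b = np.zeros((d, k))  # running sum of phi * utilities / noise_var
        self.rng = np.random.default_rng(seed)

    def posterior_mean(self):
        return np.linalg.solve(self.precision, self.b)  # shape (d, k)

    def sample_theta(self):
        # Draw each utility's parameter vector from its Gaussian posterior.
        cov = np.linalg.inv(self.precision)
        chol = np.linalg.cholesky(cov)
        z = self.rng.standard_normal((self.d, self.k))
        return self.posterior_mean() + chol @ z

    def choose(self, action_features, weights):
        # action_features: (n_actions, d); weights: (k,) for the current user.
        theta = self.sample_theta()
        scores = action_features @ theta @ weights  # sampled personalized reward
        return int(np.argmax(scores))

    def update(self, phi, utilities):
        # Every user's observation updates the same shared posterior.
        self.precision += np.outer(phi, phi) / self.noise_var
        self.b += np.outer(phi, utilities) / self.noise_var


def simulate(n_users=5, n_rounds=200, d=4, k=2, n_actions=3,
             noise_std=0.1, seed=0):
    """Toy simulation with synthetic users; all quantities are made up."""
    rng = np.random.default_rng(seed)
    true_theta = rng.normal(size=(d, k))                # shared utility models
    weights = rng.dirichlet(np.ones(k), size=n_users)   # per-user goal weights
    agent = SharedUtilityTS(d, k, noise_var=noise_std ** 2, seed=seed)
    regret = 0.0
    for _ in range(n_rounds):
        for i in range(n_users):
            feats = rng.normal(size=(n_actions, d))
            a = agent.choose(feats, weights[i])
            expected = feats @ true_theta @ weights[i]
            regret += expected.max() - expected[a]
            utilities = feats[a] @ true_theta + noise_std * rng.normal(size=k)
            agent.update(feats[a], utilities)
    return agent, true_theta, regret
```

In this sketch, personalization enters only through `weights` at action-selection time, while the posterior over utility parameters pools observations from all users. That is one simple way to combine individualized objectives with data sharing; the paper's actual model and update rule may differ.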
