Practical considerations when designing an online learning algorithm for an app-based mHealth intervention

Rachel T Gonzalez, Madeline R Abbott, Brahmajee Nallamothu, Scott Hummel, Michael Dorsch, Walter Dempsey

TL;DR

The paper tackles the practical design challenges of deploying online reinforcement learning in app-based mHealth trials by detailing the LS4L2 algorithm, a contextual bandit that decides when to send notifications using a mapping from context to send probability learned via hierarchical Bayesian logistic regression. It presents a concrete template addressing reward definition, optimization timescale, robustness of automated learning, computational trade-offs, and missing data handling, supported by simulations comparing the LS4L2 algorithm with simpler and more complex baselines. Key contributions include a principled approach to partial pooling with weak priors to prevent model breakdowns, a monitoring framework for both algorithm performance and calibration, and actionable guidelines for model specification under resource constraints. The findings highlight the need to balance personalization against computational cost and to align reward design with behavioral targets, offering practical pathways for scalable, stable, and interpretable RL-enabled digital interventions in real-world clinical trials.

Abstract

The ubiquitous nature of mobile health (mHealth) technology has expanded opportunities for the integration of reinforcement learning into traditional clinical trial designs, allowing researchers to learn individualized treatment policies during the study. LowSalt4Life 2 (LS4L2) is a recent trial aimed at reducing sodium intake among hypertensive individuals through an app-based intervention. A reinforcement learning algorithm, which was deployed in one of the trial arms, was designed to send reminder notifications to promote app engagement in contexts where the notification would be effective, i.e., when a participant is likely to open the app in the next 30 minutes, and to withhold notifications when prior data suggested reduced effectiveness. Such an algorithm can improve app-based mHealth interventions by reducing participant burden and more effectively promoting behavior change. We encountered various challenges during the implementation of the learning algorithm, which we present as a template for solving challenges in future trials that deploy reinforcement learning algorithms. We provide template solutions based on LS4L2 for solving the key challenges of (i) defining a relevant reward, (ii) determining a meaningful timescale for optimization, (iii) specifying a robust statistical model that allows for automation, (iv) balancing model flexibility with computational cost, and (v) addressing missing values in gradually collected data.

Paper Structure

This paper contains 19 sections, 4 equations, 4 figures, 1 table.

Figures (4)

  • Figure 1: LS4L2 is a two-arm (App Alone vs App + JITAI) trial. After two months, participants in the App + JITAI arm undergo a second randomization to either continue with the current JITAI or transition to a personalized JITAI (pJITAI). The App + pJITAI sub-arm includes a contextual bandit algorithm to optimize delivery of an app-based notification. Within this sub-arm, participants are randomized to potentially receive a mobile notification from the study's app each time they arrive at a grocery store or restaurant. The randomization probabilities are based on context covariates, all of which are collected passively.
  • Figure 2: As an additional layer of study monitoring, we examined the calibration of our algorithm. We calculated the empirical probability with which notifications were sent to study participants in bins corresponding to the individualized decision rules learned by the learning algorithm. Uncertainty was quantified with asymptotically normal 95% CIs for the observed probabilities.
  • Figure 3: Determining model complexity for the single-participant model. Assume that the data for the current participant contain $p$ covariates (excluding intervention), which we index by $j$. We apply these rules to determine inclusion of fixed effects. Note that "unknown" observations for each covariate are not counted towards the minimum cell size. For inclusion of random effects, a similar set of rules is used, but only main effects and first-order interactions between intervention and covariates are considered (no three-way interactions). Different rules are applied for inclusion of baseline covariates.
  • Figure 4: Average cumulative regret across simulated datasets for each algorithm in each setting, along with pointwise interquartile bands for the cumulative regret of the LS4L2 and Simple algorithms. Due to computational constraints, the Complicated algorithm was only deployed on 5 simulated datasets in each setting. The cumulative regret curves for each of these simulations are shown as dashed lines.
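
The calibration check described in Figure 2 can be illustrated with a minimal sketch. It assumes each decision point records the send probability assigned by the learning algorithm and whether a notification was actually sent. The function name `calibration_table`, the equal-width binning, and the default bin count are illustrative assumptions, not the authors' implementation. The interval is the standard normal-approximation 95% CI for a proportion, clipped to [0, 1].

```python
import math
from collections import defaultdict


def calibration_table(probs, sent, n_bins=5, z=1.96):
    """Compare assigned send probabilities with observed send rates, per bin.

    probs: assigned randomization probabilities, one per decision point (hypothetical input).
    sent:  0/1 indicators of whether a notification was actually sent.
    """
    # Group decision points into equal-width bins of the assigned probability.
    bins = defaultdict(list)
    for p, s in zip(probs, sent):
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 falls in the last bin
        bins[idx].append((p, s))

    rows = []
    for idx in sorted(bins):
        pairs = bins[idx]
        n = len(pairs)
        mean_assigned = sum(p for p, _ in pairs) / n
        observed = sum(s for _, s in pairs) / n
        # Normal-approximation half-width for the observed proportion.
        half = z * math.sqrt(observed * (1 - observed) / n)
        rows.append({
            "bin": idx,
            "n": n,
            "mean_assigned": mean_assigned,
            "observed": observed,
            "ci_low": max(0.0, observed - half),
            "ci_high": min(1.0, observed + half),
        })
    return rows


if __name__ == "__main__":
    # Toy data: ten decision points near 0.1 and ten near 0.9.
    probs = [0.1] * 10 + [0.9] * 10
    sent = [1] + [0] * 9 + [1] * 9 + [0]
    for row in calibration_table(probs, sent):
        print(row)
```

A well-calibrated deployment would show each bin's mean assigned probability falling inside that bin's interval. With few decision points per bin the normal approximation is rough, so wide or clipped intervals should be read with caution.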