
Optimizing Algorithms for Mobile Health Interventions with Active Querying Optimization

Aseel Rawashdeh

TL;DR

The paper addresses the challenge of balancing intervention efficacy with user burden in mobile health RL by using the ACNO-MDP framework and the Act-Then-Measure (ATM) heuristic to decouple control and measurement. It introduces a Bayesian extension to ATM based on Kalman-filter-style Q-learning, which maintains uncertainty-aware value estimates, and evaluates it on toy ACNO-MDPs, FrozenLake variants, and ADAPTS, a testbed inspired by a clinical trial. Results show that Bayesian ATM improves stability and sample efficiency in small, tabular environments but struggles in complex, high-dimensional settings with delayed feedback such as ADAPTS, pointing to a mismatch between ATM's assumptions and real-world mHealth dynamics. The study highlights the need for algorithms that jointly model how measurements affect dynamics and rewards, and that operate on continuous state spaces, to enable practical, measurement-aware RL for personalized health interventions.
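To make "decoupling control and measurement" concrete, here is a minimal sketch of a single Act-Then-Measure step in a tabular ACNO-MDP. It assumes a known (or separately estimated) transition table `T[s][a][s']` and a Q-table over fully observed states. The agent first picks the control action that is greedy with respect to its belief, propagates the belief through the transition model, and then measures only if the expected gain from knowing the true state (the "measuring value") exceeds the measurement cost `c`. The measuring-value formula reflects our reading of the ATM heuristic; all function names and the list-based representation are hypothetical and are not taken from the paper's code.

```python
from typing import List

Belief = List[float]                   # b[s]: probability of being in state s
QTable = List[List[float]]             # Q[s][a], learned as if states were observed
Transitions = List[List[List[float]]]  # T[s][a][s'] (assumed known or estimated)


def expected_q(belief: Belief, q: QTable, action: int) -> float:
    """Belief-weighted Q-value: sum_s b(s) Q(s, a)."""
    return sum(p * q[s][action] for s, p in enumerate(belief))


def choose_control_action(belief: Belief, q: QTable) -> int:
    """Act: greedy control action w.r.t. the belief-weighted Q-values."""
    return max(range(len(q[0])), key=lambda a: expected_q(belief, q, a))


def propagate_belief(belief: Belief, t: Transitions, action: int) -> Belief:
    """Unmeasured step: b'(s') = sum_s b(s) T(s' | s, a)."""
    n = len(belief)
    return [sum(belief[s] * t[s][action][s2] for s in range(n)) for s2 in range(n)]


def measuring_value(belief: Belief, q: QTable) -> float:
    """Expected gain from acting on the true state rather than on the belief:
    sum_s b(s) max_a Q(s, a) - max_a sum_s b(s) Q(s, a), which is never negative."""
    informed = sum(p * max(q[s]) for s, p in enumerate(belief))
    uninformed = max(expected_q(belief, q, a) for a in range(len(q[0])))
    return informed - uninformed


def should_measure(belief: Belief, q: QTable, cost: float) -> bool:
    """Then measure: query the state only if the measuring value exceeds its cost."""
    return measuring_value(belief, q) > cost


# Hypothetical 2-state, 2-action example with fully random transitions.
q = [[1.0, 0.0], [0.0, 1.0]]
t = [[[0.5, 0.5], [0.5, 0.5]], [[0.5, 0.5], [0.5, 0.5]]]
a = choose_control_action([1.0, 0.0], q)
next_belief = propagate_belief([1.0, 0.0], t, a)
print(a, next_belief, should_measure(next_belief, q, cost=0.05))
```

When the belief is certain, the measuring value is zero and the agent never pays for a query; it grows as the belief spreads over states whose best actions disagree.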

Abstract

Reinforcement learning in mobile health (mHealth) interventions requires balancing intervention efficacy with user burden, particularly when state measurements (for example, user surveys or feedback) are costly yet essential. The Act-Then-Measure (ATM) heuristic addresses this challenge by decoupling control and measurement actions within the Action-Contingent Noiselessly Observable Markov Decision Process (ACNO-MDP) framework. However, the standard ATM algorithm relies on a temporal-difference-inspired Q-learning method, which is prone to instability in sparse and noisy environments. In this work, we propose a Bayesian extension to ATM that replaces standard Q-learning with a Kalman filter-style Bayesian update, maintaining uncertainty-aware estimates of Q-values and enabling more stable and sample-efficient learning. We evaluate our method in both toy environments and clinically motivated testbeds. In small, tabular environments, Bayesian ATM achieves comparable or improved scalarized returns with substantially lower variance and more stable policy behavior. In contrast, in larger and more complex mHealth settings, both the standard and Bayesian ATM variants perform poorly, suggesting a mismatch between ATM's modeling assumptions and the structural challenges of real-world mHealth domains. These findings highlight the value of uncertainty-aware methods in low-data settings while underscoring the need for new RL algorithms that explicitly model causal structure, continuous states, and delayed feedback under observation cost constraints.
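The abstract describes replacing the standard Q-learning step with a Kalman-filter-style Bayesian update. A minimal sketch of one way to do this in the tabular case, assuming a scalar Gaussian belief (mean and variance) per state-action pair and treating the usual TD target as a noisy observation of Q(s, a), is shown below. The hyperparameters `prior_var`, `obs_noise`, and `process_noise`, and the class name `KalmanQ`, are hypothetical; the paper's exact update may differ.

```python
from typing import List


class KalmanQ:
    """Tabular Q-values with a Gaussian (mean, variance) belief per (state, action)."""

    def __init__(self, n_states: int, n_actions: int, gamma: float = 0.95,
                 prior_var: float = 1.0, obs_noise: float = 0.5,
                 process_noise: float = 0.0) -> None:
        # Hypothetical hyperparameters: prior variance, TD-target noise, drift.
        self.gamma = gamma
        self.obs_noise = obs_noise
        self.process_noise = process_noise
        self.mean: List[List[float]] = [[0.0] * n_actions for _ in range(n_states)]
        self.var: List[List[float]] = [[prior_var] * n_actions for _ in range(n_states)]

    def update(self, s: int, a: int, r: float, s_next: int, done: bool) -> float:
        """One Kalman-style update of Q(s, a); returns the Kalman gain used."""
        # Predict: optionally inflate uncertainty so the estimate can track drift.
        prior_var = self.var[s][a] + self.process_noise
        # Noisy "observation" of Q(s, a): the TD target built from posterior means.
        target = r if done else r + self.gamma * max(self.mean[s_next])
        # Correct: the gain acts as an adaptive, uncertainty-driven learning rate.
        gain = prior_var / (prior_var + self.obs_noise)
        self.mean[s][a] += gain * (target - self.mean[s][a])
        self.var[s][a] = (1.0 - gain) * prior_var
        return gain


agent = KalmanQ(n_states=2, n_actions=2)
g1 = agent.update(0, 1, r=1.0, s_next=1, done=True)  # gain = 1.0 / (1.0 + 0.5)
g2 = agent.update(0, 1, r=1.0, s_next=1, done=True)  # smaller gain: variance shrank
print(g1, g2, agent.mean[0][1], agent.var[0][1])
```

Compared with a fixed learning rate, the gain starts large for rarely visited pairs and shrinks as evidence accumulates, which is one plausible source of the lower variance reported for small, tabular environments. With `process_noise = 0` the gain never grows again, so a small positive value may help when bootstrapped targets keep shifting.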

Paper Structure

This paper contains 33 sections, 12 equations, and 7 figures.

Figures (7)

  • Figure 1: Graphics illustrating the ACNO-MDP problem setting and ATM heuristic (Figures by Student)
  • Figure 2: Bag length $K = 1$. Arrows pointing to the actions are omitted. $A_t$: control action, $I_t$: query action, $R_t$: latent reward, $I_t R_t$: revealed reward, $C_t$: context, $M_t$: mediator, $E_t$: engagement, $O_{t-1}$: observation of $R_t$
  • Figure 3: Average scalarized return (SR) and number of measurements (M) in the measuring value environment ($c=0.05$), averaged over the final 200 episodes across 5 runs.
  • Figure 4: Scalarized return (SR) and number of measurements (M) for ATM-Q variants in FrozenLake variants ($c=0.05$), averaged over the final 200 episodes.
  • Figure 5: Performance of ATMQ vs Bayesian ATMQ across different evaluation settings.
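Figures 3 and 4 report the scalarized return (SR) alongside the number of measurements (M) with measurement cost $c = 0.05$, averaged over the final 200 episodes (and, for Figure 3, across 5 runs). A minimal sketch of how such a metric could be computed, assuming SR is the undiscounted episode reward minus $c$ times the number of measurements (the standard ACNO-MDP trade-off; whether the paper discounts is not stated here), follows. The function names are hypothetical.

```python
from statistics import mean
from typing import Sequence


def scalarized_return(rewards: Sequence[float], measured: Sequence[bool],
                      cost: float = 0.05) -> float:
    """Episode reward minus a fixed cost per measurement (assumed undiscounted)."""
    return sum(rewards) - cost * sum(measured)


def final_window_average(per_episode: Sequence[float], window: int = 200) -> float:
    """Average a per-episode metric over the last `window` episodes of one run."""
    return mean(per_episode[-window:])


def summarize_runs(runs: Sequence[Sequence[float]], window: int = 200) -> float:
    """Mean of the final-window averages across independent runs."""
    return mean(final_window_average(run, window) for run in runs)


# Hypothetical episode: reward 1.0 at the goal, two measurements taken.
sr = scalarized_return([0.0, 0.0, 1.0], [True, False, True], cost=0.05)  # 1.0 - 2 * 0.05
print(sr)
```

Because M enters SR linearly, the same averaging applied to the per-episode measurement counts yields the M values plotted next to SR.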