Dyadic Reinforcement Learning
Shuangning Li, Lluis Salvat Niell, Sung Won Choi, Inbal Nahum-Shani, Guy Shani, Susan Murphy
TL;DR
The paper tackles personalization of mobile-health interventions within dyads by introducing Dyadic RL, a two-tier hierarchical reinforcement learning framework that handles actions at different time scales and noisy, non-Markovian dynamics. The low-level policy uses randomized least-squares value iteration to optimize within time blocks, while the high-level policy employs Thompson sampling to select weekly actions, with a novel reward construction that denoises the high-level signal. A regret bound is proved, showing a sublinear rate of $\tilde{O}(H^3 S^{3/2} A^{1/2} |S^{\mathrm{high}}|^{1/2} \sqrt{KW})$ under tabular assumptions, highlighting the benefit of hierarchical structure for learning efficiency. The authors validate Dyadic RL through toy simulations and a Roadmap 2.0–based simulation test bed, demonstrating robust performance against baselines and under varied delayed effects, with practical implications for implementing dyadic interventions in trials like ADAPTS HCT. The work advances interpretable, scalable dyadic interventions in mobile health by providing both theoretical guarantees and an empirically grounded demonstration of real-world applicability.
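As a rough illustration of the high-level tier only, the sketch below shows generic Thompson sampling over a small set of weekly actions. It is not the paper's algorithm: the Gaussian reward model with known noise variance, the independent zero-mean prior, and the names `GaussianThompsonSampler`, `posterior`, `select`, and `update` are all assumptions made for this example. The paper's denoised reward construction and its low-level randomized least-squares value iteration are not shown.

```python
import math
import random


class GaussianThompsonSampler:
    """Thompson sampling over a finite set of high-level (weekly) actions.

    Illustrative assumptions (not from the paper): each action's weekly
    reward is Gaussian with unknown mean and known variance `noise_var`,
    and each mean has an independent N(0, prior_var) prior.
    """

    def __init__(self, n_actions, prior_var=1.0, noise_var=1.0, seed=0):
        self.noise_var = noise_var
        # Posterior precision (1 / variance) of each action's mean reward.
        self.precision = [1.0 / prior_var] * n_actions
        # Running sum of observed rewards, scaled by 1 / noise_var.
        self.weighted_sum = [0.0] * n_actions
        self.rng = random.Random(seed)

    def posterior(self, action):
        """Return the (mean, variance) of the Gaussian posterior for `action`."""
        var = 1.0 / self.precision[action]
        mean = self.weighted_sum[action] * var
        return mean, var

    def select(self):
        """Draw one sample per action from its posterior and pick the largest."""
        draws = []
        for action in range(len(self.precision)):
            mean, var = self.posterior(action)
            draws.append(self.rng.gauss(mean, math.sqrt(var)))
        return max(range(len(draws)), key=draws.__getitem__)

    def update(self, action, weekly_reward):
        """Conjugate Gaussian update after observing one weekly reward."""
        self.precision[action] += 1.0 / self.noise_var
        self.weighted_sum[action] += weekly_reward / self.noise_var
```

In a loop over weeks, one would call `select()` at the start of the week, run the low-level policy within the week, and pass a weekly summary reward to `update()`. How that weekly reward is constructed is exactly where the paper's method departs from this simplified sketch.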
Abstract
Mobile health aims to enhance health outcomes by delivering interventions to individuals as they go about their daily lives. The involvement of care partners and social support networks often proves crucial in helping individuals manage burdensome medical conditions. This presents opportunities in mobile health to design interventions that target the dyadic relationship -- the relationship between a target person and their care partner -- with the aim of enhancing social support. In this paper, we develop dyadic RL, an online reinforcement learning algorithm designed to personalize intervention delivery based on contextual factors and past responses of a target person and their care partner. Here, multiple sets of interventions impact the dyad across multiple time intervals. The developed dyadic RL is Bayesian and hierarchical. We formally introduce the problem setup, develop dyadic RL, and establish a regret bound. We demonstrate dyadic RL's empirical performance through simulation studies on both toy scenarios and a realistic test bed constructed from data collected in a mobile health study.
