A Deployed Online Reinforcement Learning Algorithm In An Oral Health Clinical Trial
Anna L. Trella, Kelly W. Zhang, Hinal Jajal, Inbal Nahum-Shani, Vivek Shetty, Finale Doshi-Velez, Susan A. Murphy
TL;DR
This work describes the deployment of an online reinforcement learning algorithm within the Oralytics mHealth platform to optimize the timing of engagement prompts for oral self-care. The algorithm extends Thompson-Sampling contextual bandits with a full-pooling reward model and weekly Bayesian updates, and the study emphasizes replicability and autonomy in a clinical-trial setting, including robust fallback mechanisms and an end-to-end software pipeline. A retrospective re-sampling analysis on trial data shows that the algorithm learned state-dependent advantages of prompting, though in certain states the apparent evidence may be due to chance. The simulation-based evaluation supports the pooling approach and informs design decisions for a phase 2 trial planned for spring 2025, underscoring the practical value of online RL for public-health interventions in real-world clinical workflows.
Abstract
Dental disease is a prevalent chronic condition associated with substantial financial burden, personal suffering, and increased risk of systemic diseases. Despite widespread recommendations for twice-daily tooth brushing, adherence to recommended oral self-care behaviors remains sub-optimal due to factors such as forgetfulness and disengagement. To address this, we developed Oralytics, an mHealth intervention system designed to complement clinician-delivered preventative care for marginalized individuals at risk for dental disease. Oralytics incorporates an online reinforcement learning algorithm to determine optimal times to deliver intervention prompts that encourage oral self-care behaviors. We have deployed Oralytics in a registered clinical trial. The deployment required careful design to manage challenges specific to the clinical trial setting in the U.S. In this paper, we (1) highlight key design decisions of the RL algorithm that address these challenges and (2) conduct a re-sampling analysis to evaluate algorithm design decisions. A second phase (randomized controlled trial) of Oralytics is planned to start in spring 2025.
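To make the algorithm class named in the TL;DR concrete, the following is a minimal sketch of a Thompson-Sampling contextual bandit with a full-pooling reward model and deferred batch updates. It is not the deployed Oralytics implementation. The class name `PooledThompsonSampler`, the linear-Gaussian reward model, the feature construction, the prior and noise variances, and the hard "prompt if the sampled advantage is positive" rule are all assumptions chosen for illustration.

```python
import numpy as np


class PooledThompsonSampler:
    """Illustrative sketch only; all modeling choices here are assumptions.

    A single Bayesian linear reward model is shared by every participant
    (full pooling), and its posterior is refreshed only when batch_update()
    is called (for example, once per week).
    """

    def __init__(self, state_dim, prior_var=1.0, noise_var=1.0, seed=0):
        dim = 2 * state_dim                        # baseline + advantage weights
        self.noise_var = noise_var
        self.precision = np.eye(dim) / prior_var   # posterior precision matrix
        self.b = np.zeros(dim)                     # precision-weighted mean
        self.rng = np.random.default_rng(seed)
        self.buffer = []                           # data awaiting the next update

    @staticmethod
    def features(state, action):
        # Baseline terms, then terms that are active only when a prompt is sent.
        state = np.asarray(state, dtype=float)
        return np.concatenate([state, action * state])

    def posterior(self):
        cov = np.linalg.inv(self.precision)
        return cov @ self.b, cov

    def select_action(self, state):
        # Thompson sampling: draw one parameter vector from the posterior and
        # send a prompt (1) only if the sampled advantage is positive.
        mean, cov = self.posterior()
        theta = self.rng.multivariate_normal(mean, cov)
        advantage = theta @ (self.features(state, 1) - self.features(state, 0))
        return int(advantage > 0)

    def record(self, state, action, reward):
        # Store data from any participant; the posterior is not changed yet.
        self.buffer.append((self.features(state, action), float(reward)))

    def batch_update(self):
        # Conjugate Gaussian update using all buffered, pooled data.
        for x, r in self.buffer:
            self.precision += np.outer(x, x) / self.noise_var
            self.b += x * r / self.noise_var
        self.buffer.clear()


# Usage with synthetic data in which prompting raises the reward by 2.
sampler = PooledThompsonSampler(state_dim=2)
for i in range(50):
    action = i % 2
    sampler.record(state=[1.0, 0.5], action=action, reward=1.0 + 2.0 * action)
mean_before, _ = sampler.posterior()   # still the prior: no update has run
sampler.batch_update()
mean_after, _ = sampler.posterior()
diff = sampler.features([1.0, 0.5], 1) - sampler.features([1.0, 0.5], 0)
estimated_advantage = float(mean_after @ diff)
```

Buffering observations and updating on a schedule keeps action selection between updates dependent only on a fixed posterior, which is one plausible way to support the replicability and autonomy goals described above. A deployed system would also need the fallback mechanisms the TL;DR mentions, which this sketch omits.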
