Oralytics Reinforcement Learning Algorithm
Anna L. Trella, Kelly W. Zhang, Stephanie M. Carpenter, David Elashoff, Zara M. Greer, Inbal Nahum-Shani, Dennis Ruenger, Vivek Shetty, Susan A. Murphy
TL;DR
This work presents Oralytics, an online Bayesian contextual bandit designed to personalize engagement prompts to improve oral self-care behaviors. It combines a Bayesian linear regression reward model with action centering, a fully pooled (cross-participant) learning approach, and a prior carefully constructed from pilot data, with the model updated weekly in a clinical trial setting. The authors address practical challenges such as app-opening issues through a modified RL pipeline, simulated environments based on ROBAS data, and a monitoring system to ensure data integrity. Through extensive simulation experiments across stationary and non-stationary environments and varying participant responsivity, they settle the final design decisions (full pooling, weekly updates, a specific smoothing slope, and tuned cost terms) and show how surrogate rewards that incorporate delayed effects guide learning while mitigating the burden of over-prompting. The work advances scalable, data-efficient personalization for digital health interventions with methods that support robust after-study causal inference and real-world deployment.
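To make the summary concrete, the following is a minimal sketch of a Bayesian linear regression reward model driving a randomized action choice. It is an illustration under stated assumptions, not the Oralytics implementation: it assumes a Gaussian prior with known noise variance, omits action centering and the smoothing function, and replaces the latter with simple clipping. The function names, the feature layout, and the clipping bounds are all hypothetical.

```python
import math
import numpy as np

def posterior_update(mu0, Sigma0, X, y, noise_var):
    """Conjugate Gaussian update for linear regression with known noise variance.

    X holds one row per decision point, pooled across participants;
    y holds the observed rewards. All names here are illustrative.
    """
    prec0 = np.linalg.inv(Sigma0)
    prec_post = prec0 + X.T @ X / noise_var
    Sigma_post = np.linalg.inv(prec_post)
    mu_post = Sigma_post @ (prec0 @ mu0 + X.T @ y / noise_var)
    return mu_post, Sigma_post

def action_probability(mu_beta, Sigma_beta, f_s, lo=0.2, hi=0.8):
    """Posterior probability that the advantage f_s . beta is positive.

    Clipping to [lo, hi] is an assumed stand-in for a smoothing function;
    the bounds are placeholders, not values taken from the paper.
    """
    mean = float(f_s @ mu_beta)
    std = math.sqrt(float(f_s @ Sigma_beta @ f_s))
    p = 0.5 * (1.0 + math.erf(mean / (std * math.sqrt(2.0))))
    return min(hi, max(lo, p))

# Usage: one pooled batch update, then a prompt probability for a new state.
rng = np.random.default_rng(0)
d = 2                                    # assumed state-feature dimension
f = rng.normal(size=(50, d))             # state features f(s)
a = rng.integers(0, 2, size=50)          # past actions (1 = prompt sent)
X = np.hstack([f, a[:, None] * f])       # baseline terms, then advantage terms
y = X @ np.array([1.0, 0.5, 0.3, -0.2]) + rng.normal(scale=0.5, size=50)

mu, Sigma = posterior_update(np.zeros(2 * d), np.eye(2 * d), X, y, noise_var=0.25)
p_send = action_probability(mu[d:], Sigma[d:, d:], f_s=np.array([1.0, 0.0]))
```

In this sketch the batch update would be rerun on the pooled data at each update time (weekly, per the summary above), while `action_probability` would be evaluated at each decision point using the most recent posterior.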
Abstract
Dental disease is still one of the most common chronic diseases in the United States. While dental disease is preventable through healthy oral self-care behaviors (OSCB), these basic behaviors are not consistently practiced. We have developed Oralytics, an online reinforcement learning (RL) algorithm that optimizes the delivery of personalized intervention prompts to improve OSCB. In this paper, we offer a full overview of the algorithm design decisions, which were made using prior data, domain expertise, and experiments in a simulation test bed. The finalized RL algorithm was deployed in the Oralytics clinical trial, conducted from fall 2023 to summer 2024.
