A Deployed Online Reinforcement Learning Algorithm In An Oral Health Clinical Trial

Anna L. Trella; Kelly W. Zhang; Hinal Jajal; Inbal Nahum-Shani; Vivek Shetty; Finale Doshi-Velez; Susan A. Murphy

TL;DR

This work describes the deployment of an online reinforcement learning (RL) algorithm within the Oralytics mHealth platform to decide when to send engagement prompts for oral self-care. The algorithm extends a Thompson-sampling contextual bandit with a full-pooling reward model and weekly Bayesian updates. The design emphasizes replicability and autonomy in a clinical-trial setting, including fallback mechanisms and an end-to-end software pipeline. A retrospective re-sampling analysis of the trial data suggests that the algorithm learned state-dependent advantages to prompting in some states, while in at least one state the apparent learning is consistent with random chance. The simulation-based evaluation supports the pooling approach and informs design decisions for a phase 2 trial planned for spring 2025, illustrating how online RL can be used for public-health interventions within real clinical workflows.
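The combination of Thompson sampling, a fully pooled reward model, and weekly Bayesian updates can be illustrated with a minimal sketch. It is not the Oralytics implementation: the Bayesian linear reward model, the two-dimensional state features, the simulated rewards, and all names (`PooledThompsonSampler`, `simulate`) are assumptions made for illustration. One posterior is shared by all simulated users (full pooling), and it is refreshed only every seventh day.

```python
import numpy as np


class PooledThompsonSampler:
    """Hypothetical sketch: one Bayesian linear reward model shared by all users."""

    def __init__(self, dim, prior_var=1.0, noise_var=1.0, seed=0):
        self.dim = dim
        self.noise_var = noise_var
        # Parameters are [baseline weights, advantage weights], each of size dim.
        self.precision = np.eye(2 * dim) / prior_var
        self.b = np.zeros(2 * dim)            # sum of features * reward / noise_var
        self.mean = np.zeros(2 * dim)         # zero prior mean (assumption)
        self.cov = np.eye(2 * dim) * prior_var
        self.rng = np.random.default_rng(seed)
        self.buffer = []                      # data collected since the last update

    def act(self, x):
        # Thompson sampling: draw parameters from the posterior, then prompt
        # (action 1) only if the sampled advantage in this state is positive.
        theta = self.rng.multivariate_normal(self.mean, self.cov)
        return int(x @ theta[self.dim:] > 0)

    def record(self, x, action, reward):
        # Features: baseline term x, plus action-interaction term action * x.
        self.buffer.append((np.concatenate([x, action * x]), reward))

    def update(self):
        # Conjugate Bayesian linear-regression update on all buffered data.
        for phi, reward in self.buffer:
            self.precision += np.outer(phi, phi) / self.noise_var
            self.b += phi * reward / self.noise_var
        cov = np.linalg.inv(self.precision)
        self.cov = (cov + cov.T) / 2          # keep the covariance symmetric
        self.mean = self.cov @ self.b
        self.buffer.clear()


def simulate(num_users=5, days=70, seed=1):
    """Toy environment (assumed): prompting helps only when the context bit is 0."""
    rng = np.random.default_rng(seed)
    true_baseline = np.array([1.0, 0.5])
    true_advantage = np.array([0.8, -1.0])
    agent = PooledThompsonSampler(dim=2, seed=seed)
    for day in range(days):
        for _ in range(num_users):
            x = np.array([1.0, float(rng.integers(0, 2))])  # intercept + context
            a = agent.act(x)
            r = x @ true_baseline + a * (x @ true_advantage) + rng.normal(0.0, 1.0)
            agent.record(x, a, r)             # every user feeds the same posterior
        if (day + 1) % 7 == 0:                # weekly posterior update
            agent.update()
    return agent


if __name__ == "__main__":
    agent = simulate()
    print("posterior mean of advantage weights:", agent.mean[2:])
```

Between updates the sketch keeps acting on the previous week's posterior, which mirrors the weekly update cadence described above; the state features and reward definition used in the actual trial are not given in this summary.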

Abstract

Dental disease is a prevalent chronic condition associated with substantial financial burden, personal suffering, and increased risk of systemic diseases. Despite widespread recommendations for twice-daily tooth brushing, adherence to recommended oral self-care behaviors remains sub-optimal due to factors such as forgetfulness and disengagement. To address this, we developed Oralytics, an mHealth intervention system designed to complement clinician-delivered preventative care for marginalized individuals at risk for dental disease. Oralytics incorporates an online reinforcement learning (RL) algorithm to determine optimal times to deliver intervention prompts that encourage oral self-care behaviors. We have deployed Oralytics in a registered clinical trial. The deployment required careful design to manage challenges specific to the clinical trial setting in the U.S. In this paper, we (1) highlight key design decisions of the RL algorithm that address these challenges and (2) conduct a re-sampling analysis to evaluate algorithm design decisions. A second phase (randomized controlled trial) of Oralytics is planned to start in spring 2025.

Paper Structure (41 sections, 10 equations, 7 figures, 7 tables)

Introduction
Design & Deployment Challenges in Clinical Trials
Contributions
Related Work
AI in Clinical Trials
Online RL Algorithms in mHealth
Preliminaries
Oralytics Clinical Trial
Online Reinforcement Learning
Oralytics RL Algorithm
Deploying Oralytics
Oralytics Pipeline
Software Components
End-to-End Pipeline Description
Design Decisions To Enhance Autonomy and Thus Replicability
...and 26 more sections

Figures (7)

Figure 1: The Oralytics mHealth intervention facilitates high-quality oral self-care behaviors (OSCB) through engagement prompts (e.g., encouraging individuals to monitor their brushing behavior and Q&A) via the Oralytics app.
Figure 2: Oralytics End-to-End Pipeline.
Figure 3: Fallback methods executed over the Oralytics trial. All 3 fallback methods were executed at least once during the Oralytics trial to mitigate various issues such as the RL service going down or failure to obtain sensor data from the main controller to form current state information.
Figure 4: The standardized predicted advantage in state $s$ over update times $\tau$ using posterior parameters learned during the Oralytics trial. It appears that the algorithm has learned a state where it is effective to send a prompt.
Figure 5: We compare the standardized predicted advantages across updates to the posterior parameters from the actual Oralytics trial (dark blue) with violin plots of standardized predicted advantages computed from simulated posterior parameters (light blue) in an environment where there is truly no advantage in state $s$. Simulated posterior parameters were re-sampled across 500 Monte Carlo repetitions. The patterns in (a) and (b) suggest states where the algorithm learned an advantage of one action over the other, and the re-sampling indicates that this evidence is real. The pattern in (c), however, suggests a state where the re-sampling indicates that the appearance of learning likely occurred by random chance.
...and 2 more figures
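
The comparison described in the captions of Figures 4 and 5 can be sketched as follows. This summary does not define the standardized predicted advantage, so the sketch assumes it is the posterior mean of the advantage in state $s$ divided by its posterior standard deviation. The function names and the example numbers are hypothetical, and the null draws stand in for values computed from posteriors simulated in a no-advantage environment.

```python
import numpy as np


def standardized_advantage(post_mean, post_cov, state_features):
    """Assumed definition: posterior mean of the advantage over its posterior sd."""
    mu = state_features @ post_mean
    sd = np.sqrt(state_features @ post_cov @ state_features)
    return float(mu / sd)


def null_exceedance_fraction(observed, null_draws):
    """Fraction of no-advantage repetitions at least as extreme as the observed value."""
    null_draws = np.asarray(null_draws, dtype=float)
    return float(np.mean(np.abs(null_draws) >= abs(observed)))


if __name__ == "__main__":
    # Hypothetical posterior over two advantage weights and one state of interest.
    post_mean = np.array([0.6, -0.2])
    post_cov = np.diag([0.04, 0.09])
    state = np.array([1.0, 0.0])
    observed = standardized_advantage(post_mean, post_cov, state)

    # Stand-in for 500 Monte Carlo repetitions under a true advantage of zero.
    rng = np.random.default_rng(0)
    null_draws = rng.normal(0.0, 1.0, size=500)
    print(observed, null_exceedance_fraction(observed, null_draws))
```

A small exceedance fraction corresponds to the situation in panels (a) and (b), where the observed value sits outside the bulk of the null distribution. A large fraction corresponds to panel (c), where the apparent learning is plausible under no advantage.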
