Monitoring Fidelity of Online Reinforcement Learning Algorithms in Clinical Trials

Anna L. Trella; Kelly W. Zhang; Inbal Nahum-Shani; Vivek Shetty; Iris Yan; Finale Doshi-Velez; Susan A. Murphy

TL;DR

Online RL in clinical trials offers targeted personalization but raises data-quality and participant-safety concerns. The authors define algorithm fidelity and present a two-phase framework (pre-trial planning and real-time monitoring), illustrated on Oralytics, which employs a generalized contextual bandit with Thompson sampling using per-step states $S_{i,t}$, actions $A_{i,t}$, and rewards $R_{i,t}$. The contributions are a formal definition of algorithm fidelity, a concrete planning-and-monitoring framework with a red/yellow/green severity taxonomy, and actionable lessons from the Oralytics deployment, underway since Spring 2023. The work provides practical guidance for safely translating rapid RL advances into real-world clinical trials while preserving the integrity of post-trial analyses.
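As a rough illustration of the Thompson-sampling idea mentioned above, the following is a minimal toy sketch and not the Oralytics algorithm. It assumes discrete states, a Gaussian reward with known noise variance, and an independent Normal posterior per (state, action) pair; the class name `GaussianThompsonBandit` and all of its parameters are hypothetical, chosen only for this example.

```python
import random
from collections import defaultdict


class GaussianThompsonBandit:
    """Toy contextual bandit (illustrative only, not the Oralytics algorithm).

    Keeps an independent Normal posterior over the mean reward of each
    (state, action) pair, assuming a known reward noise variance.
    """

    def __init__(self, actions=(0, 1), prior_mean=0.0, prior_var=1.0,
                 noise_var=1.0, seed=0):
        self.actions = tuple(actions)
        self.noise_var = noise_var
        self.rng = random.Random(seed)
        # posterior[(state, action)] = (mean, variance); unseen pairs get the prior.
        self.posterior = defaultdict(lambda: (prior_mean, prior_var))

    def select_action(self, state):
        # Thompson sampling: draw one sample per action from its posterior,
        # then pick the action whose sampled mean reward is largest.
        draws = {}
        for a in self.actions:
            mean, var = self.posterior[(state, a)]
            draws[a] = self.rng.gauss(mean, var ** 0.5)
        return max(draws, key=draws.get)

    def update(self, state, action, reward):
        # Conjugate Normal-Normal update with known noise variance.
        mean, var = self.posterior[(state, action)]
        precision = 1.0 / var + 1.0 / self.noise_var
        new_var = 1.0 / precision
        new_mean = new_var * (mean / var + reward / self.noise_var)
        self.posterior[(state, action)] = (new_mean, new_var)


if __name__ == "__main__":
    bandit = GaussianThompsonBandit(seed=1)
    env = random.Random(2)
    # Hypothetical environment: action 1 has the higher mean reward.
    for _ in range(200):
        a = bandit.select_action("evening")
        r = env.gauss(1.0 if a == 1 else 0.0, 1.0)
        bandit.update("evening", a, r)
    print(bandit.posterior[("evening", 1)])
```

Here `select_action` plays the role of choosing $A_{i,t}$ given $S_{i,t}$, and `update` incorporates the observed $R_{i,t}$. A deployed algorithm would typically use a richer reward model and the safeguards the paper's framework describes.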

Abstract

Online reinforcement learning (RL) algorithms offer great potential for personalizing treatment for participants in clinical trials. However, deploying an online, autonomous algorithm in the high-stakes healthcare setting makes quality control and data quality especially difficult to achieve. This paper proposes algorithm fidelity as a critical requirement for deploying online RL algorithms in clinical trials. Algorithm fidelity emphasizes the responsibility of the algorithm to (1) safeguard participants and (2) preserve the scientific utility of the data for post-trial analyses. We also present a framework for pre-deployment planning and real-time monitoring to help algorithm developers and clinical researchers ensure algorithm fidelity. To illustrate our framework's practical application, we present real-world examples from the Oralytics clinical trial. Since Spring 2023, this trial has successfully deployed an autonomous, online RL algorithm to personalize behavioral interventions for participants at risk for dental disease.

Paper Structure (27 sections, 5 equations, 1 figure)

This paper contains 27 sections, 5 equations, 1 figure.

Introduction
Running Example: Supporting Oral Health
Defining the RL Problem
Oralytics System Components
Defining Algorithm Fidelity
Related Work
Quality Control
Data Quality
A Pragmatic Framework for Ensuring Algorithm Fidelity
Planning Phase
Implementing Fallback Methods (Quality Control)
Setting Restrictions (Quality Control and Data Quality)
Collecting Critical Data (Data Quality)
Real-Time Monitoring Phase
Red Severity (Compromises Algorithm Fidelity)
...and 12 more sections

Figures (1)

Figure 1: Oralytics System and RL System Architecture. Brushing data is captured by sensors in the toothbrush and uploaded to the commercial cloud via a dock. The main controller gathers this data, along with app engagement data from the Oralytics app, and feeds it to the RL service and the dashboards. This sensor data is provided to the RL service to select actions and update. Using the actions selected by the RL service, the main controller populates intervention prompt content and schedules prompts onto each participant's Oralytics app.
