DTR-Bench: An in silico Environment and Benchmark Platform for Reinforcement Learning Based Dynamic Treatment Regime
Zhiyao Luo, Mingcheng Zhu, Fenglin Liu, Jiali Li, Yangchen Pan, Jiandong Zhou, Tingting Zhu
TL;DR
This work addresses the lack of standardized in silico benchmarks for reinforcement learning (RL) applied to dynamic treatment regimes (DTRs) in healthcare. It introduces DTR-Bench, a modular platform with four simulation environments (AhnChemoEnv, GhaffariCancerEnv, OberstSepsisEnv, and SimGlucoseEnv) designed to incorporate realistic factors such as PK/PD variability, observation noise, hidden variables, and missing data. The authors benchmark a range of RL algorithms spanning discrete-action (e.g., DQN, DDQN, C51) and continuous-action (e.g., DDPG, TD3, SAC) families using a standardized learning and evaluation workflow with automated hyperparameter tuning. Results show that these real-world complexities degrade performance and that no single algorithm dominates across all tasks. Methods such as C51 and drSAC demonstrate robustness to noise and missing data, whereas encoding temporal history with RNNs is not universally advantageous. The open-source DTR-Bench platform enables rigorous, reproducible evaluation of RL-based DTRs and highlights the need for more robust, adaptive algorithms and more realistic medical simulations to bridge the gap to clinical impact.
Abstract
Reinforcement learning (RL) has garnered increasing recognition for its potential to optimise dynamic treatment regimes (DTRs) in personalised medicine, particularly for drug dosage prescriptions and medication recommendations. However, a significant challenge persists: there is neither a unified framework for simulating diverse healthcare scenarios nor a comprehensive analysis that benchmarks the effectiveness of RL algorithms within these contexts. To address this gap, we introduce DTR-Bench, a benchmarking platform comprising four distinct simulation environments tailored to common DTR applications, including cancer chemotherapy, radiotherapy, glucose management in diabetes, and sepsis treatment. We evaluate various state-of-the-art RL algorithms across these settings, highlighting in particular their performance under real-world challenges such as pharmacokinetic/pharmacodynamic (PK/PD) variability, noise, and missing data. Our experiments reveal varying degrees of performance degradation among RL algorithms in the presence of noise and patient variability, with some algorithms failing to converge. Additionally, we observe that using temporal observation representations does not consistently lead to improved performance in DTR settings. Our findings underscore the necessity of developing robust, adaptive RL algorithms capable of effectively managing these complexities to enhance patient-specific healthcare. We have open-sourced our benchmark and code at https://github.com/GilesLuo/DTR-Bench.
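
To make the noise and missing-data challenges concrete, the following is a minimal Python sketch of how a clean simulator observation might be perturbed before it reaches an RL agent. It is an illustration only and not the DTR-Bench implementation: the function name `perturb_observation`, its parameters, the Gaussian noise model, and the use of NaN to mark missing entries are all assumptions made for this example.

```python
import math
import random
from typing import List, Optional


def perturb_observation(
    obs: List[float],
    noise_std: float = 0.1,
    missing_prob: float = 0.2,
    rng: Optional[random.Random] = None,
) -> List[float]:
    """Return a perturbed copy of `obs` (hypothetical helper, not part of DTR-Bench).

    Each entry is independently dropped with probability `missing_prob`;
    surviving entries receive additive zero-mean Gaussian noise.
    Dropped entries are encoded as NaN, which is an assumption of this sketch.
    """
    rng = rng or random.Random()
    perturbed = []
    for value in obs:
        if rng.random() < missing_prob:
            # Simulate a measurement that was never recorded.
            perturbed.append(math.nan)
        else:
            # Simulate sensor or assay noise on a recorded measurement.
            perturbed.append(value + rng.gauss(0.0, noise_std))
    return perturbed
```

Setting `noise_std=0.0` and `missing_prob=0.0` returns the observation unchanged, so the same function can describe both the idealised setting and progressively harder ones.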
