DTR-Bench: An in silico Environment and Benchmark Platform for Reinforcement Learning Based Dynamic Treatment Regime

Zhiyao Luo; Mingcheng Zhu; Fenglin Liu; Jiali Li; Yangchen Pan; Jiandong Zhou; Tingting Zhu

TL;DR

This work addresses the lack of standardized in silico benchmarks for reinforcement learning (RL) applied to dynamic treatment regimes (DTRs) in healthcare. It introduces DTR-Bench, a modular platform with four simulation environments (AhnChemoEnv, GhaffariCancerEnv, OberstSepsisEnv, and SimGlucoseEnv) designed to incorporate realistic factors such as PK/PD variability, observation noise, hidden variables, and missing data. The authors benchmark RL algorithms from both the discrete-action (e.g., DQN, DDQN, C51) and continuous-action (e.g., DDPG, TD3, SAC) families using a standardized learning and evaluation workflow with automated hyperparameter tuning. Results show that real-world complexities degrade performance and that no single algorithm dominates across all tasks. Methods such as C51 and drSAC are robust to noise and missing data, while adding temporal history via RNNs is not universally advantageous. The open-source DTR-Bench platform enables rigorous, reproducible evaluation of RL-based DTRs and highlights the need for more robust, adaptive algorithms and more realistic medical simulations to bridge the gap to clinical impact.

Abstract

Reinforcement learning (RL) has garnered increasing recognition for its potential to optimise dynamic treatment regimes (DTRs) in personalised medicine, particularly for drug dosage prescriptions and medication recommendations. However, a significant challenge persists: the absence of a unified framework for simulating diverse healthcare scenarios and a comprehensive analysis to benchmark the effectiveness of RL algorithms within these contexts. To address this gap, we introduce DTR-Bench, a benchmarking platform comprising four distinct simulation environments tailored to common DTR applications, including cancer chemotherapy, radiotherapy, glucose management in diabetes, and sepsis treatment. We evaluate various state-of-the-art RL algorithms across these settings, particularly highlighting their performance amidst real-world challenges such as pharmacokinetic/pharmacodynamic (PK/PD) variability, noise, and missing data. Our experiments reveal varying degrees of performance degradation among RL algorithms in the presence of noise and patient variability, with some algorithms failing to converge. Additionally, we observe that using temporal observation representations does not consistently lead to improved performance in DTR settings. Our findings underscore the necessity of developing robust, adaptive RL algorithms capable of effectively managing these complexities to enhance patient-specific healthcare. We have open-sourced our benchmark and code at https://github.com/GilesLuo/DTR-Bench.

Paper Structure (24 sections, 18 equations, 2 figures, 12 tables)

Introduction
Related Work
RL-based Dynamic Treatment Regime and Benchmarks
Benchmarks for RL algorithms
On-policy and Off-policy RL
Algorithms for Discrete Action Spaces
Algorithms for Continuous Action Spaces
Methods
Problem Formulation
Enhancement towards more realistic DTR Simulation
Learning and Evaluation Workflow
Experiment Settings
AhnChemoEnv: A Comprehensive Chemotherapy Simulation Model
GhaffariCancerEnv: A Mixed Radiotherapy and Chemotherapy Model
OberstSepsisEnv: A Sepsis Simulator
...and 9 more sections

Figures (2)

Figure 1: Workflow of the DTR-Bench platform. The platform streamlines the three steps of DTR-Bench. Step 1, Learning: RL algorithms interact with the environment, capturing interaction trajectories in a buffer for efficient off-policy learning. Hyperparameters are tuned using Tree-structured Parzen Estimator (TPE) optimisation (Ozaki et al., 2020). Step 2, Evaluation: Models are retrained with the optimised hyperparameters using five distinct seeds, with each model undergoing testing across 5,000 episodes to ensure a fair assessment. Step 3, Visualisation: The platform facilitates individual and cohort-averaged trajectory visualisations, supporting intuitive model development and analysis.
Figure 2: A summary of RL algorithms and environments in the DTR-Bench platform. a) DTR-Bench benchmarks discrete-action, continuous-action, and sequential RL algorithms across four critical healthcare challenges: chemotherapy, radiotherapy, sepsis, and Type-1 diabetes management. The platform evaluates these algorithms rigorously by considering complex factors such as PK/PD modelling, missing values, and noise in observations and states. b) A radar plot showing the environment configurations, where $n_{\text{cf}}$ denotes the counterfactual variables, and $n_{\text{cf, reward}}$ and $n_{\text{cf, dynamics}}$ denote the counterfactual variables affecting the reward function and the patient PK/PD dynamics, respectively. $n_{\text{cf}}$, $n_{\text{cf, reward}}$ and $n_{\text{cf, dynamics}}$ are shown on a logarithmic scale.
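
The evaluation step described in the Figure 1 caption (several seeds, a fixed number of test episodes per seed, results aggregated across seeds) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the DTR-Bench implementation: `ToyDoseEnv`, `threshold_policy`, `evaluate`, and `benchmark` are hypothetical names, the toy dynamics are invented for the example, and the per-seed retraining step is replaced by a fixed placeholder policy.

```python
import random
import statistics


class ToyDoseEnv:
    """Hypothetical 1-D stand-in for a treatment simulator (not a DTR-Bench environment)."""

    def __init__(self, rng, horizon=10):
        self.rng = rng
        self.horizon = horizon
        self.t = 0
        self.state = 0.0

    def reset(self):
        self.t = 0
        # Assumed initial condition: a normalised disease burden.
        self.state = self.rng.uniform(0.5, 1.5)
        return self.state

    def step(self, dose):
        # Invented dynamics: burden grows without treatment and shrinks with dose,
        # plus a small amount of Gaussian noise.
        noise = self.rng.gauss(0.0, 0.01)
        self.state = max(0.0, self.state * (1.1 - 0.3 * dose) + noise)
        self.t += 1
        # Reward penalises both the remaining burden and the dose given.
        reward = -self.state - 0.1 * dose
        done = self.t >= self.horizon
        return self.state, reward, done


def threshold_policy(state):
    """Placeholder for a trained agent: full dose while the burden is high."""
    return 1.0 if state > 0.3 else 0.0


def evaluate(policy, seed, n_episodes):
    """Mean episodic return of `policy` over `n_episodes` for one seed."""
    rng = random.Random(seed)
    env = ToyDoseEnv(rng)
    returns = []
    for _ in range(n_episodes):
        state, done, total = env.reset(), False, 0.0
        while not done:
            state, reward, done = env.step(policy(state))
            total += reward
        returns.append(total)
    return statistics.mean(returns)


def benchmark(policy, seeds=(0, 1, 2, 3, 4), n_episodes=5000):
    """Aggregate per-seed results into a mean and a standard deviation across seeds."""
    per_seed = [evaluate(policy, seed, n_episodes) for seed in seeds]
    return statistics.mean(per_seed), statistics.stdev(per_seed), per_seed


if __name__ == "__main__":
    mean_ret, std_ret, _ = benchmark(threshold_policy)
    print(f"return over 5 seeds: {mean_ret:.3f} +/- {std_ret:.3f}")
```

Reporting the spread across seeds alongside the mean is what makes comparisons between algorithms meaningful when some runs are unstable. In a full pipeline, each seed would also control the retraining run that produces the policy being tested.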
