Realistic CDSS Drug Dosing with End-to-end Recurrent Q-learning for Dual Vasopressor Control
Will Y. Zou, Jean Feng, Alexandre Kalimouttou, Jennifer Yuntong Zhang, Christopher W. Seymour, Romain Pirracchio
TL;DR
This paper tackles the challenge of learning realistic dosing policies for septic shock in ICUs using offline reinforcement learning. It introduces a dual vasopressor control framework with an end-to-end Q-learning approach, exploring multiple action-space designs for vasopressin ($vp_1$) and norepinephrine ($vp_2$), and combining conservative Q-learning with recurrent experience replay using an LSTM. Experiments on eICU and MIMIC-IV show that clinically aligned action spaces, particularly the block discrete and stepwise directional formulations, improve learnability and policy performance, while Fitted Q-Evaluation and Weighted Importance Sampling (WIS) indicate substantial offline policy gains over clinician baselines. The results offer a blueprint for deployable CDSS that balances optimization, interpretability, and safety, potentially accelerating adoption of RL-driven dosing in critical care. Key mathematical elements include the action space definitions, with $vp_1$ binary and $vp_2$ in $(0,0.5]$, the offline RL objective based on TD errors, and the use of WIS to evaluate learned policies against clinical practice.
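The summary above names Weighted Importance Sampling (WIS) as one of the off-policy evaluation tools. As a rough illustration only, and not the authors' implementation, the standard trajectory-level WIS estimator can be sketched as follows. The function name, the per-step tuple layout, and the default discount factor are assumptions made for this example.

```python
def weighted_importance_sampling(trajectories, gamma=0.99):
    """Trajectory-level WIS estimate of an evaluation policy's expected return.

    Assumed (hypothetical) input layout: each trajectory is a list of
    (pi_e_prob, pi_b_prob, reward) tuples, where pi_e_prob is the probability
    the evaluated policy assigns to the logged action and pi_b_prob is the
    probability under the behavior (clinician) policy.
    """
    weights, returns = [], []
    for traj in trajectories:
        rho = 1.0  # cumulative importance ratio for this trajectory
        g = 0.0    # discounted return for this trajectory
        for t, (p_e, p_b, r) in enumerate(traj):
            rho *= p_e / p_b
            g += (gamma ** t) * r
        weights.append(rho)
        returns.append(g)

    total = sum(weights)
    if total == 0.0:
        # No trajectory is supported by the evaluated policy.
        return float("nan")
    # Self-normalized average: weights sum to one after division.
    return sum(w * g for w, g in zip(weights, returns)) / total
```

Normalizing by the sum of the weights, rather than by the number of trajectories, trades a small bias for lower variance, which is the usual reason WIS is preferred over ordinary importance sampling on long ICU trajectories.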
Abstract
Reinforcement learning (RL) applications in Clinical Decision Support Systems (CDSS) frequently encounter skepticism because models may recommend inoperable dosing decisions. We propose an end-to-end offline RL framework for dual vasopressor administration in Intensive Care Units (ICUs) that directly addresses this challenge through principled action space design. Our method integrates discrete, continuous, and directional dosing strategies with conservative Q-learning and incorporates a novel recurrent modeling approach that uses a replay buffer to capture temporal dependencies in ICU time-series data. Our comparative analysis of norepinephrine dosing strategies across different action space formulations reveals that the designed action spaces improve interpretability and facilitate clinical adoption while preserving efficacy. Empirical results on eICU and MIMIC-IV demonstrate that action space design profoundly influences the learned behavioral policies. Compared with baselines, the proposed methods achieve more than a 3x improvement in expected reward while aligning with established clinical protocols.
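To make the conservative Q-learning component more concrete, the sketch below computes the widely used discrete-action CQL loss for a single transition: a squared TD error plus a penalty that pushes down Q-values for actions not seen in the data. This is a minimal illustration under stated assumptions, not the paper's objective; the function name, argument names, and the defaults for `gamma` and `alpha` are hypothetical, and the recurrent (LSTM) state handling described in the abstract is omitted.

```python
import math


def cql_loss(q_values, action, reward, next_q_values, done,
             gamma=0.99, alpha=1.0):
    """Discrete-action CQL loss for one logged transition (illustrative).

    q_values / next_q_values: lists of Q estimates, one entry per action,
    for the current and next state. `action` is the index of the logged
    (clinician) action.
    """
    # One-step TD target; no bootstrapping at terminal states.
    target = reward + (0.0 if done else gamma * max(next_q_values))
    td_error = q_values[action] - target

    # Numerically stable log-sum-exp over all actions.
    m = max(q_values)
    lse = m + math.log(sum(math.exp(q - m) for q in q_values))

    # Conservative penalty: non-negative, and zero only when the logged
    # action dominates all other actions.
    penalty = lse - q_values[action]

    return 0.5 * td_error ** 2 + alpha * penalty
```

A larger `alpha` keeps the learned policy closer to actions observed in the logged data, which is one way an offline method can avoid recommending doses that clinicians never administer.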
