Empowering Clinicians with Medical Decision Transformers: A Framework for Sepsis Treatment

Aamer Abdul Rahman, Pranav Agarwal, Rita Noumeir, Philippe Jouvet, Vincent Michalski, Samira Ebrahimi Kahou

TL;DR

Sepsis treatment is a high-stakes sequential decision problem with delayed and sparse feedback. The paper introduces MeDT, a transformer-based offline reinforcement learning framework conditioned on the hindsight reward $r_T$ and the acuity-to-go $k_t$ to generate personalized dosage recommendations and support clinician interaction. It integrates a state predictor for offline evaluation and assesses policy performance with multiple off-policy estimators, including importance sampling (IS), weighted importance sampling (WIS), fitted Q-evaluation (FQE), and weighted doubly robust (WDR) estimation. An information-flow interpretability method is used to explain decisions. Experiments on the MIMIC-III septic cohort show that MeDT is competitive with or outperforms existing offline baselines and provides interpretable, clinician-aligned rationales for dosing decisions. The approach advances practical, interpretable clinical decision support for sepsis management and holds promise for scaling to other medical decision problems with large offline datasets.
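
To make the off-policy evaluation step more concrete, here is a minimal sketch of a trajectory-wise weighted importance sampling (WIS) estimator in plain Python. It is an illustration under stated assumptions, not the authors' implementation: the function name, the `(pi_e, pi_b, reward)` tuple layout, and the terminal reward of $-1$ for a non-survivor are all hypothetical choices made for this example.

```python
def weighted_importance_sampling(trajectories, gamma=1.0):
    """Trajectory-wise WIS estimate of an evaluation policy's value.

    Assumed input layout (hypothetical): each trajectory is a list of
    (pi_e, pi_b, reward) tuples, where pi_e and pi_b are the probabilities
    that the evaluation and behavior (clinician) policies assign to the
    action that was actually taken at that step. pi_b must be positive.
    """
    weights, returns = [], []
    for traj in trajectories:
        rho = 1.0  # cumulative importance ratio for this trajectory
        ret = 0.0  # discounted return for this trajectory
        for t, (pi_e, pi_b, reward) in enumerate(traj):
            rho *= pi_e / pi_b
            ret += (gamma ** t) * reward
        weights.append(rho)
        returns.append(ret)

    total = sum(weights)
    if total == 0.0:
        raise ValueError("all importance weights are zero")
    # Normalizing by the sum of weights (rather than the trajectory count)
    # is what distinguishes WIS from ordinary importance sampling.
    return sum(w * g for w, g in zip(weights, returns)) / total


# Toy data: one survivor (terminal reward +1) and one non-survivor
# (terminal reward -1, an assumption made only for this example).
toy = [
    [(0.5, 0.5, 0.0), (0.8, 0.4, 1.0)],
    [(0.2, 0.4, 0.0), (0.5, 0.5, -1.0)],
]
estimate = weighted_importance_sampling(toy)
```

Dividing by the sum of the weights trades a small bias for lower variance than ordinary IS, which is the usual reason for reporting WIS alongside IS on long clinical trajectories.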

Abstract

Offline reinforcement learning has shown promise for solving tasks in safety-critical settings, such as clinical decision support. Its application, however, has been limited by the lack of interpretability and interactivity for clinicians. To address these challenges, we propose the medical decision transformer (MeDT), a novel and versatile framework based on the goal-conditioned reinforcement learning paradigm for sepsis treatment recommendation. MeDT uses the decision transformer architecture to learn a policy for drug dosage recommendation. During offline training, MeDT utilizes collected treatment trajectories to predict administered treatments for each time step, incorporating known treatment outcomes, target acuity scores, past treatment decisions, and current and past medical states. This analysis enables MeDT to capture complex dependencies among a patient's medical history, treatment decisions, outcomes, and short-term effects on stability. Our proposed conditioning uses acuity scores to address sparse reward issues and to facilitate clinician-model interactions, enhancing decision-making. Following training, MeDT can generate tailored treatment recommendations by conditioning on the desired positive outcome (survival) and user-specified short-term stability improvements. We carry out rigorous experiments on data from the MIMIC-III dataset and use off-policy evaluation to demonstrate that MeDT recommends interventions that outperform or are competitive with existing offline reinforcement learning methods while enabling a more interpretable, personalized and clinician-directed approach.
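
The conditioning described in the abstract can be sketched as a token-sequence builder. The sketch below only illustrates how an outcome, target acuity scores, states, and past doses might be interleaved into one context for predicting the next dose. The function name `build_context`, the token tags, and the exact ordering (including repeating the outcome token at every step) are assumptions made for illustration and are not taken from the paper.

```python
def build_context(outcome, acuity_targets, states, past_doses):
    """Interleave conditioning tokens for predicting the dose at step t.

    outcome        : desired treatment outcome (e.g. +1 for survival)
    acuity_targets : k_1..k_t, one target acuity vector per step
    states         : s_1..s_t, one patient state per step
    past_doses     : a_1..a_{t-1}, doses already administered

    The per-step ordering (outcome, acuity, state, dose) is a hypothetical
    layout chosen for this sketch.
    """
    t = len(states)
    if len(acuity_targets) != t or len(past_doses) != t - 1:
        raise ValueError("expected t acuity targets, t states, t-1 doses")

    tokens = []
    for i in range(t):
        tokens.append(("outcome", outcome))
        tokens.append(("acuity", acuity_targets[i]))
        tokens.append(("state", states[i]))
        if i < t - 1:
            # The dose for the final step is what the model must predict,
            # so it is not part of the input context.
            tokens.append(("dose", past_doses[i]))
    return tokens


# Two-step example with placeholder values.
context = build_context(+1, ["k1", "k2"], ["s1", "s2"], ["a1"])
```

At inference time a clinician would supply the acuity targets, which is how user-specified short-term stability goals enter the model's input.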

Paper Structure

This paper contains 25 sections, 9 equations, 9 figures, 4 tables, 1 algorithm.

Figures (9)

  • Figure 1: MeDT training: At each time-step $t$, the MeDT policy attends to the past treatment trajectory. This includes the desired treatment outcome $r$ (at inference time fixed to $+1$ indicating survival), desired next-step acuity scores $k_1,\dots,k_t$ where $k_t = (kc_t, kr_t, kn_t, kl_t, kh_t, km_t, ko_t)$, patient states $s_1, \dots, s_t$, administered drug doses $a_1, \dots, a_{t-1}$, and outputs a dose prediction $\hat{a}_t$.
  • Figure 2: Autoregressive evaluation pipeline: At each time-step $t$, the pre-trained state predictor attends to past recommended doses $\hat{a}_1, \dots, \hat{a}_{t}$, the initial patient state $s_1$ and predicted patient states $\hat{s}_2, \dots, \hat{s}_t$, and outputs a prediction $\hat{s}_{t+1}$ of the patient state at time $t+1$. Both dosage recommendations $\hat{a}_{t+1}$ and predicted states are fed back to MeDT to simulate treatment trajectories with multiple sequential decisions.
  • Figure 3: (a) Dosage recommended by the MeDT and clinician policies for different SAPS2 scores. (b) Distribution of fluids and vasopressors given by the MeDT and clinician policies.
  • Figure 4: Box-plots of FQE, WIS and WDR off-policy evaluations for MeDT and baselines.
  • Figure 5: Visualization of four patient trajectories computed by the state predictor following treatment recommendations from two policies (red and blue).
  • ...and 4 more figures
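
The autoregressive evaluation loop described in the Figure 2 caption can be sketched as follows. This is a minimal illustration, assuming the policy and the state predictor are exposed as plain callables. The names `rollout`, `policy`, and `state_predictor`, their argument order, and the toy stand-ins below are hypothetical and do not reflect the authors' code.

```python
def rollout(policy, state_predictor, initial_state, acuity_targets, outcome=1):
    """Simulate a treatment trajectory by alternating policy and predictor.

    policy(outcome, acuity_so_far, states, doses) -> next recommended dose
    state_predictor(states, doses)                -> next predicted state

    Both callables are placeholders for trained models.
    """
    states = [initial_state]  # s_1, then predicted states s_2, s_3, ...
    doses = []                # recommended doses a_1, a_2, ...
    for t in range(len(acuity_targets)):
        # The policy sees everything up to and including the current state.
        dose = policy(outcome, acuity_targets[: t + 1], states, doses)
        doses.append(dose)
        # The predictor sees the doses so far, the initial state, and all
        # previously predicted states, and outputs the next state.
        states.append(state_predictor(states, doses))
    return states, doses


# Toy stand-ins: dose is 10% of the latest state value, and the next
# state is the latest state minus the latest dose.
def toy_policy(outcome, acuity, states, doses):
    return 0.1 * states[-1]


def toy_predictor(states, doses):
    return states[-1] - doses[-1]


sim_states, sim_doses = rollout(toy_policy, toy_predictor, 10.0, [0, 0, 0])
```

Because predicted states are fed back as inputs, errors in the state predictor compound over the rollout, so simulated trajectories of this kind are best read as qualitative evidence.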