Deep Reinforcement Learning for Sepsis Treatment

Aniruddh Raghu, Matthieu Komorowski, Imran Ahmed, Leo Celi, Peter Szolovits, Marzyeh Ghassemi

TL;DR

The paper learns sepsis treatment policies from ICU data using continuous state-space modeling and deep reinforcement learning. It employs a Dueling Double-Deep Q Network with a discretized action space for intravenous fluids and vasopressors, trained on MIMIC-III patients meeting the Sepsis-3 criteria, and uses rewards based on SOFA score and lactate together with a terminal survival signal. Qualitative analyses show that the learned policies align with physician behavior, particularly in vasopressor use. Off-policy evaluation suggests potential improvements over clinician policies, though the authors acknowledge the limitations of this evaluation. The work demonstrates the feasibility of continuous-state RL for ICU decision support and outlines directions for per-patient evaluation and model-based extensions.
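To make the "Dueling Double-Deep Q Network" named above more concrete, here is a minimal sketch of the two ideas it combines: the dueling aggregation of a state value with per-action advantages, and the Double DQN bootstrap target. It is not the authors' implementation. The function names, the plain-list inputs, and the default discount `gamma=0.99` are assumptions made for illustration; in practice the Q-values would come from neural networks.

```python
from typing import List, Sequence


def dueling_q(value: float, advantages: Sequence[float]) -> List[float]:
    """Dueling aggregation: Q(s, a) = V(s) + A(s, a) - mean over a' of A(s, a')."""
    mean_adv = sum(advantages) / len(advantages)
    return [value + adv - mean_adv for adv in advantages]


def double_dqn_target(
    reward: float,
    q_next_online: Sequence[float],
    q_next_target: Sequence[float],
    gamma: float = 0.99,  # hypothetical discount factor, not taken from the text
    terminal: bool = False,
) -> float:
    """Double DQN target for one transition.

    The online network selects the best next action and the target network
    evaluates it, which reduces the overestimation of plain Q-learning.
    """
    if terminal:
        # No bootstrapping after the final step of a patient trajectory.
        return reward
    best_action = max(range(len(q_next_online)), key=lambda a: q_next_online[a])
    return reward + gamma * q_next_target[best_action]
```

In `double_dqn_target`, action selection and action evaluation use different Q estimates, which is the defining feature of the Double DQN variant.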

Abstract

Sepsis is a leading cause of mortality in intensive care units and costs hospitals billions annually. Treating a septic patient is highly challenging, because individual patients respond very differently to medical interventions and there is no universally agreed-upon treatment for sepsis. In this work, we propose an approach to deduce treatment policies for septic patients by using continuous state-space models and deep reinforcement learning. Our model learns clinically interpretable treatment policies, similar in important aspects to the treatment policies of physicians. The learned policies could be used to aid intensive care clinicians in medical decision making and improve the likelihood of patient survival.

Paper Structure

This paper contains 16 sections, 3 equations, 2 figures, 1 table.

Figures (2)

  • Figure 1: Policies learned by the different models, shown as 2D histograms that aggregate all actions selected by the physician and by each model on the test set over all relevant timesteps. The axis labels index the discretized action space, where 0 represents no drug given and 4 the maximum dose of that particular drug. The models learn to prescribe vasopressors sparingly, a key feature of the physician's policy.
  • Figure 2: Comparison of how observed mortality (y-axis) varies with the difference between the dosages recommended by the optimal policy and the dosages administered by clinicians (x-axis) on a held-out test set. For every timestep, this difference was calculated and associated with whether the patient survived or died in the hospital, allowing the computation of observed mortality. For medium SOFA scores, mortality is low when the difference is zero, indicating that in this regime more patients survive when the physician acts in accordance with the learned policy.
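
The aggregations behind the two figures can be sketched as follows. This is an illustrative reading of the captions, not the authors' code. It assumes that the two drugs, each with levels 0 to 4, are combined into a single joint action index, and that dose differences are grouped into bins of a hypothetical width `bin_width`. All function and parameter names are made up for the example.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

N_BINS = 5  # levels 0..4 per drug, matching the axis labels of Figure 1


def encode_action(fluid_bin: int, vaso_bin: int) -> int:
    """Map a (fluid, vasopressor) level pair to one joint action index."""
    if not (0 <= fluid_bin < N_BINS and 0 <= vaso_bin < N_BINS):
        raise ValueError("each level must lie in 0..4")
    return fluid_bin * N_BINS + vaso_bin


def decode_action(action: int) -> Tuple[int, int]:
    """Inverse of encode_action: joint index -> (fluid level, vasopressor level)."""
    if not 0 <= action < N_BINS * N_BINS:
        raise ValueError("action index out of range")
    return divmod(action, N_BINS)


def action_histogram(actions: Iterable[int]) -> List[List[int]]:
    """Count actions on the 5x5 grid, as in the 2D histograms of Figure 1."""
    counts = [[0] * N_BINS for _ in range(N_BINS)]
    for action in actions:
        fluid_bin, vaso_bin = decode_action(action)
        counts[fluid_bin][vaso_bin] += 1
    return counts


def mortality_by_dose_gap(
    records: Iterable[Tuple[float, float, bool]],
    bin_width: float = 1.0,  # hypothetical bin width for the x-axis
) -> Dict[float, float]:
    """Observed mortality per dose-difference bin, as in Figure 2.

    Each record is (recommended_dose, administered_dose, died_in_hospital)
    for one timestep.
    """
    totals = defaultdict(lambda: [0, 0])  # bin -> [deaths, timesteps]
    for recommended, administered, died in records:
        gap = round((recommended - administered) / bin_width) * bin_width
        totals[gap][0] += int(died)
        totals[gap][1] += 1
    return {gap: deaths / n for gap, (deaths, n) in totals.items()}
```

For example, `mortality_by_dose_gap([(1, 1, False), (1, 1, True), (3, 1, True)])` groups the first two timesteps at a difference of 0 and the third at a difference of 2. A per-SOFA-range breakdown like the one the caption describes would apply the same function separately to each subset of timesteps.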