
How Consistent are Clinicians? Evaluating the Predictability of Sepsis Disease Progression with Dynamics Models

Unnseo Park, Venkatesh Sivaraman, Adam Perer

TL;DR

The paper investigates whether clinician actions are diverse enough to influence sepsis progression and whether action information improves offline predictive models. Using transformer-based dynamics models trained on MIMIC-IV and eICU data, the authors compare predictions of future disease severity made with and without future action inputs. Incorporating action information does not materially improve model fit, suggesting that the data carry little signal from action diversity. Action-prediction analyses show that clinician actions are somewhat predictable, but not enough to explain differences in outcomes. The work highlights the need for richer, clinically informed action representations and more diverse data sources to enable reliable RL-based optimization of sepsis treatment.
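The with-versus-without-actions comparison can be illustrated by perturbing the future action inputs at test time, using the four conditions named in the Figure 2 caption (True, Zero, Shuffled, Mean). The sketch below is a minimal illustration under stated assumptions, not the authors' code: the function name `ablate_actions`, the array layout `[n_patients, horizon, n_action_dims]`, and the choices that "Shuffled" permutes whole action sequences across patients and "Mean" uses one pooled mean per action dimension are all hypothetical.

```python
from typing import Optional

import numpy as np


def ablate_actions(actions: np.ndarray, mode: str,
                   rng: Optional[np.random.Generator] = None) -> np.ndarray:
    """Return a modified copy of future action inputs for one test-time condition.

    Hypothetical layout: `actions` has shape [n_patients, horizon, n_action_dims].
    """
    if mode == "true":
        # Unmodified actions: the model sees what clinicians actually did.
        return actions.copy()
    if mode == "zero":
        # Every action input set to zero.
        return np.zeros_like(actions)
    if mode == "mean":
        # Assumption: one mean per action dimension, pooled over patients and time.
        mean = actions.mean(axis=(0, 1), keepdims=True)
        return np.broadcast_to(mean, actions.shape).copy()
    if mode == "shuffled":
        # Assumption: each patient receives the whole action sequence of a
        # randomly permuted patient (a patient may occasionally keep its own).
        rng = rng if rng is not None else np.random.default_rng()
        return actions[rng.permutation(actions.shape[0])]
    raise ValueError(f"unknown mode: {mode!r}")


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    acts = rng.normal(size=(8, 6, 2))  # toy data: 8 patients, 6 steps, 2 action dims
    for mode in ("true", "zero", "shuffled", "mean"):
        print(mode, ablate_actions(acts, mode, rng).shape)
```

If a trained dynamics model's error barely changes between the "true" condition and the ablated ones, the model is making little use of the action inputs, which matches the pattern the paper reports.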

Abstract

Reinforcement learning (RL) is a promising approach to generate treatment policies for sepsis patients in intensive care. While retrospective evaluation metrics show decreased mortality when these policies are followed, studies with clinicians suggest their recommendations are often spurious. We propose that these shortcomings may be due to lack of diversity in observed actions and outcomes in the training data, and we construct experiments to investigate the feasibility of predicting sepsis disease severity changes due to clinician actions. Preliminary results suggest incorporating action information does not significantly improve model performance, indicating that clinician actions may not be sufficiently variable to yield measurable effects on disease progression. We discuss the implications of these findings for optimizing sepsis treatment.

Paper Structure (11 sections, 3 figures)

Figures (3)

  • Figure 1: Markov decision process model for patients with sepsis in the ICU. $s_t$ represents the patient state at time $t$, $a_t$ represents a treatment action, and $y_t$ represents a function of the state that captures the patient's disease severity. Brackets indicate how these values are used in our experiment.
  • Figure 2: Left: RMSE (lower is better) of the predicted change in disease severity across training schemes ("Train Actions," "Train States," and "Train States + Actions") and action inputs at test time (True, Zero, Shuffled, and Mean). Error bars indicate the standard deviation across three random weight initializations. Note that all units are in $z$-scaled space, so an RMSE of 1 corresponds to 1 standard deviation in the severity metric. Right: example histograms comparing true and predicted changes in SOFA score at 12 hours ahead, in the True and Shuffled evaluation conditions.
  • Figure 3: Left: correlations between true and predicted normalized actions from 1 to 6 hours ahead. Right: example histograms of correlations between true and predicted normalized actions at 6 hours ahead.
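The two quantities plotted in Figures 2 and 3 can be sketched with small helpers. This is a hedged illustration rather than the paper's evaluation code: the helper names `zscaled_rmse` and `horizon_correlations`, the array layout `[n_samples, n_horizons, n_action_dims]`, and the choice to compute each Pearson correlation across samples for a fixed horizon and action dimension are assumptions, since the captions do not say how the correlations in Figure 3 were grouped.

```python
import numpy as np


def zscaled_rmse(y_true, y_pred, mu: float, sigma: float) -> float:
    """RMSE computed after z-scaling both arrays with the same mean and std.

    The shared `mu` cancels in the difference, so this equals the raw RMSE
    divided by `sigma`: an RMSE of 1.0 means one standard deviation.
    """
    zt = (np.asarray(y_true, dtype=float) - mu) / sigma
    zp = (np.asarray(y_pred, dtype=float) - mu) / sigma
    return float(np.sqrt(np.mean((zt - zp) ** 2)))


def horizon_correlations(a_true: np.ndarray, a_pred: np.ndarray) -> np.ndarray:
    """Pearson correlation between true and predicted actions.

    Assumed input shape: [n_samples, n_horizons, n_action_dims]. The correlation
    is taken across samples, giving an array of shape [n_horizons, n_action_dims].
    Columns with zero variance yield NaN (with a runtime warning).
    """
    t = a_true - a_true.mean(axis=0, keepdims=True)
    p = a_pred - a_pred.mean(axis=0, keepdims=True)
    num = (t * p).sum(axis=0)
    den = np.sqrt((t ** 2).sum(axis=0) * (p ** 2).sum(axis=0))
    return num / den


if __name__ == "__main__":
    # Raw errors are -1 and +1 (raw RMSE 1.0); dividing by sigma = 2 gives 0.5.
    print(zscaled_rmse([0.0, 2.0], [1.0, 1.0], mu=5.0, sigma=2.0))  # 0.5
```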