Reinforced Sequential Decision-Making for Sepsis Treatment: The POSNEGDM Framework with Mortality Classifier and Transformer

Dipesh Tamboli; Jiayu Chen; Kiran Pranesh Jotheeswaran; Denny Yu; Vaneet Aggarwal

TL;DR

Sepsis treatment requires dynamic, patient-specific decisions that static guidelines struggle to provide. The authors introduce POSNEGDM, an offline reinforcement learning framework that pairs a Transformer-based DualSight decision maker with a Mortality Classifier acting as a feedback reinforcer, trained on Sepsis MIMIC-III data discretized into 4-hour windows. By explicitly leveraging positive and negative demonstrations through a three-term loss $L_{total}=\alpha L_{action}+\beta L_{state}+\gamma L_{survival}$, the system learns to predict actions and next states that steer patients toward survival. POSNEGDM reports a mortality rate of 2.61% and an action-prediction fidelity of 94.6%, outperforming baselines such as Decision Transformer and Behavioral Cloning, and ablations confirm that both the transformer and the mortality feedback are critical. These results point to a mortality-aware decision-support approach that could reduce ICU mortality and healthcare costs in sepsis care.
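As a rough illustration of the three-term objective above, the following minimal sketch combines three per-batch loss values into $L_{total}$. The function name `total_loss`, the default weights, and the example numbers are assumptions made for illustration; the summary does not state the weight values or how each individual term is computed.

```python
def total_loss(l_action, l_state, l_survival, alpha=1.0, beta=1.0, gamma=1.0):
    """Weighted sum L_total = alpha*L_action + beta*L_state + gamma*L_survival.

    The weights default to 1.0 purely as placeholders (an assumption);
    the source does not report the values actually used.
    """
    return alpha * l_action + beta * l_state + gamma * l_survival


# Hypothetical per-batch loss values, chosen only to exercise the function.
example = total_loss(l_action=0.5, l_state=0.25, l_survival=2.0,
                     alpha=1.0, beta=2.0, gamma=0.5)
print(example)
```

Raising `gamma` relative to `alpha` and `beta` would weight the survival term more heavily than action imitation and state prediction, which is the lever the mortality feedback is described as providing.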

Abstract

Sepsis, a life-threatening condition triggered by the body's exaggerated response to infection, demands urgent intervention to prevent severe complications. Existing machine learning methods for managing sepsis struggle in offline scenarios, exhibiting suboptimal performance with survival rates below 50%. This paper introduces POSNEGDM ("Reinforcement Learning with Positive and Negative Demonstrations for Sequential Decision-Making"), a framework that uses a transformer-based model and a feedback reinforcer to replicate expert actions while accounting for individual patient characteristics. A mortality classifier with 96.7% accuracy guides treatment decisions towards positive outcomes. The POSNEGDM framework significantly improves patient survival, saving 97.39% of patients and outperforming established machine learning algorithms (Decision Transformer and Behavioral Cloning), which reach survival rates of 33.4% and 43.5%, respectively. Ablation studies further underscore the critical role of the transformer-based decision maker and of the mortality classifier in raising overall survival rates. In summary, the proposed approach is a promising avenue for improving sepsis treatment outcomes, contributing to better patient care and reduced healthcare costs.

Paper Structure (16 sections, 4 equations, 3 figures, 8 tables, 1 algorithm)

Introduction
Related Works
Offline Reinforcement Learning
Imitation Learning and Behavioral Cloning
Proposed Approach
Mortality Classifier
DualSight Decision Maker
The Overall Framework: PosNegDM
Sepsis Data Description
Experimental Results
Achieving Low Mortality with PosNegDM
Ablation studies
Conclusion
Enlarged Figure for Figure 1
Additional experiments
...and 1 more sections

Figures (3)

Figure 1: The DualSight decision maker takes in states, actions, and returns as input, which are first embedded into linear representations that are specific to each modality. The positional episodic timestep encoding is added to the input to help the model understand the order of events. The tokens are then fed into the GPT architecture, which uses a self-attention mechanism to predict actions and next states. The causal mask ensures that the model can only attend to previous tokens, preserving the causality of the system. The predicted states are subsequently input into the trained Mortality Classifier to assess whether the implemented action guides the patient towards a deceased state. The mortality prediction $m_t$ is employed to influence the DualSight, compelling it to choose actions aligned with the mortality classifier's prediction of an alive state. This process integrates mortality considerations into the decision-making mechanism, emphasizing the importance of actions that contribute to favorable patient outcomes.
Figure 2: The DualSight decision maker takes in states, actions, and returns as input, which are first embedded into linear representations that are specific to each modality. The positional episodic timestep encoding is added to the input to help the model understand the order of events. The tokens are then fed into the GPT architecture, which uses a self-attention mechanism to predict actions and next states. The causal mask ensures that the model can only attend to previous tokens, preserving the causality of the system. The predicted states are subsequently input into the trained Mortality Classifier to assess whether the implemented action guides the patient towards a deceased state. The mortality prediction $m_t$ is employed to influence the DualSight, compelling it to choose actions aligned with the mortality classifier's prediction of an alive state. This process integrates mortality considerations into the decision-making mechanism, emphasizing the importance of actions that contribute to favorable patient outcomes.
Figure 3: The three rows in the visualization represent the policies as provided by physicians, PosNegDM, and Behavioral Cloning (BC), respectively, each applied to both positive and negative test data. The axis labels correspond to the discretized action space, where '0' signifies no drug administration and '4' indicates the maximum dosage of a particular drug. Each grid cell represents a specific action, with its color indicating the frequency of its occurrence.
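The frequency grid described in the Figure 3 caption can be sketched as a simple count over discretized actions. This is a minimal illustration, not the authors' plotting code: treating each action as a pair of dose levels for two drugs (hence a two-axis grid), the function name `action_frequency_grid`, and the demo actions are all assumptions.

```python
from collections import Counter

N_LEVELS = 5  # dose levels 0..4, as described in the Figure 3 caption


def action_frequency_grid(actions, n_levels=N_LEVELS):
    """Count how often each (dose_a, dose_b) pair occurs.

    `actions` is an iterable of (dose_a, dose_b) integer pairs; modelling an
    action as a pair of dose levels for two drugs is an assumption of this
    sketch. Returns an n_levels x n_levels list of lists of counts.
    """
    counts = Counter(actions)
    grid = [[0] * n_levels for _ in range(n_levels)]
    for (a, b), n in counts.items():
        if not (0 <= a < n_levels and 0 <= b < n_levels):
            raise ValueError(f"dose level out of range: {(a, b)}")
        grid[a][b] = n
    return grid


# Hypothetical actions, not taken from the dataset.
demo_actions = [(0, 0), (0, 0), (1, 2), (4, 4), (1, 2), (0, 0)]
grid = action_frequency_grid(demo_actions)
for row in grid:
    print(row)
```

Building one such grid per policy (physician, PosNegDM, BC) and per data split (positive, negative) would give the cell counts that a heatmap like the one in the figure colors by frequency.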
