Methodology for Interpretable Reinforcement Learning for Optimizing Mechanical Ventilation

Joo Seung Lee; Malini Mahendra; Anil Aswani

TL;DR

This paper addresses how to optimize mechanical ventilation with reinforcement learning (RL) while preserving interpretability and clinical safety. It learns an interpretable policy via Conservative Q-Improvement (CQI) and pairs it with a matching-based, causal, nonparametric off-policy evaluation (OPE) framework that assesses policies offline on real ICU data from MIMIC-III. Key contributions include a clinically informed reward that balances SpO$_2$ gains against aggressive ventilator settings, a comparison against Behavior Cloning and Conservative Q-Learning (CQL), and a demonstration that the interpretable CQI policy can achieve competitive SpO$_2$ improvements with fewer aggressive actions. The work highlights the potential of transparent, data-driven decision support for personalized ventilation within connected health systems, while noting that external validation and further safety-aware design are needed before clinical deployment.

Abstract

Mechanical ventilation is a critical life support intervention that delivers controlled air and oxygen to a patient's lungs, assisting or replacing spontaneous breathing. While several data-driven approaches have been proposed to optimize ventilator control strategies, they often lack interpretability and alignment with domain knowledge, which hinders clinical adoption. This paper presents a methodology for interpretable reinforcement learning (RL) aimed at improving mechanical ventilation control as part of connected health systems. Using a causal, nonparametric, model-based off-policy evaluation method, we assess RL policies for their ability to enhance patient-specific outcomes, specifically increasing blood oxygen saturation (SpO$_2$), while avoiding aggressive ventilator settings that may cause ventilator-induced lung injury and other complications. Through numerical experiments on real-world ICU data from the MIMIC-III database, we demonstrate that our interpretable decision tree policy achieves performance comparable to state-of-the-art deep RL methods while outperforming standard behavior cloning approaches. The results highlight the potential of interpretable, data-driven decision support systems to improve safety and efficiency in personalized ventilation strategies, paving the way for integration into connected healthcare environments.

Paper Structure (35 sections, 7 equations, 10 figures, 1 algorithm)

Introduction
Reinforcement Learning for Dynamic Treatment
Reinforcement Learning for Mechanical Ventilation
Contributions and Outline
Data
Inclusion Criteria
Preprocessing Steps
Data Variables
Model
Markov Decision Process (MDP) Definition
RL Problem Definition
State Space
Action Space
Reward
Policy Learning Scheme
...and 20 more sections

Figures (10)

Figure 1: Overview of the pipeline for mechanical ventilation policy learning and evaluation.
Figure 2: The Nadaraya-Watson estimator equation for modeling patient state transitions.
Figure 3: Overview of the matching-based off-policy evaluation framework.
Figure 4: Sample trajectory generated with the Nadaraya-Watson estimator as the transition model. This particular patient is a 50-year-old female with no prior ICU admission who survived for at least 90 days.
Figure 5: Decision tree policy with maximum depth of 3, learned from the clinicians' behavioral data.
...and 5 more figures
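
Figures 2 and 4 refer to a Nadaraya-Watson estimator used as the patient state transition model. The paper's exact formulation is not reproduced on this page, so the following is only a minimal sketch of the generic Nadaraya-Watson form, a kernel-weighted average of observed outcomes. The Gaussian kernel, the `bandwidth` parameter, and all function and variable names are assumptions made for illustration and may differ from the authors' choices.

```python
import numpy as np


def nadaraya_watson(query, X, Y, bandwidth=1.0):
    """Generic Nadaraya-Watson estimate at `query` (illustrative only).

    X: (n, d) observed inputs, e.g. state-action features (assumed layout).
    Y: (n,) or (n, k) observed outcomes, e.g. next states (assumed layout).
    The Gaussian kernel and `bandwidth` are assumptions, not the paper's.
    """
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    query = np.asarray(query, dtype=float)

    # Squared Euclidean distance from the query to every observed input.
    sq_dist = np.sum((X - query) ** 2, axis=1)

    # Gaussian kernel weights. Shifting by the minimum distance keeps the
    # largest weight at 1, which avoids all weights underflowing to zero;
    # the shift cancels when the weights are normalized.
    weights = np.exp(-(sq_dist - sq_dist.min()) / (2.0 * bandwidth ** 2))
    weights /= weights.sum()

    # Kernel-weighted average of the observed outcomes.
    return weights @ Y


if __name__ == "__main__":
    # Toy data: two observed inputs with known outcomes.
    X = np.array([[0.0], [1.0]])
    Y = np.array([[0.0], [2.0]])
    print(nadaraya_watson(np.array([0.5]), X, Y, bandwidth=1.0))
```

In this toy example the query lies midway between the two observed inputs, so both receive equal weight and the estimate is the mean of the two outcomes. A smaller bandwidth concentrates the weight on the nearest observations.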
