Methodology for Interpretable Reinforcement Learning for Optimizing Mechanical Ventilation
Joo Seung Lee, Malini Mahendra, Anil Aswani
TL;DR
This paper addresses how to optimize mechanical ventilation using reinforcement learning (RL) while ensuring interpretability and clinical safety. It introduces an interpretable RL approach based on Conservative Q-Improvement (CQI) and couples it with a matching-based, causal, nonparametric off-policy evaluation (OPE) framework that assesses policies offline on real ICU data from MIMIC-III. Key contributions include a clinically informed reward that balances SpO$_2$ gains against aggressive ventilator settings, a comparison against Behavior Cloning and CQL, and a demonstration that the interpretable CQI policy can achieve competitive SpO$_2$ improvements with fewer aggressive actions. The work highlights the potential of transparent, data-driven decision support for personalized ventilation within connected health systems, while noting the need for external validation and further safety-aware design before clinical deployment.
Abstract
Mechanical ventilation is a critical life support intervention that delivers controlled air and oxygen to a patient's lungs, assisting or replacing spontaneous breathing. While several data-driven approaches have been proposed to optimize ventilator control strategies, they often lack interpretability and alignment with domain knowledge, hindering clinical adoption. This paper presents a methodology for interpretable reinforcement learning (RL) aimed at improving mechanical ventilation control as part of connected health systems. Using a causal, nonparametric, model-based off-policy evaluation, we assess RL policies for their ability to enhance patient-specific outcomes, specifically increasing blood oxygen saturation (SpO$_2$) while avoiding aggressive ventilator settings that may cause ventilator-induced lung injury and other complications. Through numerical experiments on real-world ICU data from the MIMIC-III database, we demonstrate that our interpretable decision tree policy achieves performance comparable to state-of-the-art deep RL methods while outperforming standard behavior cloning approaches. The results highlight the potential of interpretable, data-driven decision support systems to improve safety and efficiency in personalized ventilation strategies, paving the way for seamless integration into connected healthcare environments.
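
To make the trade-off described above concrete, the following is a minimal illustrative sketch of a reward that credits SpO$_2$ gains and penalizes aggressive ventilator settings. It is not the reward used in the paper: the action fields (`peep`, `fio2`, `tidal_volume`), the threshold constants, the `penalty` weight, and the function names are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VentAction:
    """Hypothetical ventilator action; fields are assumptions, not the paper's action space."""
    peep: float          # positive end-expiratory pressure, cmH2O
    fio2: float          # fraction of inspired oxygen, 0..1
    tidal_volume: float  # mL per kg of ideal body weight


# Hypothetical thresholds above which a setting is counted as "aggressive".
PEEP_MAX = 10.0
FIO2_MAX = 0.6
VT_MAX = 8.0


def aggressiveness(action: VentAction) -> int:
    """Count how many settings exceed their (assumed) aggressive threshold."""
    return sum([
        action.peep > PEEP_MAX,
        action.fio2 > FIO2_MAX,
        action.tidal_volume > VT_MAX,
    ])


def reward(spo2_before: float, spo2_after: float,
           action: VentAction, penalty: float = 1.0) -> float:
    """SpO2 gain (percentage points) minus a penalty per aggressive setting."""
    return (spo2_after - spo2_before) - penalty * aggressiveness(action)


gentle = VentAction(peep=5.0, fio2=0.4, tidal_volume=6.0)
aggressive = VentAction(peep=12.0, fio2=0.8, tidal_volume=6.0)

r_gentle = reward(92.0, 95.0, gentle)
r_aggressive = reward(92.0, 95.0, aggressive)
```

Under these assumed thresholds, the same SpO$_2$ improvement earns a lower reward when it is obtained with settings that exceed them, which is the qualitative behavior the abstract describes. The relative weight of the penalty would need to be set with clinical input.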
