Advancing Safe Mechanical Ventilation Using Offline RL With Hybrid Actions and Clinically Aligned Rewards

Muhammad Hamza Yousuf; Jason Li; Sahar Vahdati; Raphael Theilen; Jakob Wittenstein; Jens Lehmann

TL;DR

This study tackles the safe optimization of invasive mechanical ventilation in the ICU using offline RL. It introduces a clinically grounded reward design that combines ventilator-free days with safe physiological ranges and uses multi-objective scalarization to balance competing objectives. To handle the large, mixed action space, it applies constrained discrete action optimization with a factored critic and extends two state-of-the-art offline RL methods (IQL and EDAC) to hybrid actions, avoiding the distribution shift caused by reconstructing continuous values from bins. The approach reports improved safety and performance over clinician baselines, robust cross-dataset generalization, and favorable hyperparameter stability, and it emphasizes collaboration with clinicians for real-world deployability and future prospective validation.
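As a rough illustration of the reward idea summarized above, the following sketch combines per-variable "safe range" scores with a ventilator-free-days term through a weighted sum. It is a minimal sketch under stated assumptions, not the paper's reward function: the names `SafeRange` and `scalarized_reward`, the linear decay outside the range, and all weights are hypothetical, and any numeric range passed in is a placeholder rather than a clinical recommendation.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class SafeRange:
    """Hypothetical safe interval for one physiological variable."""
    low: float
    high: float

    def score(self, value: float) -> float:
        # 1.0 inside the range; decays linearly to 0.0 once the value is
        # one full range-width outside (an assumed shape, not the paper's).
        if self.low <= value <= self.high:
            return 1.0
        width = self.high - self.low
        distance = self.low - value if value < self.low else value - self.high
        return max(0.0, 1.0 - distance / width)


def scalarized_reward(
    vitals: Dict[str, float],
    ranges: Dict[str, SafeRange],
    weights: Dict[str, float],
    vfd_term: float = 0.0,
    vfd_weight: float = 1.0,
) -> float:
    """Weighted-sum scalarization of range scores plus a ventilator-free-days term."""
    range_term = sum(
        weights[name] * ranges[name].score(vitals[name]) for name in ranges
    )
    return range_term + vfd_weight * vfd_term
```

A weighted sum is only one way to scalarize multiple objectives; it is used here because it keeps the trade-off between objectives explicit in the weights.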

Abstract

Invasive mechanical ventilation (MV) is a life-sustaining therapy commonly used in the intensive care unit (ICU) for patients with severe and acute conditions, who frequently rely on it for breathing. Given the high risk of death in such cases, optimal MV settings can reduce mortality, minimize ventilator-induced lung injury, shorten ICU stays, and ease the strain on healthcare resources. However, optimizing MV settings remains a complex and error-prone process due to patient-specific variability. While offline reinforcement learning (RL) shows promise for optimizing MV settings, current methods struggle with the hybrid (continuous and discrete) nature of those settings. Discretizing continuous settings makes the action space grow exponentially with the number of settings, which limits how many can be optimized. Converting the discrete predictions back to continuous values can cause a distribution shift, compromising safety and performance. To address this challenge, in the IntelliLung project we are developing an AI-based approach that constrains the action space and employs factored action critics. This approach allows us to scale to six optimizable settings, compared to two or three in previous studies. We also adapt state-of-the-art (SOTA) offline RL algorithms to operate directly on hybrid action spaces, avoiding the pitfalls of discretization. We further introduce a clinically grounded reward function based on ventilator-free days and physiological targets. Using multi-objective optimization for reward selection, we show that this leads to a more equitable consideration of all clinically relevant objectives. Notably, we develop the system in close collaboration with healthcare professionals so that it is aligned with real-world clinical objectives and designed with future deployment in mind.
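The action-space growth mentioned in the abstract can be made concrete with a small counting sketch. It compares the number of outputs a critic needs when every combination of discretized settings is a separate action against a factored critic with one output head per setting. The bin counts are hypothetical, and the additive combination in `combine_factored_q` is one common form of factored critic assumed here for illustration; the abstract does not specify the decomposition used.

```python
from math import prod
from typing import Sequence


def joint_action_count(bins_per_setting: Sequence[int]) -> int:
    """Outputs needed when each combination of bins is its own action."""
    return prod(bins_per_setting)


def factored_output_count(bins_per_setting: Sequence[int]) -> int:
    """Outputs needed with one head per setting (one value per bin)."""
    return sum(bins_per_setting)


def combine_factored_q(
    per_setting_q: Sequence[Sequence[float]], action: Sequence[int]
) -> float:
    """Assumed additive decomposition: sum each head's value at the chosen bin."""
    return sum(q[a] for q, a in zip(per_setting_q, action))


# Hypothetical example: six settings, ten bins each.
bins = [10] * 6
print(joint_action_count(bins))     # product of the bin counts
print(factored_output_count(bins))  # sum of the bin counts
```

With these placeholder bin counts the joint formulation needs one output per combination (the product of the bin counts), while the factored one needs only their sum, which is why factoring makes it feasible to optimize more settings at once.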
