Advancing Safe Mechanical Ventilation Using Offline RL With Hybrid Actions and Clinically Aligned Rewards

Muhammad Hamza Yousuf, Jason Li, Sahar Vahdati, Raphael Theilen, Jakob Wittenstein, Jens Lehmann

TL;DR

This study tackles safe optimization of invasive mechanical ventilation in the ICU using offline RL. It introduces a clinically grounded reward design that combines ventilator-free days with safe physiological ranges and uses multi-objective scalarization to balance competing objectives. To address the large and mixed action space, it implements constrained-discrete action optimization with a factored critic and extends SOTA offline RL methods to hybrid actions (IQL and EDAC), avoiding distribution shifts from bin-to-continuous reconstructions. The approach demonstrates improved safety and performance over clinician baselines, with robust cross-dataset generalization and favorable hyperparameter stability, and it emphasizes collaboration with clinicians for real-world deployability and future prospective validation.

Abstract

Invasive mechanical ventilation (MV) is a life-sustaining therapy commonly used in the intensive care unit (ICU) for patients with severe and acute conditions. These patients frequently rely on MV for breathing. Given the high risk of death in such cases, optimal MV settings can reduce mortality, minimize ventilator-induced lung injury, shorten ICU stays, and ease the strain on healthcare resources. However, optimizing MV settings remains a complex and error-prone process due to patient-specific variability. While Offline Reinforcement Learning (RL) shows promise for optimizing MV settings, current methods struggle with the hybrid (continuous and discrete) nature of MV settings. Discretizing continuous settings leads to exponential growth in the action space, which limits the number of optimizable settings. Converting the discretized predictions back to continuous values can cause a distribution shift, compromising safety and performance. To address this challenge, in the IntelliLung project, we are developing an AI-based approach where we constrain the action space and employ factored action critics. This approach allows us to scale to six optimizable settings, compared to two or three in previous studies. We adapt state-of-the-art offline RL algorithms to operate directly on hybrid action spaces, avoiding the pitfalls of discretization. We also introduce a clinically grounded reward function based on ventilator-free days and physiological targets. Using multi-objective optimization for reward selection, we show that this leads to a more equitable consideration of all clinically relevant objectives. Notably, we develop a system in close collaboration with healthcare professionals that is aligned with real-world clinical objectives and designed with future deployment in mind.
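
The scaling argument in the abstract can be made concrete with a little arithmetic. The sketch below assumes, purely for illustration, five bins per setting (the bin counts used in the paper are not stated here) and uses hypothetical helper names. It compares the number of joint actions a single discrete critic would have to score with the number of outputs needed when each setting is scored separately, which is the general idea behind a factored critic and not necessarily the exact architecture used in the paper.

```python
from math import prod

def joint_action_count(bins_per_setting):
    """Number of joint actions when every combination of bins is one action."""
    return prod(bins_per_setting)

def factored_output_count(bins_per_setting):
    """Number of critic outputs when each setting is scored on its own."""
    return sum(bins_per_setting)

# Hypothetical discretization: 5 bins for every ventilator setting.
three_settings = [5] * 3
six_settings = [5] * 6

print(joint_action_count(three_settings))    # 125
print(joint_action_count(six_settings))      # 15625
print(factored_output_count(six_settings))   # 30
```

Under this assumption, going from three to six settings multiplies the joint action count by 125, while the per-setting output count only doubles from 15 to 30.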

Paper Structure

This paper contains 61 sections, 5 equations, 8 figures, 9 tables.

Figures (8)

  • Figure 1: Panels (a)–(b) report the correlations (higher is better) Corr@VFD and Corr@RangeReward, respectively, across reward scales ($w_{\mathrm{vfd}}$ for VFD and $w_{\mathrm{morta}}$ for Mortality). Panel (c) shows the Tchebycheff value (lower is better) over the same scales. For every (reward, scale) pair, we trained five independent HybridIQL policies and report the mean of their final values.
  • Figure 2: Distribution of policy coverage $d^\pi$ (across states in the test set) for each algorithm. The red dashed line marks the OOD threshold, defined as the lower Tukey fence of the $d^\pi$ distribution under the clinician policy (at least 75% of samples lie above it). We classify actions with $d^\pi$ below the threshold as OOD.
  • Figure 3: Action distribution of FiO$_2$ for the hybrid-action setup vs. clinician.
  • Figure 4: Action distribution of FiO$_2$ for the discrete-action setup vs. clinician.
  • Figure 5: Action distributions of the hybrid-action setup vs. clinician.
  • ...and 3 more figures
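
The OOD threshold described in the Figure 2 caption can be sketched as follows. This is a minimal illustration, assuming the conventional Tukey fence multiplier of 1.5 and NumPy's default linear percentile interpolation; the function names and the sample coverage values are hypothetical and are not taken from the paper.

```python
import numpy as np

def lower_tukey_fence(values, k=1.5):
    """Return Q1 - k * IQR for a 1-D sample (k=1.5 is the usual convention)."""
    q1, q3 = np.percentile(values, [25, 75])
    return q1 - k * (q3 - q1)

def flag_ood(policy_coverage, clinician_coverage, k=1.5):
    """Flag entries whose coverage falls below the clinician-derived fence."""
    threshold = lower_tukey_fence(clinician_coverage, k)
    mask = np.asarray(policy_coverage) < threshold
    return mask, threshold

# Hypothetical coverage values, for illustration only.
clinician = [0.4, 0.5, 0.6, 0.7, 0.8]
policy = [0.1, 0.3, 0.9]

mask, threshold = flag_ood(policy, clinician)
print(round(float(threshold), 3))  # 0.2
print(mask.tolist())               # [True, False, False]
```

Because the fence sits at or below the first quartile, at least 75% of the clinician samples lie above it, which matches the parenthetical in the caption.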