Ensuring Safety in Automated Mechanical Ventilation through Offline Reinforcement Learning and Digital Twin Verification

Hang Yu, Huidong Liu, Qingchen Zhang, William Joy, Kateryna Nikulina, Andreas A. Schuppert, Sina Saffaran, Declan Bates

Abstract

Mechanical ventilation (MV) is a life-saving intervention for patients with acute respiratory failure (ARF) in the ICU. However, inappropriate ventilator settings can cause ventilator-induced lung injury (VILI). In addition, clinicians' workload has been shown to be directly linked to patient outcomes. Hence, MV should be personalized and automated to improve patient outcomes. Previous attempts to personalize and automate MV include traditional supervised learning and offline reinforcement learning (RL) approaches, which often neglect temporal dependencies and rely excessively on mortality-based rewards. As a result, early-stage physiological deterioration and the risk of VILI are not adequately captured. To address these limitations, we propose Transformer-based Conservative Q-Learning (T-CQL), a novel offline RL framework that integrates a Transformer encoder for effective temporal modeling of patient dynamics, conservative adaptive regularization based on uncertainty quantification to ensure safety, and consistency regularization for robust decision-making. We build a clinically informed reward function that incorporates indicators of VILI and a score for the severity of the patient's illness. Moreover, previous work predominantly uses Fitted Q-Evaluation (FQE) for RL policy evaluation on static offline data, which is less responsive to dynamic environmental changes and susceptible to distribution shifts. To overcome these evaluation limitations, we use interactive digital twins of ARF patients for online "at the bedside" evaluation. Our results demonstrate that T-CQL consistently outperforms existing state-of-the-art offline RL methodologies, providing safer and more effective ventilatory adjustments. Our framework demonstrates the potential of Transformer-based models combined with conservative RL strategies as a decision support tool in critical care.
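
To make the idea of conservative regularization concrete, the following is a minimal sketch of the standard Conservative Q-Learning penalty for a discrete action space, written with NumPy. It is not the paper's implementation: the function names `logsumexp` and `cql_penalty` are hypothetical, and the per-state weight `alpha` only stands in for an uncertainty-dependent regularization strength, whose exact form is not given in the abstract.

```python
import numpy as np


def logsumexp(q, axis=-1):
    # Numerically stable log-sum-exp along the action axis.
    m = np.max(q, axis=axis, keepdims=True)
    return np.squeeze(m, axis=axis) + np.log(np.sum(np.exp(q - m), axis=axis))


def cql_penalty(q_values, dataset_actions, alpha):
    """Standard CQL penalty for discrete actions (illustrative sketch).

    q_values:        (batch, n_actions) Q-estimates per state.
    dataset_actions: (batch,) integer actions recorded in the offline data.
    alpha:           scalar or (batch,) weight. ASSUMPTION: a per-state
                     weight could be made larger where model uncertainty
                     is high; the paper's actual rule is not shown here.
    """
    q_values = np.asarray(q_values, dtype=float)
    dataset_actions = np.asarray(dataset_actions)
    # Q-value of the action actually taken in the dataset.
    q_data = q_values[np.arange(len(dataset_actions)), dataset_actions]
    # Gap between the soft maximum over all actions and the dataset action.
    # It is non-negative and shrinks as Q concentrates on observed actions.
    gap = logsumexp(q_values) - q_data
    return float(np.mean(np.asarray(alpha) * gap))


# Two states, two actions, uniform Q-values.
q = np.array([[1.0, 1.0], [0.0, 0.0]])
actions = np.array([0, 1])
uniform_penalty = cql_penalty(q, actions, alpha=1.0)
weighted_penalty = cql_penalty(q, actions, alpha=np.array([2.0, 0.0]))
```

In training, a term of this kind would be added to the usual temporal-difference loss, so that Q-values for actions rarely seen in the offline data are pushed down relative to those the clinicians actually took.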

Paper Structure (26 sections, 14 equations, 5 figures, 3 tables, 1 algorithm)

Figures (5)

  • Figure 1: A simplified, diagrammatic representation of the digital twin cardiopulmonary system.
  • Figure 2: Overview of the proposed method. (A) Preparation of the structured offline dataset. (B) The training procedure of our proposed T-CQL method, which integrates a Transformer encoder for effective temporal modeling of patient dynamics, state-dependent conservative adaptive regularization based on uncertainty quantification to ensure safety, and consistency regularization for robust decision-making. (C) Evaluation methods: off-policy FQE and online interactive digital twin evaluation.
  • Figure 3: Comparison of average initial Q-values for in-distribution (ID) versus out-of-distribution (OOD) cases. Error bars denote the variance, and the horizontal line marks the maximum achievable expected return.
  • Figure 4: Distribution of ventilator settings chosen by different policies on the MIMIC test set. "Clinicians" refers to the actions taken by clinicians in the original MIMIC dataset.
  • Figure 5: Distribution of ventilator settings selected by different policies across digital twins of real patients. "Physician" refers to clinician-derived actions from the MIMIC dataset, while "RWTH Intensivists" refers to ventilator settings applied by clinicians at the University Hospital of Aachen.