Non-Contact Physiological Monitoring in Pediatric Intensive Care Units via Adaptive Masking and Self-Supervised Learning

Mohamed Khalil Ben Salah, Philippe Jouvet, Rita Noumeir

TL;DR

This work tackles non-contact vital-sign monitoring in Pediatric Intensive Care Units by developing a self-supervised rPPG framework tailored to clinical challenges. It integrates a VisionMamba-based student, a fixed physiological teacher (PhysMamba), and a novel Adaptive Masking Network that learns to occlude informative patches via policy gradient, guided by a curriculum that progresses from clean public data to real PICU videos. The approach combines masked reconstruction with physiological distillation, achieving a final MAE of $3.2$ bpm and a Pearson correlation of $R=0.91$, while demonstrating strong robustness to occlusions and domain shifts. The method promises practical impact by enabling continuous, low-risk, contactless heart-rate monitoring in the PICU with real-time efficiency and without explicit ROI annotation.

Abstract

Continuous monitoring of vital signs in Pediatric Intensive Care Units (PICUs) is essential for early detection of clinical deterioration and effective clinical decision-making. However, contact-based sensors such as pulse oximeters may cause skin irritation, increase infection risk, and lead to patient discomfort. Remote photoplethysmography (rPPG) offers a contactless alternative to monitor heart rate using facial video, but remains underutilized in PICUs due to motion artifacts, occlusions, variable lighting, and domain shifts between laboratory and clinical data. We introduce a self-supervised pretraining framework for rPPG estimation in the PICU setting, based on a progressive curriculum strategy. The approach leverages the VisionMamba architecture and integrates an adaptive masking mechanism, where a lightweight Mamba-based controller assigns spatiotemporal importance scores to guide probabilistic patch sampling. This strategy dynamically increases reconstruction difficulty while preserving physiological relevance. To address the lack of labeled clinical data, we adopt a teacher-student distillation setup. A supervised expert model, trained on public datasets, provides latent physiological guidance to the student. The curriculum progresses through three stages: clean public videos, synthetic occlusion scenarios, and unlabeled videos from 500 pediatric patients. Our framework achieves a 42% reduction in mean absolute error relative to standard masked autoencoders and outperforms PhysFormer by 31%, reaching a final MAE of 3.2 bpm. Without explicit region-of-interest extraction, the model consistently attends to pulse-rich areas and demonstrates robustness under clinical occlusions and noise.
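
The two evaluation figures quoted above, mean absolute error (MAE) in bpm and the Pearson correlation $R$, together with the "percent reduction" comparisons, can be computed as in the minimal sketch below. The function names and the sample heart-rate values are hypothetical and chosen only for illustration; they are not data or code from the paper.

```python
import math


def mean_absolute_error(pred, ref):
    """Mean absolute difference between predicted and reference heart rates (bpm)."""
    if len(pred) != len(ref) or not pred:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(p - r) for p, r in zip(pred, ref)) / len(pred)


def pearson_r(pred, ref):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(pred)
    if n != len(ref) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_p = sum(pred) / n
    mean_r = sum(ref) / n
    cov = sum((p - mean_p) * (r - mean_r) for p, r in zip(pred, ref))
    var_p = sum((p - mean_p) ** 2 for p in pred)
    var_r = sum((r - mean_r) ** 2 for r in ref)
    if var_p == 0 or var_r == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_p * var_r)


def relative_reduction(baseline_error, new_error):
    """Fractional error reduction of new_error relative to baseline_error."""
    return (baseline_error - new_error) / baseline_error


# Hypothetical per-window heart-rate estimates and references, in bpm.
hr_pred = [118.0, 131.0, 142.0, 127.0]
hr_ref = [120.0, 130.0, 145.0, 125.0]

mae = mean_absolute_error(hr_pred, hr_ref)
r = pearson_r(hr_pred, hr_ref)
print(f"MAE = {mae:.2f} bpm, R = {r:.3f}")
```

A relative reduction is always stated against a specific baseline, so a statement such as "42% reduction in MAE" corresponds to `relative_reduction(baseline_mae, model_mae) == 0.42` for that baseline's MAE.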

Paper Structure

This paper contains 43 sections, 12 equations, 9 figures, 9 tables, 2 algorithms.

Figures (9)

  • Figure 1: Overview of our curriculum-based framework for robust rPPG estimation. The student model learns by reconstructing masked patches and predicting the rPPG signal from visible tokens. Training is guided by an expert PhysMamba teacher model and a learnable Adaptive Masking Network (AMN) optimized via policy gradient reinforcement.
  • Figure 2: Detailed architecture of the Adaptive Masking Network (AMN). The AMN computes token importance scores using Mamba blocks and selects a visible subset via Gumbel-Top-K sampling. Policy gradient optimization uses the rPPG distillation loss as reward to update the AMN. The student reconstructs the masked tokens and predicts the rPPG waveform.
  • Figure 3: Qualitative results for selected patients. Plots show the reconstructed rPPG signal vs. ground truth in the time and frequency domains.
  • Figure 4: Resilience to sensor artifacts. The model maintains stable rPPG waveforms even during periods of ground-truth PPG signal degradation (flat-lines).
  • Figure 5: Average occlusion area by type. Medical devices and bedding represent the most significant sources of signal obstruction.
  • ...and 4 more figures
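
The Figure 2 caption names two generic techniques: Gumbel-Top-K sampling of a visible token subset and a policy-gradient update driven by a scalar reward. The sketch below illustrates both in plain Python under loud assumptions: the importance scores are free parameters (`logits`) rather than outputs of Mamba blocks, the update is a basic REINFORCE step with a constant baseline, and `toy_reward` is a hypothetical stand-in for whatever reward the authors derive from the rPPG distillation loss. It is not the paper's implementation.

```python
import math
import random


def gumbel_top_k(logits, k, rng):
    """Sample k distinct indices by adding Gumbel(0, 1) noise and keeping the top k."""
    perturbed = []
    for i, logit in enumerate(logits):
        # Clamp u away from 0 and 1 so both logarithms stay finite.
        u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)
        gumbel = -math.log(-math.log(u))
        perturbed.append((logit + gumbel, i))
    perturbed.sort(reverse=True)
    return [i for _, i in perturbed[:k]]


def log_prob_and_grad(logits, ordered_idx):
    """Plackett-Luce log-probability of an ordered sample and its gradient w.r.t. logits."""
    remaining = set(range(len(logits)))
    log_p = 0.0
    grad = [0.0] * len(logits)
    for s in ordered_idx:
        m = max(logits[j] for j in remaining)  # subtract max for stability
        z = sum(math.exp(logits[j] - m) for j in remaining)
        log_p += logits[s] - (m + math.log(z))
        for j in remaining:
            grad[j] -= math.exp(logits[j] - m) / z  # minus softmax over remaining
        grad[s] += 1.0  # plus indicator of the chosen index
        remaining.remove(s)
    return log_p, grad


def reinforce_step(logits, k, reward_fn, baseline, lr, rng):
    """One REINFORCE update: sample a subset, score it, ascend (reward - baseline) * grad log p."""
    visible = gumbel_top_k(logits, k, rng)
    _, grad = log_prob_and_grad(logits, visible)
    advantage = reward_fn(visible) - baseline
    new_logits = [l + lr * advantage * g for l, g in zip(logits, grad)]
    return new_logits, visible


def toy_reward(visible):
    """Hypothetical reward: 1 if token 0 is kept visible, else 0."""
    return 1.0 if 0 in visible else 0.0


rng = random.Random(0)
logits = [0.0] * 6  # six tokens, all equally important at the start
for _ in range(300):
    logits, visible = reinforce_step(logits, 2, toy_reward, 0.5, 0.5, rng)
print([round(l, 2) for l in logits])
```

Perturbing the scores with Gumbel noise and keeping the top k draws k indices without replacement, with probabilities following the softmax of the scores. That is what makes the log-probability in `log_prob_and_grad` the right quantity for the policy-gradient estimate.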