Non-Contact Physiological Monitoring in Pediatric Intensive Care Units via Adaptive Masking and Self-Supervised Learning

Mohamed Khalil Ben Salah; Philippe Jouvet; Rita Noumeir

TL;DR

This work tackles non-contact vital-sign monitoring in Pediatric Intensive Care Units by developing a self-supervised rPPG framework tailored to clinical challenges. It integrates a VisionMamba-based student, a fixed physiological teacher (PhysMamba), and a novel Adaptive Masking Network that learns to occlude informative patches via policy gradient, guided by a curriculum that progresses from clean public data to real PICU videos. The approach combines masked reconstruction with physiological distillation, achieving a final MAE of $3.2$ bpm and a Pearson correlation of $R=0.91$, while demonstrating strong robustness to occlusions and domain shifts. The method promises practical impact by enabling continuous, low-risk, contactless heart-rate monitoring in the PICU with real-time efficiency and without explicit ROI annotation.
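The two reported figures, mean absolute error (MAE) in bpm and the Pearson correlation $R$, can be computed from paired heart-rate estimates as in the minimal sketch below. The function names and the sample values are hypothetical and chosen only for illustration; they are not taken from the paper's evaluation code or data.

```python
import math


def mean_absolute_error(pred, ref):
    """Average absolute difference between paired heart-rate values (bpm)."""
    if len(pred) != len(ref) or not pred:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(p - r) for p, r in zip(pred, ref)) / len(pred)


def pearson_r(pred, ref):
    """Pearson correlation coefficient between two paired sequences."""
    n = len(pred)
    if n != len(ref) or n < 2:
        raise ValueError("inputs must have equal length of at least 2")
    mean_p = sum(pred) / n
    mean_r = sum(ref) / n
    cov = sum((p - mean_p) * (r - mean_r) for p, r in zip(pred, ref))
    var_p = sum((p - mean_p) ** 2 for p in pred)
    var_r = sum((r - mean_r) ** 2 for r in ref)
    if var_p == 0 or var_r == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_p * var_r)


# Hypothetical per-window heart rates (bpm), not values from the paper.
predicted_bpm = [118.0, 122.0, 131.0, 140.0]
reference_bpm = [120.0, 121.0, 135.0, 138.0]

print(mean_absolute_error(predicted_bpm, reference_bpm))  # 2.25
print(pearson_r(predicted_bpm, reference_bpm))
```

MAE measures the typical size of the error in bpm, while $R$ measures how well the estimates track changes in the reference, so a model can score well on one and poorly on the other.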

Abstract

Continuous monitoring of vital signs in Pediatric Intensive Care Units (PICUs) is essential for early detection of clinical deterioration and effective clinical decision-making. However, contact-based sensors such as pulse oximeters may cause skin irritation, increase infection risk, and lead to patient discomfort. Remote photoplethysmography (rPPG) offers a contactless alternative to monitor heart rate using facial video, but remains underutilized in PICUs due to motion artifacts, occlusions, variable lighting, and domain shifts between laboratory and clinical data. We introduce a self-supervised pretraining framework for rPPG estimation in the PICU setting, based on a progressive curriculum strategy. The approach leverages the VisionMamba architecture and integrates an adaptive masking mechanism, where a lightweight Mamba-based controller assigns spatiotemporal importance scores to guide probabilistic patch sampling. This strategy dynamically increases reconstruction difficulty while preserving physiological relevance. To address the lack of labeled clinical data, we adopt a teacher-student distillation setup. A supervised expert model, trained on public datasets, provides latent physiological guidance to the student. The curriculum progresses through three stages: clean public videos, synthetic occlusion scenarios, and unlabeled videos from 500 pediatric patients. Our framework achieves a 42% reduction in mean absolute error relative to standard masked autoencoders and outperforms PhysFormer by 31%, reaching a final MAE of 3.2 bpm. Without explicit region-of-interest extraction, the model consistently attends to pulse-rich areas and demonstrates robustness under clinical occlusions and noise.
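The probabilistic patch sampling described above can be illustrated with Gumbel-Top-K selection, the sampler named in the Figure 2 caption. The sketch below is a plain-Python illustration under stated assumptions, not the authors' implementation: the scores are treated as unnormalized log-probabilities, the sampled indices are treated as the visible tokens with the remainder masked, and the function name, score values, and subset size are all hypothetical.

```python
import math
import random


def gumbel_top_k(scores, k, rng):
    """Sample k distinct indices without replacement.

    Adding independent Gumbel(0, 1) noise to each score and keeping the
    k largest perturbed values draws indices in proportion to
    softmax(scores), without replacement.
    """
    if not 0 <= k <= len(scores):
        raise ValueError("k must be between 0 and len(scores)")
    perturbed = []
    for idx, score in enumerate(scores):
        # Clamp the uniform draw so both logarithms stay finite.
        u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)
        gumbel_noise = -math.log(-math.log(u))
        perturbed.append((score + gumbel_noise, idx))
    perturbed.sort(reverse=True)
    return sorted(idx for _, idx in perturbed[:k])


# Hypothetical importance scores for 8 patch tokens (assumption).
importance_scores = [2.0, 0.1, 1.5, -0.3, 0.8, 0.0, 1.1, -1.0]
rng = random.Random(0)

visible = gumbel_top_k(importance_scores, k=3, rng=rng)
masked = [i for i in range(len(importance_scores)) if i not in visible]
print("visible tokens:", visible)
print("masked tokens:", masked)
```

Because the selection is random rather than a hard top-k, low-scoring patches are still chosen occasionally, which gives a policy-gradient update a range of masks to compare.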

Paper Structure (43 sections, 12 equations, 9 figures, 9 tables, 2 algorithms)

Introduction
Related Works
Remote Photoplethysmography (rPPG)
Self-Supervised Learning for Video Understanding
SSL for Remote Photoplethysmography
State Space Models (SSMs)
Methodology
Overview
VisionMamba Student Model
Patch Tokenizer
Encoder Block
Decoder Head
Adaptive Masking Network
Physiological Knowledge Distillation
Training Objectives and Optimization
...and 28 more sections

Figures (9)

Figure 1: Overview of our curriculum-based framework for robust rPPG estimation. The student model learns by reconstructing masked patches and predicting the rPPG signal from visible tokens. Training is guided by an expert PhysMamba teacher model and a learnable Adaptive Masking Network (AMN) optimized via policy gradient reinforcement.
Figure 2: Detailed architecture of the Adaptive Masking Network (AMN). The AMN computes token importance scores using Mamba blocks and selects a visible subset via Gumbel--Top-K sampling. Policy gradient optimization uses the rPPG distillation loss as reward to update the AMN. The student reconstructs the masked tokens and predicts the rPPG waveform.
Figure 3: Qualitative results for select patients. Plots show reconstructed rPPG vs ground truth in time and frequency domains.
Figure 4: Resilience to sensor artifacts. The model maintains stable rPPG waveforms even during periods of ground-truth PPG signal degradation (flat-lines).
Figure 5: Average occlusion area by type. Medical devices and bedding represent the most significant sources of signal obstruction.
...and 4 more figures
