Memory-efficient Low-latency Remote Photoplethysmography through Temporal-Spatial State Space Duality

Kegang Wang, Jiankai Tang, Yuxuan Fan, Jiatong Ji, Yuanchun Shi, Yuntao Wang

TL;DR

Remote photoplethysmography (rPPG) faces memory and latency bottlenecks when leveraging deep learning. The authors present ME-rPPG, a memory-efficient framework based on temporal-spatial state space duality (TSD) that enables training on long video sequences while delivering real-time, single-frame inference with minimal memory. Key contributions include a Temporal Normalization Module and a State Space Duality backbone that together achieve strong cross-dataset generalization and real-world performance, with reported memory usage of 3.6 MB and latencies around 9.46 ms, and a public code release. This work advances practical, edge-friendly rPPG for continuous non-contact cardiovascular monitoring.

Abstract

Remote photoplethysmography (rPPG), enabling non-contact physiological monitoring through facial light reflection analysis, faces critical computational bottlenecks as deep learning introduces performance gains at the cost of prohibitive resource demands. This paper proposes ME-rPPG, a memory-efficient algorithm built on temporal-spatial state space duality, which resolves the trilemma of model scalability, cross-dataset generalization, and real-time constraints. Leveraging a transferable state space, ME-rPPG efficiently captures subtle periodic variations across facial frames while maintaining minimal computational overhead, enabling training on extended video sequences and supporting low-latency inference. Achieving cross-dataset MAEs of 5.38 (MMPD), 0.70 (VitalVideo), and 0.25 (PURE), ME-rPPG outperforms all baselines with improvements ranging from 21.3% to 60.2%. Our solution enables real-time inference with only 3.6 MB memory usage and 9.46 ms latency, surpassing existing methods by 19.5%-49.7% in accuracy and by 43.2% in user satisfaction in real-world deployments. The code and demos are released for reproducibility at https://health-hci-group.github.io/ME-rPPG-demo/.

Paper Structure

This paper contains 17 sections, 5 equations, 4 figures, 8 tables.

Figures (4)

  • Figure 1: The principle of state space duality. The state space is transferable and aligns with each input frame. SSD models trained on frame chunks can generalize to single-frame inference. The model takes a single frame as input and outputs the corresponding state along with the rPPG prediction.
  • Figure 2: Comparison of Predictions and Ground Truth. The predictions are obtained using ME-Flow, ME-Chunk, and TSCAN from a video in the VitalVideo dataset.
  • Figure 3: General framework of ME-rPPG. Our method takes resized facial frames as input and predicts a BVP value and a state. The TN module captures temporal variance while the TSD module extracts temporal-spatial features.
  • Figure 4: User Experience Evaluation. Users scored the overall experience, fluency, accuracy and stability subjectively.
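
The Figure 1 caption says that a model trained on frame chunks can run on single frames by carrying a state forward. The sketch below illustrates that general idea with a toy scalar linear state space model. It is not the ME-rPPG architecture: the recurrence, the parameters a, b, c, and the function names step, chunk, and stream are all assumptions made for illustration. It shows that a chunk-wise (weighted-sum) computation and a one-input-at-a-time recurrence give the same outputs, and that a chunk can be split anywhere as long as the state is passed along.

```python
# Toy scalar state space model (illustrative assumption, not ME-rPPG itself):
#   h_t = a * h_{t-1} + b * x_t
#   y_t = c * h_t
A, B, C = 0.9, 0.5, 2.0  # hypothetical fixed parameters


def step(state, x):
    """Recurrent form: one input in, (new state, output) out."""
    new_state = A * state + B * x
    return new_state, C * new_state


def stream(state, xs):
    """Process inputs one at a time, keeping only a single state value."""
    ys = []
    for x in xs:
        state, y = step(state, x)
        ys.append(y)
    return state, ys


def chunk(state, xs):
    """Chunk form: every output is an explicit weighted sum over the chunk.

    This is the unrolled recurrence; it touches all inputs of the chunk
    at once, which is convenient for training but needs the whole chunk.
    """
    n = len(xs)

    def hidden(t):
        # Carried-in state decayed (t + 1) times, plus each input x_s
        # decayed (t - s) times.
        carried = (A ** (t + 1)) * state
        inputs = sum((A ** (t - s)) * B * xs[s] for s in range(t + 1))
        return carried + inputs

    ys = [C * hidden(t) for t in range(n)]
    final_state = hidden(n - 1) if n else state
    return final_state, ys


signal = [0.1, -0.3, 0.7, 0.2, -0.5, 0.4]  # made-up input values

state_stream, y_stream = stream(0.0, signal)
state_chunk, y_chunk = chunk(0.0, signal)

# Split the same signal into two chunks and hand the state across.
state_mid, y_first = chunk(0.0, signal[:3])
state_split, y_second = chunk(state_mid, signal[3:])
y_split = y_first + y_second
```

In this toy setting the streaming path stores one number between inputs, whereas the chunk path needs the whole chunk in memory. That is the trade-off the caption alludes to, although the real model's state and update rule are more elaborate.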