PhysMamba: Efficient Remote Physiological Measurement with SlowFast Temporal Difference Mamba

Chaoqi Luo, Yiping Xie, Zitong Yu

TL;DR

Remote photoplethysmography (rPPG) enables contactless heart activity monitoring from facial videos, but existing CNN and Transformer approaches struggle to capture long-range spatio-temporal dependencies efficiently. The authors introduce PhysMamba, a Mamba-based framework that uses a Temporal Difference Mamba (TD-Mamba) block to refine local temporal changes and a dual-stream SlowFast architecture to fuse multi-scale temporal features, guided by a NegPearson loss for signal alignment. Across three benchmarks, PhysMamba achieves state-of-the-art accuracy with substantially fewer parameters and lower compute than baselines, demonstrating strong generalization in cross-dataset settings. This approach offers a practical path toward efficient, mobile-friendly rPPG systems capable of robust long-range modeling.
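The NegPearson loss mentioned above can be illustrated with a small sketch. It assumes the formulation common in rPPG work, one minus the Pearson correlation between the predicted and ground-truth signals; the exact form used by the authors is not given here. The function name `neg_pearson_loss` and the handling of constant signals are illustrative assumptions, and the code is plain Python rather than a training-framework implementation.

```python
import math
from typing import Sequence


def neg_pearson_loss(pred: Sequence[float], target: Sequence[float]) -> float:
    """Return 1 - Pearson correlation between two equal-length signals.

    Hypothetical helper for illustration; not taken from the PhysMamba code.
    """
    if len(pred) != len(target) or len(pred) < 2:
        raise ValueError("signals must have equal length of at least 2")
    n = len(pred)
    mean_p = sum(pred) / n
    mean_t = sum(target) / n
    # Unnormalised covariance and variances (the 1/n factors cancel).
    cov = sum((p - mean_p) * (t - mean_t) for p, t in zip(pred, target))
    var_p = sum((p - mean_p) ** 2 for p in pred)
    var_t = sum((t - mean_t) ** 2 for t in target)
    denom = math.sqrt(var_p * var_t)
    if denom == 0.0:
        # Assumption: a constant signal has undefined correlation,
        # so it is treated here as uncorrelated (loss of 1).
        return 1.0
    return 1.0 - cov / denom


# A toy pulse-like signal sampled at 30 frames per second (1.2 Hz, i.e. 72 bpm).
truth = [math.sin(2 * math.pi * 1.2 * t / 30) for t in range(90)]
scaled = [2.0 * v + 3.0 for v in truth]   # same shape, different gain and offset
inverted = [-v for v in truth]            # opposite phase

loss_scaled = neg_pearson_loss(scaled, truth)
loss_inverted = neg_pearson_loss(inverted, truth)
```

Because the correlation ignores gain and offset, the scaled copy gives a loss close to 0, while the inverted copy gives a loss close to 2. This is why such a loss suits signal alignment: it rewards matching the waveform shape rather than its absolute amplitude.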

Abstract

Facial-video-based remote photoplethysmography (rPPG) aims to measure physiological signals and monitor heart activity without any contact, showing significant potential in various applications. Previous deep-learning-based rPPG measurement methods are primarily based on CNNs and Transformers. However, the limited receptive fields of CNNs restrict their ability to capture long-range spatio-temporal dependencies, while Transformers struggle to model long video sequences because of their high computational complexity. Recently, state space models (SSMs), represented by Mamba, have shown impressive performance in capturing long-range dependencies from long sequences. In this paper, we propose PhysMamba, a Mamba-based framework, to efficiently represent long-range physiological dependencies from facial videos. Specifically, we introduce the Temporal Difference Mamba block to first enhance local dynamic differences and then model the long-range spatio-temporal context. Moreover, a dual-stream SlowFast architecture is used to fuse multi-scale temporal features. Extensive experiments on three benchmark datasets demonstrate the superiority and efficiency of PhysMamba. The code is available at https://github.com/Chaoqi31/PhysMamba

Paper Structure

This paper contains 17 sections, 7 equations, 2 figures, 6 tables, 1 algorithm.

Figures (2)

  • Figure 1: Framework of PhysMamba. It begins with a shallow stem and a temporal downsampling operation. Each of the slow and fast streams then includes Temporal Difference Mamba blocks, lateral connections, and an rPPG predictor head. A Temporal Difference Mamba (TD-Mamba) block consists of a Temporal Difference Convolution (TDC), a Temporal Bidirectional Mamba (Bi-Mamba) with forward and backward SSMs, and a channel attention (CA) module.
  • Figure 2: Attention maps, example curves of predicted rPPG signals against ground truth, and scatter plots of cross-dataset results tested on (a) PURE and (b) UBFC-rPPG.
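
The two temporal ideas named in the Figure 1 caption, a temporal difference followed by forward and backward sequence scans, can be sketched on a toy 1-D signal. This is only an illustration under strong assumptions: the paper's TDC is a learned convolution and Mamba uses a selective, input-dependent SSM, whereas the code below uses a plain first-order difference and a fixed-coefficient linear recurrence. The function names, the `decay` parameter, and the choice to sum the two scan directions are all hypothetical.

```python
from typing import List


def temporal_difference(frames: List[float]) -> List[float]:
    """First-order difference x[t] - x[t-1]; the first output is 0.

    Stand-in for the idea of emphasising local temporal change,
    not the learned Temporal Difference Convolution itself.
    """
    if not frames:
        return []
    return [0.0] + [frames[t] - frames[t - 1] for t in range(1, len(frames))]


def linear_scan(x: List[float], decay: float = 0.5) -> List[float]:
    """Toy recurrence h[t] = decay * h[t-1] + x[t] with fixed coefficients."""
    h = 0.0
    out = []
    for value in x:
        h = decay * h + value
        out.append(h)
    return out


def bidirectional_scan(x: List[float], decay: float = 0.5) -> List[float]:
    """Run the scan forward and backward in time, then sum the two results.

    Summation is an assumed fusion rule chosen for simplicity.
    """
    forward = linear_scan(x, decay)
    backward = linear_scan(x[::-1], decay)[::-1]
    return [f + b for f, b in zip(forward, backward)]


diffs = temporal_difference([1.0, 3.0, 6.0])
context = bidirectional_scan([1.0, 0.0, 0.0])
```

Here `diffs` is `[0.0, 2.0, 3.0]` and `context` is `[2.0, 0.5, 0.25]`. The backward pass lets each time step draw on later frames as well as earlier ones, which is the motivation for scanning in both directions.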