CardiacMamba: A Multimodal RGB-RF Fusion Framework with State Space Models for Remote Physiological Measurement
Zheng Wu, Yiping Xie, Bo Zhao, Jiguang He, Fei Luo, Ning Deng, Zitong Yu
TL;DR
CardiacMamba tackles non-contact heart-rate estimation by fusing RGB video and RF radar signals in a multimodal framework built on state space models. The method introduces the Temporal Difference Mamba Module (TDMM), a Bidirectional State Space Model (Bi-SSM), and a Channel-wise Fast Fourier Transform (CFFT) to extract dynamic temporal features, align the two modalities bidirectionally, and refine the frequency-domain cues that carry heart-rate periodicity. On EquiPleth, CardiacMamba achieves state-of-the-art accuracy and robustness, reduces skin-tone bias, and remains effective under missing-modality conditions, which points to strong potential for fair, real-world healthcare deployment. By combining Mamba-based cross-modal modeling with frequency-domain fusion, the approach advances rPPG technology toward reliable, scalable remote monitoring.
Abstract
Heart rate (HR) estimation via remote photoplethysmography (rPPG) offers a non-invasive solution for health monitoring. However, traditional single-modality approaches, whether RGB or Radio Frequency (RF), struggle to balance robustness and accuracy because of lighting variations, motion artifacts, and skin tone bias. In this paper, we propose CardiacMamba, a multimodal RGB-RF fusion framework that leverages the complementary strengths of both modalities. It introduces the Temporal Difference Mamba Module (TDMM), which captures dynamic changes in RF signals from temporal differences between frames and thereby enhances the extraction of local and global features. Additionally, CardiacMamba employs a Bidirectional SSM for cross-modal alignment and a Channel-wise Fast Fourier Transform (CFFT) to capture and refine the frequency-domain characteristics of RGB and RF signals, improving heart rate estimation accuracy and periodicity detection. Extensive experiments on the EquiPleth dataset demonstrate state-of-the-art performance, with marked improvements in accuracy and robustness. CardiacMamba significantly mitigates skin tone bias, reducing performance disparities across demographic groups, and remains resilient under missing-modality scenarios. By addressing critical challenges in fairness, adaptability, and precision, the framework advances rPPG technology toward reliable real-world deployment in healthcare. The code is available at: https://github.com/WuZheng42/CardiacMamba.
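
To make the two signal-processing ideas named above concrete, the following is a minimal NumPy sketch of frame-to-frame temporal differencing and of per-channel spectral peak picking for heart rate. It is an illustration only, not the TDMM or CFFT modules from the paper: the function names `temporal_difference` and `channelwise_heart_rate`, the (time, channel) array layout, and the 0.7 to 3.0 Hz search band (roughly 42 to 180 bpm) are all assumptions introduced here.

```python
import numpy as np


def temporal_difference(frames: np.ndarray) -> np.ndarray:
    """Frame-to-frame difference along the time axis (axis 0).

    Hypothetical helper: differencing suppresses static content and
    keeps the changes between consecutive frames.
    """
    return np.diff(frames, axis=0)


def channelwise_heart_rate(signals: np.ndarray, fs: float,
                           band=(0.7, 3.0)) -> np.ndarray:
    """Estimate heart rate in bpm for each channel of a (T, C) array.

    Takes an FFT per channel and returns the frequency of the largest
    spectral magnitude inside `band` (an assumed plausible HR range).
    """
    # Remove the per-channel mean so the DC bin does not dominate.
    centered = signals - signals.mean(axis=0, keepdims=True)
    spectrum = np.abs(np.fft.rfft(centered, axis=0))
    freqs = np.fft.rfftfreq(signals.shape[0], d=1.0 / fs)
    # Restrict the search to the assumed heart-rate band.
    mask = (freqs >= band[0]) & (freqs <= band[1])
    peak = spectrum[mask].argmax(axis=0)
    return freqs[mask][peak] * 60.0
```

With a 10-second window sampled at 30 Hz the frequency resolution is 0.1 Hz (6 bpm), so a real system would likely need longer windows, zero-padding, or a learned refinement to reach finer precision.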
