TYrPPG: Uncomplicated and Enhanced Learning Capability rPPG for Remote Heart Rate Estimation

Taixi Chen, Yiu-ming Cheung

TL;DR

This work tackles remote heart rate estimation from RGB video (rPPG) under illumination and motion challenges by proposing TYrPPG, a lightweight model based on a Mambaout-inspired gated video understanding block (GVB) that fuses 2D-CNN and 3D-CNN. It introduces a Comprehensive Supervised Loss (CSL) comprising $Loss_c$, $Loss_p$, and $Loss_w$, with a weakly supervised variant using video-MMD to align predicted and ground-truth distributions. The approach delivers strong intra- and cross-dataset performance on PURE and MMPD, outperforming transformer-based and other supervised methods while maintaining computational efficiency, and is accompanied by an open-source implementation for reproducibility.

Abstract

Remote photoplethysmography (rPPG) extracts physiological signals from RGB video without contact, which makes it attractive for heart rate detection because it is low-cost and non-invasive. Existing rPPG models are usually based on transformer modules, which have low computational efficiency. Recently, the Mamba model has garnered increasing attention due to its efficient performance in natural language processing tasks, demonstrating potential as a substitute for transformer-based algorithms. However, the Mambaout model and its variants show that the SSM module, the core component of Mamba, is unnecessary for vision tasks. We therefore investigate the feasibility of using a Mambaout-based module to estimate heart rate remotely. Specifically, we propose a novel rPPG algorithm called uncomplicated and enhanced learning capability rPPG (TYrPPG). It introduces a gated video understanding block (GVB) designed for efficient analysis of RGB videos. Based on the Mambaout structure, this block integrates 2D-CNN and 3D-CNN to enhance video understanding. In addition, we propose a comprehensive supervised loss function (CSL) to improve the model's learning capability, along with its weakly supervised variants. Experiments show that TYrPPG achieves state-of-the-art performance on commonly used datasets, indicating its promise for remote heart rate estimation. The source code is available at https://github.com/Taixi-CHEN/TYrPPG.

Paper Structure

This paper contains 13 sections, 17 equations, 4 figures, 4 tables.

Figures (4)

  • Figure 1: Visualization of the heart rate signals estimated by TYrPPG when trained with KL divergence and with the proposed video MMD. Solid and dotted lines mark the ground truth and the model estimation, respectively. TYrPPG fits the ground truth better when optimized with video MMD, as the peaks of its estimated signal are more consistent with the ground truth. This shows the effectiveness of the proposed video MMD.
  • Figure 2: TYrPPG is a gated 3D-CNN-based model structure. (a) shows the frame stem, a data augmentation block that helps TYrPPG understand the video better. (b) shows the GVB, which contains a TSM module and a gated 3D-CNN and is designed to analyze video efficiently. Lastly, the distribution dissimilarity learning process is based on our proposed video MMD. As a result, the whole model structure is simpler than Mamba-based and transformer-based models.
  • Figure 3: TYrPPG uses the Comprehensive Supervised Loss (CSL) to optimize the model. CSL contains three loss terms with different purposes: 1) learning distribution dissimilarity, 2) learning heart rate details, and 3) learning trend similarity. Its variant, the Weakly Supervised Loss (WSL), comprises only $Loss_w(\cdot)$ and $Loss_p(\cdot)$, exploring whether good generalization can be obtained without learning the signal details.
  • Figure 4: Visualization of the heart rate signals estimated by PhysFormer, RhythmFormer, and TYrPPG on the PURE dataset. TYrPPG fits the ground truth significantly better than the other two models. Solid and dotted lines mark the ground truth and the model estimation, respectively.
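
The captions for Figures 1 and 3 refer to learning the distribution dissimilarity between the estimated and ground-truth signals with video MMD. The page does not give the exact definition of video MMD, so the block below is only a minimal sketch of the standard kernel maximum mean discrepancy (MMD) applied to windows of a 1D pulse signal. The function names, the sliding-window segmentation, the Gaussian kernel, and the bandwidth value are all assumptions made for illustration and are not taken from the TYrPPG source code.

```python
import math


def sliding_windows(signal, width, step):
    """Split a 1D signal into overlapping segments (hypothetical helper)."""
    return [signal[i:i + width] for i in range(0, len(signal) - width + 1, step)]


def gaussian_kernel(a, b, bandwidth):
    """Gaussian (RBF) kernel between two equal-length segments."""
    sq_dist = sum((x - y) ** 2 for x, y in zip(a, b))
    return math.exp(-sq_dist / (2.0 * bandwidth ** 2))


def mmd_squared(pred_segments, true_segments, bandwidth=1.0):
    """Biased estimate of squared MMD between two sets of segments.

    MMD^2 = E[k(x, x')] + E[k(y, y')] - 2 E[k(x, y)].
    This is generic kernel MMD, assumed here as a stand-in for video MMD.
    """
    def mean_kernel(xs, ys):
        total = sum(gaussian_kernel(x, y, bandwidth) for x in xs for y in ys)
        return total / (len(xs) * len(ys))

    return (mean_kernel(pred_segments, pred_segments)
            + mean_kernel(true_segments, true_segments)
            - 2.0 * mean_kernel(pred_segments, true_segments))


# Synthetic example: 10 s of signal at an assumed 30 frames per second.
fps = 30
truth = [math.sin(2 * math.pi * 1.2 * t / fps) for t in range(300)]     # 72 bpm
estimate = [math.sin(2 * math.pi * 2.0 * t / fps) for t in range(300)]  # 120 bpm

true_segments = sliding_windows(truth, width=30, step=15)
pred_segments = sliding_windows(estimate, width=30, step=15)

same = mmd_squared(true_segments, true_segments)
different = mmd_squared(pred_segments, true_segments)
```

In this sketch, `same` is zero up to floating-point error because both sets of segments are identical, while `different` is larger because the two signals oscillate at different rates. A training loss built on this idea would push the estimated segments toward the ground-truth distribution rather than matching the signals sample by sample.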

Theorems & Definitions (2)

  • Remark 1
  • Remark 2