FreqPhys: Repurposing Implicit Physiological Frequency Prior for Robust Remote Photoplethysmography

Wei Qian, Dan Guo, Jinxing Zhou, Bochao Zou, Zitong Yu, Meng Wang

Abstract

Remote photoplethysmography (rPPG) enables contactless physiological monitoring by capturing subtle skin-color variations from facial videos. However, most existing methods rely predominantly on time-domain modeling, which leaves them vulnerable to motion artifacts and illumination fluctuations, where weak physiological clues are easily overwhelmed by noise. To address these challenges, we propose FreqPhys, a frequency-guided rPPG framework that explicitly leverages physiological frequency priors for robust signal recovery. Specifically, FreqPhys first applies a Physiological Bandpass Filtering module to suppress out-of-band interference, and then performs Physiological Spectrum Modulation together with adaptive spectral selection to emphasize pulse-related frequency components while suppressing residual in-band noise. A Cross-domain Representation Learning module further fuses these spectral priors with deep time-domain features to capture informative spatial-temporal dependencies. Finally, a frequency-aware conditional diffusion process progressively reconstructs high-fidelity rPPG signals. Extensive experiments on six benchmarks demonstrate that FreqPhys yields significant improvements over state-of-the-art approaches, particularly under challenging motion conditions. These results highlight the importance of explicitly modeling physiological frequency priors. The source code will be released.

Paper Structure

This paper contains 41 sections, 3 theorems, 62 equations, 7 figures, 14 tables.

Key Result

theorem 1

(Frequency-domain Convolution Theorem) The multiplication of two signals in the frequency domain is equivalent to the frequency-domain transform of the circular convolution of these two signals in the temporal domain. In the theorem's equation, $\otimes$ and $\odot$ represent circular convolution and element-wise multiplication, respectively, and $\mathbf{M}(v)$ and $\mathbf{Z}(v)$ …
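
The theorem's equation did not survive on this page, so the following is only a numerical sketch of the standard DFT form of the statement, $\mathcal{F}(m \otimes z) = \mathcal{F}(m) \odot \mathcal{F}(z)$. The time-domain names `m` and `z`, the symbol $\mathcal{F}$, and both helper functions are assumptions made for illustration and are not taken from the paper.

```python
import numpy as np


def circular_convolve(m, z):
    """Direct circular convolution: out[n] = sum_k m[k] * z[(n - k) mod N]."""
    m = np.asarray(m, dtype=float)
    z = np.asarray(z, dtype=float)
    if m.ndim != 1 or m.shape != z.shape:
        raise ValueError("expected two 1-D signals of equal length")
    n = m.size
    # idx[row, col] = (row - col) mod N, so z[idx][row, col] = z[(row - col) mod N]
    idx = (np.arange(n)[:, None] - np.arange(n)[None, :]) % n
    return (m[None, :] * z[idx]).sum(axis=1)


def convolution_theorem_gap(m, z):
    """Largest absolute difference between FFT(m circ-conv z) and FFT(m) * FFT(z)."""
    lhs = np.fft.fft(circular_convolve(m, z))
    rhs = np.fft.fft(m) * np.fft.fft(z)
    return float(np.max(np.abs(lhs - rhs)))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    m = rng.standard_normal(64)
    z = rng.standard_normal(64)
    print("max |lhs - rhs| =", convolution_theorem_gap(m, z))
```

For random inputs the gap should sit at floating-point round-off level. In practice this identity is what lets an element-wise product on a spectrum stand in for a convolution over the whole time window.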

Figures (7)

  • Figure 1: Visualization of the differences between ground-truth and raw rPPG signals in the time and frequency domains. (a) Ground-truth rPPG signal, whose spectrum exhibits clear physiological priors: (i) Physiological Band Constraint, where the spectral energy is concentrated within the physiological bandwidth of [0.66, 3.0] Hz (about 40 to 180 beats per minute), corresponding to normal heart rate ranges, and (ii) Dominant Peak Property, where a prominent spectral peak within this band reflects the periodic cardiac rhythm, while other in-band components remain comparatively low. The dominant frequency peak (marked by ★) corresponds to the heart rate and is converted to beats per minute by multiplying the frequency by 60. (b)(c) Raw rPPG signals extracted from facial videos under stable and motion conditions, computed by averaging green-channel pixel intensities over time [wang2016algorithmic]. In the time domain, physiological signals are heavily entangled with noise. In the frequency domain, noise manifests as both out-of-band interference and residual in-band components, predominantly concentrated at lower frequencies. While the stable scenario preserves a relatively distinct spectral peak, motion artifacts disperse the in-band energy distribution, making reliable denoising considerably more challenging.
  • Figure 2: The pipeline of the proposed FreqPhys. Given a facial video, we first construct the MSTmap $\mathbf{X}$ as the temporal condition and generate the frequency condition $\mathbf{C}^{\mathbf{P}}$ by applying Physiological Bandpass Filtering (PBF). During training, we first generate the noisy rPPG $\mathbf{Y}_{k}$ at the $k$-th step by adding Gaussian noise to the ground-truth rPPG $\mathbf{Y}_{0}$. Then, we input $\mathbf{Y}_{k}$, $\mathbf{X}$, $k$, and $\mathbf{C}^{\mathbf{P}}$ into the Denoising Network. Specifically, the frequency condition $\mathbf{C}^{\mathbf{P}}$ is fed into the Physiological Frequency Denoiser module to enhance physiological spectral clues through three key steps: (i) PBF removes out-of-band noise based on the physiological frequency bandwidth of [0.66, 3.0] Hz; (ii) Physiological Spectrum Modulation (PSM) emphasizes valid physiological harmonics by modeling interactions between real and imaginary components; (iii) adaptive spectral selection (ASS) dynamically suppresses in-band noise using data-driven energy thresholds. Next, through Cross-domain Representation Learning, FreqPhys incorporates the frequency-domain denoised information into spatial-temporal dependency modeling to estimate a high-fidelity rPPG signal. During inference, the initial rPPG $\mathbf{Y}_{K}$ is randomly sampled from Gaussian noise, and the frequency condition and denoising network operate as they do in training.
  • Figure 3: Details of the Physiological Spectrum Modulation (PSM) module.
  • Figure 4: Architecture comparison with existing diffusion methods.
  • Figure 5: Time- and frequency-domain visualizations of rPPG signal predictions on the VIPL dataset under a head-motion scenario. In the frequency-domain plots, the purple dashed box indicates the physiological signal bandwidth of 0.66 to 3.0 Hz, corresponding to typical human cardiac frequencies. The red ★ and green ● denote the spectral peaks of the ground-truth and predicted heart rates, respectively.
  • ...and 2 more figures
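
The two spectral priors named in the Figure 1 caption, and the out-of-band suppression step in the Figure 2 caption, can be illustrated with a small NumPy sketch. It is not the authors' implementation: the function names, the per-frame spatial averaging of the green channel, and the ideal brick-wall FFT mask are assumptions chosen for simplicity. Only the [0.66, 3.0] Hz band and the "frequency times 60" conversion come from the captions.

```python
import numpy as np

PHYS_BAND_HZ = (0.66, 3.0)  # band quoted in the figure captions


def green_channel_trace(frames):
    """frames: (T, H, W, 3) RGB array -> (T,) mean green intensity per frame."""
    frames = np.asarray(frames, dtype=float)
    return frames[..., 1].mean(axis=(1, 2))


def physiological_bandpass(signal, fps, band=PHYS_BAND_HZ):
    """Zero every rFFT bin outside `band` (ideal brick-wall mask, an assumption)."""
    signal = np.asarray(signal, dtype=float)
    spectrum = np.fft.rfft(signal - signal.mean())
    freqs = np.fft.rfftfreq(signal.size, d=1.0 / fps)
    mask = (freqs >= band[0]) & (freqs <= band[1])
    return np.fft.irfft(spectrum * mask, n=signal.size)


def dominant_heart_rate_bpm(signal, fps, band=PHYS_BAND_HZ):
    """Heart rate in bpm: frequency of the strongest in-band peak times 60."""
    signal = np.asarray(signal, dtype=float)
    power = np.abs(np.fft.rfft(signal - signal.mean())) ** 2
    freqs = np.fft.rfftfreq(signal.size, d=1.0 / fps)
    in_band = (freqs >= band[0]) & (freqs <= band[1])
    peak_hz = freqs[in_band][np.argmax(power[in_band])]
    return 60.0 * peak_hz


if __name__ == "__main__":
    fps, seconds = 30, 10
    t = np.arange(fps * seconds) / fps
    pulse = np.sin(2 * np.pi * 1.2 * t)          # synthetic 1.2 Hz pulse
    drift = 5.0 * np.sin(2 * np.pi * 0.1 * t)    # out-of-band low-frequency drift
    raw = pulse + drift
    filtered = physiological_bandpass(raw, fps)
    print("estimated bpm:", dominant_heart_rate_bpm(filtered, fps))
```

A hard mask like this only handles out-of-band interference. The in-band noise that the captions describe under motion is what the learned PSM and ASS stages are meant to address, and this sketch does not attempt that.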

Theorems & Definitions (3)

  • theorem 1
  • theorem 2
  • proposition 1