Recovering Pulse Waves from Video Using Deep Unrolling and Deep Equilibrium Models

Vineet R Shenoy, Suhas Lohit, Hassan Mansour, Rama Chellappa, Tim K. Marks

TL;DR

The paper tackles non-contact heart-rate (HR) estimation from facial video (iPPG) by formulating pulse waveform recovery as an inverse problem with learned priors. It introduces three approaches (Unrolled iPPG, DE-Prox-iPPG, and UDEQ-iPPG) that couple gradient-descent data-fidelity steps with neural denoisers, including fixed-point DEQ components, to recover the pulsatile signal. Across MMSE-HR, PURE, and UBFC-rPPG, the methods achieve state-of-the-art HR estimates while using a fraction of the parameters of competing models, with UDEQ-iPPG delivering the best overall performance and generalization. The framework offers a principled, interpretable path to robust pulse waveform recovery from video, supporting accurate HR monitoring in challenging real-world scenarios with lower model complexity.

Abstract

Camera-based monitoring of vital signs, also known as imaging photoplethysmography (iPPG), has seen applications in driver monitoring, perfusion assessment in surgical settings, affective computing, and more. iPPG involves sensing the underlying cardiac pulse from video of the skin and estimating vital signs such as the heart rate or a full pulse waveform. Some previous iPPG methods impose model-based sparse priors on the pulse signals and use iterative optimization for pulse wave recovery, while others use end-to-end black-box deep learning methods. In contrast, we introduce methods that combine signal processing and deep learning in an inverse problem framework. Our methods estimate the underlying pulse signal and heart rate from facial video by learning deep-network-based denoising operators that leverage deep algorithm unfolding and deep equilibrium models. Experiments show that our methods can denoise an acquired signal from the face and infer the correct underlying pulse rate, achieving state-of-the-art heart rate estimation performance on well-known benchmarks, all with fewer than one-fifth as many learnable parameters as the closest competing method.
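
As a rough illustration of the unfolding idea mentioned in the abstract, the sketch below runs a fixed number of proximal-gradient iterations on a toy signal model, z = synthesis(x) + e. Nearly everything specific is an assumption made for illustration and is not taken from the paper: the orthonormal DCT stands in for the frequency transform, soft-thresholding stands in for the learned denoising operators, and all function names, step sizes, and penalty weights are hypothetical.

```python
import math

def dct_matrix(n):
    """Orthonormal DCT-II analysis matrix C (rows are frequencies).
    Assumption: used here as a simple real-valued stand-in for a frequency transform."""
    rows = []
    for k in range(n):
        scale = math.sqrt((1.0 if k == 0 else 2.0) / n)
        rows.append([scale * math.cos(math.pi * (t + 0.5) * k / n) for t in range(n)])
    return rows

def soft_threshold(values, thresh):
    """Proximal operator of thresh * ||.||_1 (hypothetical stand-in for a learned denoiser)."""
    return [math.copysign(max(abs(v) - thresh, 0.0), v) for v in values]

def synthesize(C, x):
    """Time-domain signal C^T x from frequency coefficients x."""
    n = len(x)
    return [sum(C[k][t] * x[k] for k in range(n)) for t in range(n)]

def analyze(C, r):
    """Frequency coefficients C r of a time-domain signal r."""
    n = len(r)
    return [sum(C[k][t] * r[t] for t in range(n)) for k in range(n)]

def objective(C, z, x, e, lam_x, lam_e):
    """Data fidelity 0.5 * ||z - C^T x - e||^2 plus L1 penalties on x and e."""
    residual = [zi - si - ei for zi, si, ei in zip(z, synthesize(C, x), e)]
    return (0.5 * sum(r * r for r in residual)
            + lam_x * sum(abs(v) for v in x)
            + lam_e * sum(abs(v) for v in e))

def unrolled_recovery(z, num_iters=20, step=0.5, lam_x=0.1, lam_e=0.1):
    """Run a fixed number of unrolled iterations: gradient step, then 'denoise'."""
    n = len(z)
    C = dct_matrix(n)
    x, e = [0.0] * n, [0.0] * n
    history = [objective(C, z, x, e, lam_x, lam_e)]
    for _ in range(num_iters):
        residual = [zi - si - ei for zi, si, ei in zip(z, synthesize(C, x), e)]
        # Gradient descent on the data-fidelity term with respect to x and e.
        x_tilde = [xi + step * g for xi, g in zip(x, analyze(C, residual))]
        e_tilde = [ei + step * r for ei, r in zip(e, residual)]
        # In a learned variant, these two calls would be neural denoisers.
        x = soft_threshold(x_tilde, step * lam_x)
        e = soft_threshold(e_tilde, step * lam_e)
        history.append(objective(C, z, x, e, lam_x, lam_e))
    return x, e, history
```

With an orthonormal transform, the smooth term has Lipschitz constant 2, so a step size of 0.5 keeps the classical objective from increasing between iterations. A learned denoiser would not in general come with that guarantee.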

Paper Structure

This paper contains 25 sections, 1 theorem, 15 equations, 9 figures, 6 tables.

Key Result

Theorem 1

Implicit Function Theorem (IFT) bai2019deepimplicit_functiondeq-flow. Given the fixed-point representation $\mathbf{p}^*$ of the optimization variables, and the corresponding loss $\mathcal{L}(\mathbf{Z}, \mathbf{Z}_{\textup{gt}})$, where $\mathbf{Z} = \mathbf{F^{-1}X^*}$, the gradient of the DEQ flo
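
The theorem statement is cut off here, so the following is only a minimal sketch of the general idea behind differentiating through a fixed point with the implicit function theorem. It uses a scalar toy map that is not the paper's operator; the map `f`, the squared-error loss, and all function names are hypothetical choices made for illustration. The sketch compares the implicit gradient against a finite-difference estimate.

```python
import math

def f(z, theta, x):
    """Toy map, a contraction in z when |theta| < 1 (assumption: not the paper's operator)."""
    return theta * math.tanh(z) + x

def solve_fixed_point(theta, x, tol=1e-12, max_iter=1000):
    """Iterate z <- f(z, theta, x) until successive iterates agree to within tol."""
    z = 0.0
    for _ in range(max_iter):
        z_next = f(z, theta, x)
        if abs(z_next - z) < tol:
            return z_next
        z = z_next
    return z

def loss(theta, x, target):
    """Squared error between the fixed point and a target value."""
    z_star = solve_fixed_point(theta, x)
    return 0.5 * (z_star - target) ** 2

def ift_gradient(theta, x, target):
    """dL/dtheta via the implicit function theorem, without backpropagating
    through the solver iterations."""
    z_star = solve_fixed_point(theta, x)
    df_dz = theta * (1.0 - math.tanh(z_star) ** 2)   # partial of f w.r.t. z at z*
    df_dtheta = math.tanh(z_star)                    # partial of f w.r.t. theta at z*
    dz_dtheta = df_dtheta / (1.0 - df_dz)            # scalar form of (I - df/dz)^{-1} df/dtheta
    return (z_star - target) * dz_dtheta

def finite_difference_gradient(theta, x, target, eps=1e-6):
    """Central-difference check of the same derivative."""
    return (loss(theta + eps, x, target) - loss(theta - eps, x, target)) / (2 * eps)
```

Calling `ift_gradient(0.5, 0.3, 1.0)` and `finite_difference_gradient(0.5, 0.3, 1.0)` should give closely matching values. The implicit version needs only the converged point, not the intermediate solver states.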

Figures (9)

  • Figure 1: Our full pipeline. We pass the input video frames through a face and landmark detector, extrapolate additional facial landmarks, and obtain a time series based on pixel intensities from five segmented face regions. We then pass our raw signals to either Unrolled iPPG, DE-Prox-iPPG, or UDEQ-iPPG, each of which separates the signal from the noise. Finally, we sum the power spectrum coefficients across individual bins for all five signals, and select the bin with the highest power as our heart-rate estimate.
  • Figure 2: Our unrolling algorithm for both Unrolled iPPG and UDEQ-iPPG. We replace the proximal operators of proximal gradient descent by deep denoisers $\mathcal{R}_{\theta_\mathcal{R}}$ and $\mathcal{Q}_{\theta_\mathcal{Q}}$ on the frequency coefficients $\mathbf{X}$ and noise $\mathbf{E}$, respectively. Unrolled iPPG implements a single forward pass through $\mathcal{R}_{\theta_\mathcal{R}}$ and $\mathcal{Q}_{\theta_\mathcal{Q}}$, while UDEQ-iPPG effectively applies $\mathcal{R}$ to $\tilde{\mathbf{X}}_t$ until the output converges to a fixed point.
  • Figure 3: Our DE-Prox-iPPG algorithmic paradigm. The video-extracted time series $\mathbf{Z}$ and the two concatenated optimization variables $[\mathbf{X}^*,\mathbf{E}^*]^\mathsf{T}$ are passed through the block outlined in red, which performs gradient updates $\frac{\partial \mathbf{D}}{\partial \mathbf{X}^*}, \frac{\partial \mathbf{D}}{\partial \mathbf{E}^*}$. The results $\tilde{\mathbf{X}}^*$ and $\tilde{\mathbf{E}}^*$ are then passed through denoising operators $\mathcal{R}$ and $\mathcal{Q}$. This process is repeated until fixed-point convergence.
  • Figure 4: The DEQ operator used in UDEQ-iPPG. At each unrolling iteration, we apply the denoising operator until a fixed point is found.
  • Figure 5: Image stills from the MMSE-HR dataset (top), PURE dataset (middle), and UBFC-rPPG dataset (bottom).
  • ...and 4 more figures
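
The final step in the Figure 1 caption (summing power spectra over the face-region signals and picking the strongest bin) can be sketched as follows. This is a hedged illustration rather than the authors' code: the function names are hypothetical, a naive DFT is used in place of an FFT to keep the example dependency-free, and the 40 to 180 bpm search band is an assumption that does not appear in the caption.

```python
import cmath
import math

def power_spectrum(signal):
    """Power of DFT bins 0..N//2 for a mean-removed signal (naive O(N^2) DFT)."""
    n = len(signal)
    mean = sum(signal) / n
    centered = [s - mean for s in signal]
    powers = []
    for k in range(n // 2 + 1):
        coeff = sum(c * cmath.exp(-2j * math.pi * k * t / n)
                    for t, c in enumerate(centered))
        powers.append(abs(coeff) ** 2)
    return powers

def estimate_heart_rate_bpm(region_signals, fps, min_bpm=40.0, max_bpm=180.0):
    """Sum per-region power spectra bin by bin, then return the frequency
    (in beats per minute) of the strongest bin inside the search band.
    Assumption: the band limits are illustrative defaults, not from the paper."""
    n = len(region_signals[0])
    summed = [0.0] * (n // 2 + 1)
    for sig in region_signals:
        for k, p in enumerate(power_spectrum(sig)):
            summed[k] += p
    best_k, best_power = None, -1.0
    for k, p in enumerate(summed):
        bpm = 60.0 * k * fps / n   # bin k corresponds to k * fps / n Hz
        if min_bpm <= bpm <= max_bpm and p > best_power:
            best_k, best_power = k, p
    if best_k is None:
        raise ValueError("no frequency bin falls inside the search band")
    return 60.0 * best_k * fps / n
```

The frequency resolution is `fps / n` Hz per bin, so a 10-second window at 30 frames per second resolves heart rate in steps of 6 bpm. Longer windows or zero-padding would give a finer grid.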

Theorems & Definitions (1)

  • Theorem 1