FacePhys: State of the Heart Learning

Kegang Wang, Jiankai Tang, Yuntao Wang, Xin Liu, Yuxuan Fan, Jiatong Ji, Yuanchun Shi, Daniel McDuff

TL;DR

FacePhys tackles non-contact heart-rate estimation via rPPG with a focus on on-device, real-time inference under video compression constraints. It introduces a discretized neural CDE framework that leverages temporal-spatial state-space duality (TSD) and Temporal Normalization to enable long-sequence training and single-frame inference with low memory usage. By employing a complex diagonal state transition matrix, it induces oscillatory dynamics to capture physiological periodicity while maintaining linear-time inference. Across five datasets, FacePhys achieves state-of-the-art HR accuracy, strong cross-dataset generalization, and a memory footprint of 3.6 MB with 9.46 ms latency, enabling practical deployment on resource-constrained devices. The work presents a physiology-informed SSM approach that balances accuracy and efficiency for real-time, non-contact cardiovascular monitoring.

Abstract

Vital sign measurement using cameras presents opportunities for comfortable, ubiquitous health monitoring. Remote photoplethysmography (rPPG), a foundational technology, enables cardiac measurement through minute changes in light reflected from the skin. However, practical deployment is limited by the computational constraints of performing analysis on front-end devices and by the accuracy degradation caused by transmitting data through compressive channels that reduce signal quality. We propose a memory-efficient rPPG algorithm, FacePhys, built on temporal-spatial state space duality, which resolves the trilemma of model scalability, cross-dataset generalization, and real-time operation. Leveraging a transferable heart state, FacePhys captures subtle periodic variations across video frames while maintaining minimal computational overhead, enabling training on extended video sequences and supporting low-latency inference. FacePhys establishes a new state of the art, with a substantial 49% reduction in error. Our solution enables real-time inference with a memory footprint of 3.6 MB and per-frame latency of 9.46 ms, surpassing existing methods by 83% to 99%. These results translate into reliable real-time performance in practical deployments, and a live demo is available at https://www.facephys.com/.

Paper Structure

This paper contains 17 sections, 7 equations, 5 figures, 7 tables.

Figures (5)

  • Figure 1: The FacePhys State Space Model provides an effective representation of the cyclical nature of heart beats, as shown by the latent embeddings (A), combining high accuracy (B-i) and high efficiency (B-ii). We reduce heart rate estimation error by 49% and per-frame latency by 83% compared to the current state of the art.
  • Figure 2: The CDE form of the time-continuous heart state space describes the ideal evolution of the heart state over time, but it is computationally very inefficient. Its discretized form can be expressed as a state space model with high computational efficiency.
  • Figure 3: FacePhys Framework. Our framework utilizes the SSM dual as the core component, which serves both as an efficient discretization solver for the heart state CDE and as a linear attention processor. It employs Temporal Normalization (TN) to stabilize the extraction of temporal features and introduces a complex state transition matrix A to enable periodic attention.
  • Figure 4: Introducing trainable complex numbers into the diagonalized state transition matrix $A$ generates oscillatory terms in the solution, which is dual to periodic attention. (a) Original linear causal attention. (b) Periodic attention generated by FacePhys introducing complex numbers in $A$.
  • Figure 5: Model accuracy versus latency. With the heart state space model, FacePhys is far more efficient than previous methods.
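
The captions for Figures 3 and 4 describe two ideas that a small numerical sketch can make concrete: a complex entry in the diagonal state transition matrix A produces an oscillatory response, and the recurrent state update is dual to a causal, attention-like weighted sum. The code below is an illustrative sketch only, not the authors' implementation. It models a single diagonal entry of A (a diagonal matrix acts on each entry independently), and the decay of 0.98, the 1.2 Hz frequency, the 30 fps frame rate, and the function names `ssm_recurrent` and `ssm_attention` are all assumptions chosen for illustration.

```python
import cmath
import math


def ssm_recurrent(a, b, c, xs):
    """Recurrent form: h_t = a * h_{t-1} + b * x_t, y_t = Re(c * h_t)."""
    h = 0j
    ys = []
    for x in xs:
        h = a * h + b * x  # only h is carried from one frame to the next
        ys.append((c * h).real)
    return ys


def ssm_attention(a, b, c, xs):
    """Dual form: y_t = sum over s <= t of Re(c * a**(t - s) * b) * x_s."""
    ys = []
    for t in range(len(xs)):
        # The weight on frame s depends only on the lag t - s and oscillates with it.
        ys.append(sum((c * a ** (t - s) * b).real * xs[s] for s in range(t + 1)))
    return ys


# Hypothetical parameters: 30 fps video, 1.2 Hz (72 bpm) oscillation, mild decay.
fps = 30.0
freq_hz = 1.2
a = 0.98 * cmath.exp(1j * 2 * math.pi * freq_hz / fps)
b = 1.0 + 0j
c = 1.0 + 0j

xs = [1.0] + [0.0] * 59  # unit impulse followed by silence (two seconds of frames)
y_rec = ssm_recurrent(a, b, c, xs)
y_att = ssm_attention(a, b, c, xs)

# Largest disagreement between the two forms; it should be at rounding-error level.
max_diff = max(abs(p - q) for p, q in zip(y_rec, y_att))
```

With a real-valued `a` the impulse response would only decay. The complex phase makes it swing between positive and negative values with a period of `fps / freq_hz` frames, which is the sense in which the attention weights become periodic. The recurrent form needs only the current state per frame, while the attention-like form exposes all lags at once.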