FacePhys: State of the Heart Learning
Kegang Wang, Jiankai Tang, Yuntao Wang, Xin Liu, Yuxuan Fan, Jiatong Ji, Yuanchun Shi, Daniel McDuff
TL;DR
FacePhys tackles non-contact heart-rate (HR) estimation via remote photoplethysmography (rPPG), with a focus on on-device, real-time inference under video compression constraints. It introduces a discretized neural controlled differential equation (CDE) framework that leverages temporal-spatial state-space duality (TSD) and Temporal Normalization to enable long-sequence training and single-frame inference with low memory usage. By employing a complex diagonal state transition matrix, it induces oscillatory dynamics that capture physiological periodicity while maintaining linear-time inference. Across five datasets, FacePhys achieves state-of-the-art HR accuracy, strong cross-dataset generalization, and a memory footprint of 3.6 MB with 9.46 ms per-frame latency, enabling practical deployment on resource-constrained devices. The work presents a physiology-informed state-space model (SSM) approach that balances accuracy and efficiency for real-time, non-contact cardiovascular monitoring.
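To make the idea of a complex diagonal state transition concrete, the following is a minimal sketch of a generic diagonal state-space recurrence, not the FacePhys architecture itself. The function name `diagonal_complex_ssm`, the single-mode setup, the decay of 0.98, the 30 fps frame rate, and the 1.2 Hz (72 bpm) frequency are all illustrative assumptions. It shows how a complex eigenvalue with magnitude below one yields a damped oscillation, and how the state is updated one frame at a time at a cost linear in the sequence length.

```python
import numpy as np

def diagonal_complex_ssm(u, lam, b, c):
    """Run h_t = lam * h_{t-1} + b * u_t and y_t = Re(c . h_t), frame by frame.

    lam holds the diagonal of the state transition matrix, so each update
    costs O(state_size) and only the current state h is kept in memory.
    """
    h = np.zeros_like(lam, dtype=np.complex128)
    y = np.empty(len(u))
    for t, u_t in enumerate(u):
        h = lam * h + b * u_t          # elementwise update (diagonal matrix)
        y[t] = np.real(np.dot(c, h))   # real-valued readout
    return y

# Illustrative values (assumptions, not taken from the paper).
fps = 30.0
heart_rate_hz = 1.2   # 72 beats per minute
decay = 0.98          # |lam| < 1 keeps the recurrence stable

# One complex mode: the magnitude sets the damping, the angle sets the frequency.
lam = np.array([decay * np.exp(2j * np.pi * heart_rate_hz / fps)])
b = np.array([1.0 + 0.0j])
c = np.array([1.0 + 0.0j])

# The impulse response is a damped cosine at the chosen frequency.
impulse = np.zeros(300)
impulse[0] = 1.0
response = diagonal_complex_ssm(impulse, lam, b, c)
```

With these values the impulse response equals `decay**t * cos(2*pi*heart_rate_hz*t/fps)`, which repeats every 25 frames. In a learned model the eigenvalues, input weights, and readout weights would be trained rather than fixed by hand.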
Abstract
Vital sign measurement using cameras presents opportunities for comfortable, ubiquitous health monitoring. Remote photoplethysmography (rPPG), a foundational technology, enables cardiac measurement through minute changes in light reflected from the skin. However, practical deployment is limited by the computational constraints of performing analysis on front-end devices and by the accuracy degradation caused by transmitting data through compressive channels that reduce signal quality. We propose a memory-efficient rPPG algorithm, FacePhys, built on temporal-spatial state space duality, which resolves the trilemma of model scalability, cross-dataset generalization, and real-time operation. Leveraging a transferable heart state, FacePhys captures subtle periodic variations across video frames while maintaining minimal computational overhead, enabling training on extended video sequences and supporting low-latency inference. FacePhys establishes a new state of the art, with a substantial 49% reduction in error. Our solution enables real-time inference with a memory footprint of 3.6 MB and a per-frame latency of 9.46 ms, surpassing existing methods by 83% to 99%. These results translate into reliable real-time performance in practical deployments, and a live demo is available at https://www.facephys.com/.
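For readers unfamiliar with how a heart rate is read off a recovered pulse waveform, the sketch below shows one common post-processing step: picking the dominant spectral peak within a plausible heart-rate band. This is a generic illustration and not a description of the FacePhys pipeline. The function name `estimate_heart_rate_bpm`, the 0.7 to 3.0 Hz band, the 30 fps frame rate, and the synthetic test signal are all assumptions.

```python
import numpy as np

def estimate_heart_rate_bpm(signal, fps, low_hz=0.7, high_hz=3.0):
    """Estimate heart rate as the strongest spectral peak inside a band.

    The band limits (roughly 42 to 180 bpm) are an assumed, commonly used range.
    """
    signal = np.asarray(signal, dtype=float)
    signal = signal - signal.mean()                  # remove the DC offset
    windowed = signal * np.hanning(len(signal))      # reduce spectral leakage
    power = np.abs(np.fft.rfft(windowed)) ** 2
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / fps)
    band = (freqs >= low_hz) & (freqs <= high_hz)
    peak_hz = freqs[band][np.argmax(power[band])]
    return 60.0 * peak_hz

# Synthetic 20-second pulse waveform at 30 fps: a 1.2 Hz sinusoid plus noise.
fps = 30.0
t = np.arange(600) / fps
rng = np.random.default_rng(0)
pulse = np.sin(2 * np.pi * 1.2 * t) + 0.3 * rng.standard_normal(t.size)

bpm = estimate_heart_rate_bpm(pulse, fps)
print(f"Estimated heart rate: {bpm:.1f} bpm")
```

On this synthetic input the estimate lands at about 72 bpm. The frequency resolution is `fps / len(signal)`, here 0.05 Hz or 3 bpm, so longer windows give finer estimates at the cost of slower response to heart-rate changes.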
