Self-similarity Prior Distillation for Unsupervised Remote Physiological Measurement

Xinyu Zhang; Weiyu Sun; Hao Lu; Ying Chen; Yun Ge; Xiaolin Huang; Jie Yuan; Yingcong Chen

TL;DR

A Self-Similarity Prior Distillation (SSPD) framework for unsupervised rPPG estimation, which capitalizes on the intrinsic temporal self-similarity of cardiac activities and has the lowest inference time and computation cost among end-to-end models.

Abstract

Remote photoplethysmography (rPPG) is a noninvasive technique that aims to capture subtle variations in facial pixels caused by changes in blood volume resulting from cardiac activities. Most existing unsupervised methods for rPPG focus on contrastive learning between samples while neglecting the inherent self-similarity prior in physiological signals. In this paper, we propose a Self-Similarity Prior Distillation (SSPD) framework for unsupervised rPPG estimation, which capitalizes on the intrinsic self-similarity of cardiac activities. Specifically, we first introduce a physical-prior embedded augmentation technique to mitigate the effect of various types of noise. Then, we tailor a self-similarity-aware network to extract more reliable self-similar physiological features. Finally, we develop a hierarchical self-distillation paradigm to help the network disentangle self-similar physiological patterns from facial videos. Comprehensive experiments demonstrate that the unsupervised SSPD framework achieves performance comparable or even superior to that of state-of-the-art supervised methods. Meanwhile, SSPD maintains the lowest inference time and computation cost among end-to-end models.

Paper Structure

This paper contains 39 sections, 15 equations, 16 figures, 9 tables, and 1 algorithm.

Introduction
Related Works
Self-supervised Learning
Remote Physiological Measurement
Method
Overview
Self-similarity Prior in rPPG
Self-Similarity Map (SSM)
Self-Similarity Wave (SSW)
Self-similarity Prior with Unsupervised Learning
Physical-prior Embedded Augmentation
Video Preprocessing
Local-global Augmentation
Masked Difference Modeling
Self-similarity-aware Network
...and 24 more sections

Figures (16)

Figure 1: Self-similarity map characterizes the self-similarity of each token pair in rPPG signals.
Figure 2: Interrelationships between rPPG signal, self-similarity map, and self-similarity wave. Based on prior knowledge, the self-similarity wave extracts the strong periodic component from the self-similarity map.
Figure 3: The architecture of our SSPD framework for unsupervised remote physiological measurement. The framework first applies Local-Global Augmentation (LGA) and Masked Difference Modeling (MDM) to generate two distorted views. The tailored self-similarity-aware network consists of a backbone, a predictor module, and a Separable Self-Similarity Model ($S^3M$), which is used exclusively for training. Each Temporal Similarity (TS) block in $S^3M$ exploits self-similar representations at a specific time scale, forming the temporal similarity pyramid $\{\mathcal{M}^{<1>},\mathcal{M}^{<2>},...,\mathcal{M}^{<L>}\}$. Finally, the proposed hierarchical self-distillation paradigm comprises Temporal Similarity Pyramid Distillation (TSPD) and rPPG Prediction Distillation (RPD), enabling self-similarity-aware learning and rPPG signal decoupling.
Figure 4: The architecture of the self-similarity-aware network. Each "ConvBlock" comprises two convolution layers, and "GAP" denotes global average pooling. "Downsampling" halves the temporal dimension. The dimensions are presented as N$\times$C$\times$T$\times$H$\times$W, where N and C indicate the number of samples and channels, respectively.
Figure 5: The architecture of the Temporal Similarity (TS) block. First, the input sequences are projected into token embeddings at a specific time scale. Next, multi-head dot-product attention enriches the global context information, followed by a linear projection. From the token embeddings, we derive the self-similarity map, which forms one layer of the temporal similarity pyramid. We add a residual connection from the input sequence to the output tokens and perform downsampling for dimension alignment.
...and 11 more figures
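
The captions of Figures 1 and 2 describe a self-similarity map over token pairs of an rPPG signal and a self-similarity wave that isolates its periodic component. The following is a minimal sketch of that idea on a synthetic signal, not the paper's implementation. Everything specific here is an assumption made for illustration: tokens are taken as sliding windows, similarity is cosine similarity, the wave is obtained by averaging each diagonal of the map, and the function names, sampling rate, and window length are hypothetical.

```python
import numpy as np


def self_similarity_map(signal, token_len, stride=1):
    """Cosine similarity between every pair of sliding-window tokens.

    Assumption: tokens are plain sliding windows; the paper's
    tokenization is not given in this excerpt.
    """
    n_tokens = (len(signal) - token_len) // stride + 1
    tokens = np.stack(
        [signal[i * stride : i * stride + token_len] for i in range(n_tokens)]
    )
    # Remove each token's mean and normalize to unit length so that
    # the dot product below is a cosine similarity.
    tokens = tokens - tokens.mean(axis=1, keepdims=True)
    norms = np.linalg.norm(tokens, axis=1, keepdims=True)
    tokens = tokens / np.maximum(norms, 1e-8)
    return tokens @ tokens.T


def self_similarity_wave(ssm):
    """Mean similarity as a function of lag (assumed definition):
    entry k is the average of the k-th diagonal of the map."""
    n = ssm.shape[0]
    return np.array([np.diagonal(ssm, offset=k).mean() for k in range(n)])


# Synthetic pulse-like signal: 72 bpm sampled at 30 frames per second.
fs = 30.0
heart_rate_hz = 1.2
t = np.arange(300) / fs
signal = np.sin(2 * np.pi * heart_rate_hz * t)

ssm = self_similarity_map(signal, token_len=15)
ssw = self_similarity_wave(ssm)

# Search for the dominant lag inside a plausible heart-rate band
# (lags of 10 to 40 samples correspond to 45 to 180 bpm at 30 fps).
lo, hi = 10, 40
period = lo + int(np.argmax(ssw[lo : hi + 1]))
est_bpm = 60.0 * fs / period
print(period, est_bpm)
```

For a clean periodic input the map shows bright stripes parallel to the main diagonal, spaced one cardiac period apart, and the averaged wave peaks at lags that are multiples of that period. This is the kind of structure that a self-similarity prior can exploit without ground-truth labels.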
