PhysMLE: Generalizable and Priors-Inclusive Multi-task Remote Physiological Measurement

Jiyao Wang, Hao Lu, Ange Wang, Xiao Yang, Yingcong Chen, Dengbo He, Kaishun Wu

TL;DR

PhysMLE, an end-to-end mixture of low-rank experts with a novel router mechanism, is proposed for multi-task remote physiological measurement, enabling the model to handle both task-specific characteristics and correlations among tasks. A large-scale multi-task generalization benchmark, the Multi-Source Synsemantic Domain Generalization (MSSDG) protocol, is proposed alongside it.

Abstract

Remote photoplethysmography (rPPG) has been widely applied to measure heart rate from face videos. To increase the generalizability of the algorithms, domain generalization (DG) has attracted increasing attention in rPPG. However, when rPPG is extended to simultaneously measure more vital signs (e.g., respiration and blood oxygen saturation), achieving generalizability brings new challenges. Although partial features shared among different physiological signals can benefit multi-task learning, the sparse and imbalanced target label space causes a seesaw effect in task-specific feature learning. To resolve this problem, we designed an end-to-end Mixture of Low-rank Experts for multi-task remote Physiological measurement (PhysMLE), which is based on multiple low-rank experts with a novel router mechanism, thereby enabling the model to handle both task-specific characteristics and correlations among tasks. Additionally, we introduced prior knowledge from physiology among tasks to overcome the imbalance of the label space in real-world multi-task physiological measurement. For fair and comprehensive evaluation, we proposed a large-scale multi-task generalization benchmark, named the Multi-Source Synsemantic Domain Generalization (MSSDG) protocol. Extensive experiments under the MSSDG protocol and in intra-dataset settings have shown the effectiveness and efficiency of PhysMLE. In addition, a new dataset was collected and made publicly available to meet the needs of MSSDG.
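
To make the idea of "multiple low-rank experts with a router" concrete, the following is a minimal NumPy sketch of a generic soft-gated mixture of low-rank experts. It is not the paper's PhysMLE layer or its EFRouter: the class names, the shared dense weight, the softmax gate, and all sizes are assumptions chosen purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)


def softmax(z, axis=-1):
    """Numerically stable softmax."""
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


class LowRankExpert:
    """One expert: a rank-limited map x @ A @ B (hypothetical parameterization)."""

    def __init__(self, d_in, d_out, rank):
        self.A = rng.normal(0.0, 0.02, size=(d_in, rank))
        self.B = rng.normal(0.0, 0.02, size=(rank, d_out))

    def __call__(self, x):
        return x @ self.A @ self.B


class MixtureOfLowRankExperts:
    """Shared dense path plus a router-weighted sum of low-rank experts.

    The softmax router here is a generic stand-in, not the paper's EFRouter.
    """

    def __init__(self, d_in, d_out, rank=4, n_experts=3):
        self.W_shared = rng.normal(0.0, 0.02, size=(d_in, d_out))
        self.W_router = rng.normal(0.0, 0.02, size=(d_in, n_experts))
        self.experts = [LowRankExpert(d_in, d_out, rank) for _ in range(n_experts)]

    def gate(self, x):
        # One weight per expert for every input row; each row sums to 1.
        return softmax(x @ self.W_router)

    def __call__(self, x):
        g = self.gate(x)                                             # (batch, n_experts)
        expert_out = np.stack([e(x) for e in self.experts], axis=1)  # (batch, n_experts, d_out)
        return x @ self.W_shared + (g[:, :, None] * expert_out).sum(axis=1)


layer = MixtureOfLowRankExperts(d_in=16, d_out=8)
x = rng.normal(size=(5, 16))
y = layer(x)
print(y.shape)
```

The shared path is meant to carry features common to all tasks, while each low-rank expert adds a cheap, specialized correction; the router decides per input how much each expert contributes.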

Paper Structure (32 sections, 9 equations, 12 figures, 8 tables, 1 algorithm)

Figures (12)

  • Figure 1: Illustration of the difference between Multi-source Synsemantic Domain Generalization (MSSDG) and classic multi-task learning or multi-source domain generalization.
  • Figure 2: The overall architecture of the proposed PhysMLE. Our method is flexible across two types of backbone network structures (subfigures (a) and (b)). Briefly, PhysMLE takes as input STMaps compressed from facial videos from multiple domains. The core feature learning layers in each basic block of the backbone are replaced by the PhysMLE layer (subfigure (c)). To adapt to different backbone structures, the proposed EFRouter can also be instantiated as two types (subfigure (d)).
  • Figure 3: Illustration of how STMap generation and data augmentation are performed. After the 256-frame facial video (highlighted with the red box) is aligned and compressed to an STMap, we shift the sliding window forward by 50 frames to get the augmented STMap (highlighted with the orange box). A row-wise shuffle is also applied to the augmented STMap.
  • Figure 4: Example of how $\mathcal{L}_{BR}$ is calculated. For samples with ground-truth BVP signals but without RR labels, we first detect R-R intervals and conduct frequency and PSD analysis to extract $y^{bvp}_{rr}$ from the high-frequency part. Then, we use $y^{bvp}_{rr}$ to supervise the learning of RR estimation according to Eq. (7).
  • Figure 5: Sample video frames captured by different webcams in HCW.
  • ...and 7 more figures
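
The sliding-window augmentation described in the Figure 3 caption can be sketched as below. The 256-frame window and 50-frame shift come from the caption; the STMap layout `(n_rois, n_frames, n_channels)`, the function name `augmented_pair`, and the reading of "row-wise shuffle" as a permutation of the ROI rows are assumptions made for illustration, not details confirmed by the paper.

```python
import numpy as np

WINDOW = 256  # frames per STMap (from the Figure 3 caption)
SHIFT = 50    # sliding-window offset in frames (from the Figure 3 caption)


def augmented_pair(full_stmap, start, rng):
    """Return (original, augmented, row_order) for one training sample.

    full_stmap: array of shape (n_rois, n_frames, n_channels); this layout
    is an assumption. The augmented window starts SHIFT frames later and
    has its ROI rows permuted.
    """
    if start < 0:
        raise ValueError("start must be non-negative")
    original = full_stmap[:, start:start + WINDOW]
    shifted = full_stmap[:, start + SHIFT:start + SHIFT + WINDOW]
    if shifted.shape[1] < WINDOW:
        raise ValueError("not enough frames for the shifted window")
    row_order = rng.permutation(shifted.shape[0])  # assumed row-wise shuffle
    augmented = shifted[row_order]
    return original, augmented, row_order


# Toy STMap: 4 ROI rows, 400 frames, 3 color channels.
full = np.arange(4 * 400 * 3, dtype=float).reshape(4, 400, 3)
rng = np.random.default_rng(0)
orig, aug, order = augmented_pair(full, start=0, rng=rng)
print(orig.shape, aug.shape)
```

Because only the row order changes, the temporal signal in each row is untouched, so the augmented sample covers the same physiological content over a slightly later time window.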