ComFace: Facial Representation Learning with Synthetic Data for Comparing Faces

Yusuke Akamatsu; Terumi Umematsu; Hitoshi Imaoka; Shizuko Gomi; Hideo Tsurushima

TL;DR

ComFace addresses the difficulty of capturing intra-personal facial changes by learning two complementary representations, inter-personal differences and intra-personal changes, from synthetic face images. It combines a contrastive inter-personal objective with a distance-based intra-personal objective, augmented by curriculum learning, and transfers the learned representations to three downstream tasks that compare two face images of the same individual: facial expression change, weight change, and age change. In the reported experiments, ComFace trained purely on synthetic data achieves transfer performance comparable to or better than methods trained on real images, generalizes well to new patients and conditions, and outperforms patient-specific baselines in some weight-change settings. The work indicates that facial representation learning (FRL) with synthetic data is a viable and practical way to model subtle temporal facial changes in medical and emotion-recognition contexts.

Abstract

Daily monitoring of intra-personal facial changes associated with health and emotional conditions has great potential in the medical, healthcare, and emotion recognition fields. However, approaches for capturing intra-personal facial changes remain relatively unexplored because temporally changing face images are difficult to collect. In this paper, we propose a facial representation learning method using synthetic images for comparing faces, called ComFace, which is designed to capture intra-personal facial changes. For effective representation learning, ComFace aims to acquire two feature representations, i.e., inter-personal facial differences and intra-personal facial changes. The key point of our method is the use of synthetic face images to overcome the limitations of collecting real intra-personal face images. Facial representations learned by ComFace are transferred to three extensive downstream tasks for comparing faces: estimating facial expression changes, weight changes, and age changes from two face images of the same individual. Our ComFace, trained using only synthetic data, achieves transfer performance comparable to or better than general pre-training and state-of-the-art representation learning methods trained using real images.
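
The two objectives described above (a contrastive inter-personal term and a distance-based intra-personal term) can be illustrated with a small sketch. This is not the authors' implementation: the InfoNCE-style form of the contrastive term, the squared-error form of the distance term, the `temperature` and `intra_weight` parameters, and all function names are assumptions made for illustration. Embeddings are plain Python lists so the example runs without external libraries.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    a, b = l2_normalize(a), l2_normalize(b)
    return sum(x * y for x, y in zip(a, b))


def euclidean(a, b):
    """Euclidean distance between two embedding vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def inter_personal_loss(anchors, positives, temperature=0.1):
    """Assumed InfoNCE-style contrastive loss.

    anchors[i] and positives[i] are embeddings of the same (synthetic)
    identity; positives[j] with j != i act as negatives for anchors[i].
    """
    total = 0.0
    for i, anchor in enumerate(anchors):
        logits = [cosine(anchor, p) / temperature for p in positives]
        top = max(logits)  # subtract the max for numerical stability
        log_denominator = top + math.log(sum(math.exp(l - top) for l in logits))
        total += log_denominator - logits[i]
    return total / len(anchors)


def intra_personal_loss(before, after, change_amounts):
    """Assumed distance-based loss.

    Penalizes the squared gap between the embedding distance of two images
    of the same identity and a known synthetic change magnitude.
    """
    errors = [
        (euclidean(b, a) - c) ** 2
        for b, a, c in zip(before, after, change_amounts)
    ]
    return sum(errors) / len(errors)


def combined_loss(anchors, positives, before, after, change_amounts,
                  intra_weight=1.0):
    """Weighted sum of the two terms; intra_weight is a hypothetical knob."""
    inter = inter_personal_loss(anchors, positives)
    intra = intra_personal_loss(before, after, change_amounts)
    return inter + intra_weight * intra


if __name__ == "__main__":
    # Two toy identities with 2-D embeddings.
    anchors = [[1.0, 0.0], [0.0, 1.0]]
    positives = [[0.9, 0.1], [0.1, 0.9]]
    # One intra-personal pair whose synthetic change magnitude is 0.5.
    before, after, amounts = [[0.0, 0.0]], [[0.3, 0.4]], [0.5]
    print(combined_loss(anchors, positives, before, after, amounts))
```

In this sketch, a curriculum could be approximated by presenting pairs with large `change_amounts` first and gradually introducing smaller ones, but the schedule used in the paper is not stated in this summary.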

Paper Structure (15 sections, 2 equations, 4 figures, 5 tables)

Introduction
Related Work
Method
Synthetic Face Images
FRL with Synthetic Face Images
Curriculum Learning of Intra-personal Facial Changes
Transfer Learning toward Downstream Tasks
Experiments
Setup for FRL
Setup for Downstream Tasks
Comparative Methods and Evaluation Metrics
Main Results
Ablation Study
Visualization
Conclusion and Societal Impact

Figures (4)

Figure 1: Overview of the ComFace framework. ComFace performs facial representation learning using synthetic data relating to inter-personal facial differences and intra-personal facial changes. The learned facial representations are then transferred to downstream tasks for comparing faces in order to capture intra-personal facial changes.
Figure 2: Training scheme of ComFace. The learning strategy consists of two components: inter-personal learning and intra-personal learning. Inter-personal learning acquires feature representations of facial differences between individuals. Intra-personal learning acquires feature representations of facial changes within individuals.
Figure 3: Transfer performance for facial expression change (AU12), weight change (Edema-A), and age change on different training scales. Facial expression change and age change are evaluated by correlation coefficient and weight change is evaluated by accuracy in fine-tuning. ComFace and the best models for general pre-training, visual representation learning, and FRL are compared.
Figure 4: Saliency maps for a face image in four pre-trained backbones. The original images show the positions of AU6 and AU12, and the eyelid and nose shape where edema appears.
