ComFace: Facial Representation Learning with Synthetic Data for Comparing Faces
Yusuke Akamatsu, Terumi Umematsu, Hitoshi Imaoka, Shizuko Gomi, Hideo Tsurushima
TL;DR
ComFace tackles the challenge of capturing intra-personal facial changes by learning two complementary representations, inter-personal differences and intra-personal changes, from synthetic face images. It combines a contrastive inter-personal objective with a distance-based intra-personal objective, augmented by curriculum learning, and transfers the learned representations to three downstream tasks that compare two images of the same person: facial expression change, weight change, and age change. Across extensive experiments, ComFace trained purely on synthetic data achieves transfer performance comparable to or better than methods trained on real images, showing strong generalization to new patients and conditions and outperforming patient-specific baselines in some weight-change settings. This work demonstrates the viability and practical impact of facial representation learning (FRL) with synthetic data for modeling subtle temporal facial changes in medical and emotion-recognition contexts.
Abstract
Daily monitoring of intra-personal facial changes associated with health and emotional conditions has great potential in the medical, healthcare, and emotion recognition fields. However, approaches for capturing intra-personal facial changes remain relatively unexplored because temporally changing face images are difficult to collect. In this paper, we propose ComFace, a facial representation learning method that uses synthetic images for comparing faces and is designed to capture intra-personal facial changes. For effective representation learning, ComFace aims to acquire two feature representations: inter-personal facial differences and intra-personal facial changes. The key point of our method is the use of synthetic face images to overcome the limitations of collecting real intra-personal face images. Facial representations learned by ComFace are transferred to three downstream tasks for comparing faces: estimating facial expression changes, weight changes, and age changes from two face images of the same individual. ComFace, trained using only synthetic data, achieves transfer performance comparable to or better than general pre-training and state-of-the-art representation learning methods trained using real images.
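To make the idea of learning two feature representations more concrete, the following is a minimal NumPy sketch of how an inter-personal term and an intra-personal term might be combined into one training loss. It is not the authors' formulation: the InfoNCE-style contrastive loss, the squared-error distance loss, the function names, the `temperature` and `weight` parameters, and the assumption that each synthetic image pair comes with a known scalar change amount are all illustrative choices made here. Curriculum learning is not modeled.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    # Scale each row to unit length so dot products become cosine similarities.
    return x / (np.linalg.norm(x, axis=1, keepdims=True) + eps)


def inter_personal_loss(z_a, z_b, temperature=0.1):
    # Assumed InfoNCE-style loss: row i of z_a and row i of z_b are two
    # images of the same synthetic identity; every other row is a negative.
    z_a, z_b = l2_normalize(z_a), l2_normalize(z_b)
    logits = z_a @ z_b.T / temperature
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_prob))


def intra_personal_loss(f_before, f_after, change_amount):
    # Assumed distance-based loss: the Euclidean distance between the two
    # embeddings of one person should match the known size of the
    # synthetic change (a hypothetical per-pair scalar label).
    dist = np.linalg.norm(f_before - f_after, axis=1)
    return np.mean((dist - change_amount) ** 2)


def combined_loss(z_a, z_b, f_before, f_after, change_amount, weight=1.0):
    # Weighted sum of the two terms; `weight` is a hypothetical hyperparameter.
    inter = inter_personal_loss(z_a, z_b)
    intra = intra_personal_loss(f_before, f_after, change_amount)
    return inter + weight * intra
```

In this sketch the contrastive term pushes embeddings of different identities apart, while the distance term ties the separation between two images of the same identity to how much the face changed. With synthetic data, both the identity pairing and the change amount can in principle be controlled at generation time, which is what makes such supervision available without real longitudinal images.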
