Generalizable Face Landmarking Guided by Conditional Face Warping

Jiayi Liang, Haotian Liu, Hongteng Xu, Dixin Luo

TL;DR

This work tackles generalizable face landmarking by embedding a landmark predictor within a conditional face warper that deforms real faces to stylized targets via a parametric warping field $w_{i,\gamma}$. The warper uses a polyharmonic interpolation model to build this field, and the ending points predicted by the landmarker serve as pseudo landmarks for the stylized images. Training follows an alternating optimization scheme that couples warper and landmarker updates and employs a proximal regularizer. Empirical results on real, caricature, and artistic faces show better cross-domain generalization than standard domain-adaptation baselines, particularly in generalized zero-shot learning. The approach is effective across backbone models and highlights the value of warping-informed supervision for bridging large style and geometry gaps in landmarking, with practical implications for animation, gaming, and AI-assisted art.
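The summary says the warper relies on a polyharmonic interpolation model but does not give the form of $w_{i,\gamma}$. The NumPy snippet below is therefore only a minimal sketch under stated assumptions: the dense warping field is a thin-plate (order-2 polyharmonic) spline fitted to landmark displacements, it is used as a backward map from the stylized frame into the real face, and pixels are sampled by nearest-neighbour lookup. All function names (`polyharmonic_kernel`, `fit_polyharmonic`, `warp_field`, `warp_image`) and the example coordinates are hypothetical, not the authors' code.

```python
import numpy as np


def polyharmonic_kernel(r: np.ndarray, k: int = 2) -> np.ndarray:
    """phi(r) = r^k for odd k, r^k * log(r) for even k, with phi(0) = 0.

    k = 2 in 2-D is the classic thin-plate spline.
    """
    if k % 2 == 1:
        return r ** k
    out = np.zeros_like(r, dtype=float)
    pos = r > 0
    out[pos] = r[pos] ** k * np.log(r[pos])
    return out


def fit_polyharmonic(ctrl: np.ndarray, values: np.ndarray, k: int = 2):
    """Fit an interpolant through (ctrl[i], values[i]); ctrl is (n, 2), values is (n, d).

    An affine term [1, x, y] is appended, which suffices for k in {1, 2, 3}.
    Control points must be distinct and not all collinear.
    """
    n = ctrl.shape[0]
    dist = np.linalg.norm(ctrl[:, None, :] - ctrl[None, :, :], axis=-1)
    P = np.hstack([np.ones((n, 1)), ctrl])
    system = np.zeros((n + 3, n + 3))
    system[:n, :n] = polyharmonic_kernel(dist, k)
    system[:n, n:] = P
    system[n:, :n] = P.T
    rhs = np.vstack([values, np.zeros((3, values.shape[1]))])
    coef = np.linalg.solve(system, rhs)
    return coef[:n], coef[n:]  # radial weights (n, d), affine coefficients (3, d)


def eval_polyharmonic(query, ctrl, weights, affine, k: int = 2):
    """Evaluate the fitted interpolant at query points of shape (m, 2)."""
    dist = np.linalg.norm(query[:, None, :] - ctrl[None, :, :], axis=-1)
    P = np.hstack([np.ones((query.shape[0], 1)), query])
    return polyharmonic_kernel(dist, k) @ weights + P @ affine


def warp_field(src_pts, dst_pts, height: int, width: int, k: int = 2):
    """Backward map: for every output pixel (x, y), the real-face location to sample.

    Displacements (src - dst) are fitted at the destination (stylized) landmarks,
    so output pixel dst_pts[i] samples the real face at src_pts[i].
    """
    weights, affine = fit_polyharmonic(dst_pts, src_pts - dst_pts, k)
    ys, xs = np.mgrid[0:height, 0:width]
    grid = np.stack([xs.ravel(), ys.ravel()], axis=1).astype(float)
    disp = eval_polyharmonic(grid, dst_pts, weights, affine, k)
    return (grid + disp).reshape(height, width, 2)


def warp_image(img: np.ndarray, field: np.ndarray) -> np.ndarray:
    """Nearest-neighbour sampling of img at the (x, y) locations stored in field."""
    h, w = img.shape[:2]
    x = np.clip(np.rint(field[..., 0]).astype(int), 0, w - 1)
    y = np.clip(np.rint(field[..., 1]).astype(int), 0, h - 1)
    return img[y, x]


# Example: five real-face landmarks (x, y) and hypothetical stylized targets.
src = np.array([[10., 10.], [50., 12.], [30., 40.], [12., 55.], [52., 50.]])
dst = np.array([[12., 10.], [47., 13.], [30., 44.], [13., 53.], [51., 50.]])
real_face = np.arange(64 * 64, dtype=float).reshape(64, 64)  # stand-in image
warped = warp_image(real_face, warp_field(src, dst, 64, 64))
```

Fitting the displacements at the stylized landmarks rather than the real ones is what makes this a backward map: every output pixel knows where to read from in the real face, so the warped image has no holes. A trainable version would presumably replace the nearest-neighbour lookup with bilinear sampling so that the warp is differentiable with respect to the landmark positions.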

Abstract

As a significant step for human face modeling, editing, and generation, face landmarking aims to extract facial keypoints from images. A generalizable face landmarker is required in practice because real-world facial images, e.g., the avatars in animations and games, are often stylized in various ways. However, achieving generalizable face landmarking is challenging due to the diversity of facial styles and the scarcity of labeled stylized faces. In this study, we propose a simple but effective paradigm to learn a generalizable face landmarker from labeled real human faces and unlabeled stylized faces. Our method learns the face landmarker as the key module of a conditional face warper. Given a pair of real and stylized facial images, the conditional face warper predicts a warping field from the real face to the stylized one, in which the face landmarker predicts the ending points of the warping field, thereby providing high-quality pseudo landmarks for the corresponding stylized facial images. Applying an alternating optimization strategy, we train the face landmarker to minimize (i) the discrepancy between the stylized faces and the warped real ones and (ii) the prediction errors of both real and pseudo landmarks. Experiments on various datasets show that our method outperforms existing state-of-the-art domain adaptation methods in face landmarking tasks, leading to a face landmarker with better generalizability. Code is available at https://plustwo0.github.io/project-face-landmarker.
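The abstract names two objectives optimized in alternation, (i) an image discrepancy and (ii) landmark prediction errors on real and pseudo landmarks, and the TL;DR adds a proximal regularizer. This summary does not reproduce the paper's formulation, so the following is a toy sketch under explicit assumptions, not the authors' algorithm: the landmarker is a hypothetical linear map $Y \approx X\theta$ on precomputed features, a hypothetical `warper_step` callable stands in for the warper update that produces pseudo landmarks (objective (i) is not modeled here), and the proximal term $\frac{\rho}{2}\lVert\theta-\theta_{\text{prev}}\rVert^2$ is assumed to act on the landmarker parameters. With these choices the landmarker step has a closed form.

```python
from typing import Callable

import numpy as np


def proximal_landmarker_step(Xr, Yr, Xs, Ys_pseudo, theta_prev, lam=1.0, rho=0.1):
    """Closed-form landmarker update for a hypothetical linear landmarker Y ~ X @ theta.

    Minimizes
        1/(2 n_r) ||Xr theta - Yr||^2             (real faces, true landmarks)
      + lam/(2 n_s) ||Xs theta - Ys_pseudo||^2    (stylized faces, pseudo landmarks)
      + rho/2 ||theta - theta_prev||^2            (assumed proximal term)
    by solving the normal equations of this quadratic.
    """
    nr, ns, d = Xr.shape[0], Xs.shape[0], Xr.shape[1]
    lhs = Xr.T @ Xr / nr + lam * Xs.T @ Xs / ns + rho * np.eye(d)
    rhs = Xr.T @ Yr / nr + lam * Xs.T @ Ys_pseudo / ns + rho * theta_prev
    return np.linalg.solve(lhs, rhs)


def alternating_training(
    Xr, Yr, Xs,
    warper_step: Callable[[np.ndarray, np.ndarray], np.ndarray],
    rounds: int = 5, lam: float = 1.0, rho: float = 0.1,
):
    """Alternate a stand-in warper step with a proximal landmarker step."""
    theta = np.zeros((Xr.shape[1], Yr.shape[1]))
    for _ in range(rounds):
        # Warper step (hypothetical): landmarker fixed, pseudo landmarks produced
        # from the current predictions on the stylized faces.
        pseudo = warper_step(Xs, Xs @ theta)
        # Landmarker step: pseudo landmarks fixed, proximal least-squares update.
        theta = proximal_landmarker_step(Xr, Yr, Xs, pseudo, theta, lam, rho)
    return theta


# Toy check with an idealized warper that returns the true stylized landmarks.
rng = np.random.default_rng(0)
theta_true = rng.normal(size=(5, 4))  # 5 features -> 2 landmarks x (x, y)
Xr, Xs = rng.normal(size=(200, 5)), rng.normal(size=(100, 5))
Yr = Xr @ theta_true
theta = alternating_training(Xr, Yr, Xs, lambda X, _init: X @ theta_true)
```

The weight `rho` trades off fitting the current pseudo landmarks against staying close to the previous iterate, which is one common way to keep alternating updates stable when pseudo landmarks are noisy; the values of `lam` and `rho` here are arbitrary.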

Paper Structure

This paper contains 23 sections, 3 equations, 11 figures, 7 tables, and 1 algorithm.

Figures (11)

  • Figure 1: Both commercial software like Face++ and open-source methods like SLPT [xia2022sparse] work well on landmarking real faces (e.g., those in 300W [Sagonas_Tzimiropoulos_Zafeiriou_Pantic_2014]) but perform suboptimally on stylized faces (e.g., those in CariFace [Cai_Guo_Peng_Zhang_2020] and ArtiFace [yaniv2019face]). While existing domain adaptation methods do not improve the performance significantly, our method yields a face landmarker that generalizes across various facial images.
  • Figure 2: The scheme of our proposed method for learning a generalizable face landmarker.
  • Figure 3: Illustrations of conditional face warping results. Taking a cartoon face as the target, our model warps real human faces accordingly. The red dots indicate real human face landmarks, and green dots indicate cartoon and warped face landmarks.
  • Figure 4: Illustrations of typical samples in the 300W, CariFace, and ArtiFace datasets, each of which is annotated with landmarks.
  • Figure 5: Visual comparisons of various methods in the two domain adaptation (DA) settings. In (a), only the points on the inner lips are highlighted in the enlarged mouth region; in (b), the points on the eyes and the sides of the cheeks are highlighted, excluding those on the eyebrows.