Synergy between 3DMM and 3D Landmarks for Accurate 3D Facial Geometry

Cho-Ying Wu, Qiangeng Xu, Ulrich Neumann

TL;DR

This work addresses the ill-posed problem of recovering complete 3D facial geometry from monocular images by introducing a bidirectional synergy between 3DMM parameters and 3D landmarks. The approach, SynergyNet, uses a two-stage pipeline with multi-attribute feature aggregation (MAFA) for landmark refinement and a landmark-to-3DMM module, establishing a representation cycle that alternates between predicting 3DMM parameters from images and regressing 3DMM parameters from refined landmarks. Key contributions are MAFA for landmark refinement, the reverse representation direction, and a self-supervised consistency loss that improves information flow. The method reports state-of-the-art results for facial alignment, face orientation estimation, and 3D face modeling on the AFLW and Florence benchmarks. It relies on simple, fast network blocks to reach high throughput (≈2600 fps for landmarks and ≈2300 fps for dense 3D faces) and is robust to large pose variations; supplementary texture synthesis is used to generate more realistic textures.

Abstract

This work studies learning from a synergy process of 3D Morphable Models (3DMM) and 3D facial landmarks to predict complete 3D facial geometry, including 3D alignment, face orientation, and 3D face modeling. Our synergy process leverages a representation cycle for 3DMM parameters and 3D landmarks. 3D landmarks can be extracted and refined from face meshes built by 3DMM parameters. We next reverse the representation direction and show that predicting 3DMM parameters from sparse 3D landmarks improves the information flow. Together, we create a synergy process that exploits the relation between 3D landmarks and 3DMM parameters, which collaboratively contribute to better performance. We extensively validate our contribution on the full set of facial geometry prediction tasks and show superior and robust performance on these tasks across various scenarios. Notably, we adopt only simple and widely used network operations to attain fast and accurate facial geometry prediction. Code and data: https://choyingw.github.io/works/SynergyNet/
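
The representation cycle described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the tiny basis matrices, the function names, the omission of the pose parameters, and the choice of a squared L2 distance as the consistency measure are all assumptions made for illustration. In the paper the reverse direction is a learned module, whereas here it is a hypothetical callable supplied by the caller.

```python
from typing import Callable, List, Sequence

Vector = List[float]

# Toy 3DMM restricted to three landmark coordinates (illustrative values only).
MEAN = [0.0, 1.0, 2.0]
SHAPE_BASIS = [[1.0], [0.0], [2.0]]  # one shape coefficient
EXPR_BASIS = [[0.0], [1.0], [0.0]]   # one expression coefficient


def decode_landmarks(mean: Sequence[float],
                     shape_basis: Sequence[Sequence[float]],
                     expr_basis: Sequence[Sequence[float]],
                     alpha_s: Sequence[float],
                     alpha_e: Sequence[float]) -> Vector:
    """Forward direction: 3DMM parameters -> landmark coordinates.

    Each coordinate is mean[i] + shape_basis[i] . alpha_s + expr_basis[i] . alpha_e.
    Pose is left out to keep the toy small.
    """
    landmarks = []
    for m, s_row, e_row in zip(mean, shape_basis, expr_basis):
        shape_term = sum(b * a for b, a in zip(s_row, alpha_s))
        expr_term = sum(b * a for b, a in zip(e_row, alpha_e))
        landmarks.append(m + shape_term + expr_term)
    return landmarks


def consistency_loss(alpha_from_image: Sequence[float],
                     alpha_from_landmarks: Sequence[float]) -> float:
    """Squared L2 distance between two 3DMM parameter estimates (assumed form)."""
    return sum((a - b) ** 2 for a, b in zip(alpha_from_image, alpha_from_landmarks))


def cycle_loss(alpha_s: Sequence[float],
               alpha_e: Sequence[float],
               landmark_to_3dmm: Callable[[Vector], Vector]) -> float:
    """Run forward then reverse, and compare the recovered parameters to the input."""
    landmarks = decode_landmarks(MEAN, SHAPE_BASIS, EXPR_BASIS, alpha_s, alpha_e)
    alpha_back = landmark_to_3dmm(landmarks)  # stand-in for the learned reverse module
    return consistency_loss(list(alpha_s) + list(alpha_e), alpha_back)


def exact_inverse(landmarks: Vector) -> Vector:
    """Hand-written inverse that only works for the toy bases above."""
    return [landmarks[0] - MEAN[0], landmarks[1] - MEAN[1]]
```

With the toy inverse, `cycle_loss([0.5], [-1.0], exact_inverse)` is zero, while a reverse module that ignores its input and returns `[0.0, 0.0]` is penalized. The second case is the situation a consistency term is meant to discourage during training.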

Paper Structure

This paper contains 22 sections, 8 equations, 21 figures, 13 tables.

Figures (21)

  • Figure 1: Results from our SynergyNet with monocular image inputs. Note that 3D landmarks can predict hidden face outlines in 3D rather than follow visible outlines on images.
  • Figure 2: Framework of our SynergyNet. The backbone network learns to regress 3DMM parameters ($\alpha_p$, $\alpha_s$, and $\alpha_e$) and reconstruct 3D face meshes from monocular face images. Multi-attribute feature aggregation gathers underlying 3DMM semantics and the latent image code to refine landmarks further. The landmark-to-3DMM module regresses 3DMM parameters from refined landmarks $L^r$ to reveal the facial geometry embedded in 3D landmarks. A self-constraining consistency is applied to 3DMM parameters regressed from different sources. This synergy process includes a forward representation direction, from 3DMM parameters to refined 3D landmarks, and a reverse direction, from 3D landmarks back to 3DMM parameters, to attain better performance. The red and blue arrows after the shape and expression (expr) decoders show the main areas of deformation that each 3DMM semantic controls.
  • Figure 3: Structure of multi-attribute landmark refinement. The input is $L^c$ from the foundation face model. The left MLPs extract global point features and fuse the global features with other attributes, including image features, shape, and expression parameters. The concatenation is appended to the low-level features to create multi-attribute point features, which are used to regress the refined landmarks.
  • Figure 4: Illustration of representation cycle.
  • Figure 5: Qualitative comparison of facial alignment and orientation estimation. The case on the left is low-resolution and blurry, and thus challenging. The case on the right shows a rare, extreme roll rotation. Our results are more robust than those of 3DDFA-V2.
  • ...and 16 more figures
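
The aggregation described in the Figure 3 caption can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' code: the function name is hypothetical, the MLPs are omitted, the same per-point features stand in for both the low-level and the pre-pooling features, and channel-wise max pooling is assumed as the way to obtain the global point feature.

```python
from typing import List, Sequence


def aggregate_attributes(point_feats: Sequence[Sequence[float]],
                         image_code: Sequence[float],
                         alpha_s: Sequence[float],
                         alpha_e: Sequence[float]) -> List[List[float]]:
    """Build multi-attribute point features for each landmark.

    point_feats: one feature vector per landmark (stand-in for MLP outputs).
    image_code, alpha_s, alpha_e: latent image code, shape and expression parameters.
    """
    if not point_feats:
        return []
    # Global point feature: channel-wise max over all landmarks (assumed pooling).
    global_feat = [max(channel) for channel in zip(*point_feats)]
    # Fuse the global feature with the other attributes by concatenation.
    fused = global_feat + list(image_code) + list(alpha_s) + list(alpha_e)
    # Append the fused vector to every landmark's own low-level feature.
    return [list(feat) + fused for feat in point_feats]
```

For two landmarks with features `[1.0, 4.0]` and `[3.0, 2.0]`, the pooled global feature is `[3.0, 4.0]`, and each landmark ends up with its own two channels followed by the same fused vector. A regression head (not shown) would then map these per-landmark vectors to refined 3D coordinates.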