Learning to Stabilize Faces

Jan Bednarik, Erroll Wood, Vasileios Choutas, Timo Bolkart, Daoye Wang, Chenglei Wu, Thabo Beeler

TL;DR

This work treats stabilization as a regression problem: given two face meshes, a network directly predicts the rigid transform between them that brings their skulls into alignment. The authors show that the approach outperforms the state of the art, both quantitatively and qualitatively, when stabilizing discrete sets of facial expressions as well as dynamic facial performances.

Abstract

Nowadays, it is possible to scan faces and automatically register them with high quality. However, the resulting face meshes often need further processing: we need to stabilize them to remove unwanted head movement. Stabilization is important for tasks like game development or movie making which require facial expressions to be cleanly separated from rigid head motion. Since manual stabilization is labor-intensive, there have been attempts to automate it. However, previous methods remain impractical: they either still require some manual input, produce imprecise alignments, rely on dubious heuristics and slow optimization, or assume a temporally ordered input. Instead, we present a new learning-based approach that is simple and fully automatic. We treat stabilization as a regression problem: given two face meshes, our network directly predicts the rigid transform between them that brings their skulls into alignment. We generate synthetic training data using a 3D Morphable Model (3DMM), exploiting the fact that 3DMM parameters separate skull motion from facial skin motion. Through extensive experiments we show that our approach outperforms the state-of-the-art both quantitatively and qualitatively on the tasks of stabilizing discrete sets of facial expressions as well as dynamic facial performances. Furthermore, we provide an ablation study detailing the design choices and best practices to help others adopt our approach for their own uses. Supplementary videos can be found on the project webpage syntec-research.github.io/FaceStab.
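
To make the regression target concrete, the following is a minimal sketch of what applying a predicted rigid transform to a mesh could look like. It is not the authors' implementation: the network is not modeled, `R_pred` and `t_pred` are hand-built stand-ins for its output, and the helper names (`rotation_z`, `apply_rigid`, `per_vertex_error`), the toy points, and the convention that the transform is applied to the source mesh are all assumptions made for illustration. The toy points carry no expression change, so the whole point set realigns; in real data only the skull would.

```python
import math


def rotation_z(angle_rad):
    """3x3 rotation matrix about the z-axis (hypothetical helper)."""
    c, s = math.cos(angle_rad), math.sin(angle_rad)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]


def transpose(R):
    """Transpose of a 3x3 matrix; for a rotation this is its inverse."""
    return [[R[j][i] for j in range(3)] for i in range(3)]


def apply_rigid(vertices, R, t):
    """Return R @ v + t for every vertex v (each v is an (x, y, z) tuple)."""
    return [
        tuple(sum(R[i][j] * v[j] for j in range(3)) + t[i] for i in range(3))
        for v in vertices
    ]


def per_vertex_error(a, b):
    """Euclidean distance between corresponding vertices of two point lists."""
    return [math.dist(p, q) for p, q in zip(a, b)]


# Toy "target" points, in millimetres (assumed data, not a real face mesh).
target = [(0.0, 0.0, 0.0), (10.0, 0.0, 0.0), (0.0, 20.0, 0.0), (0.0, 0.0, 30.0)]

# Simulate unwanted rigid head motion to obtain the "source" points.
R_head = rotation_z(math.radians(15.0))
t_head = (5.0, -2.0, 1.0)
source = apply_rigid(target, R_head, t_head)

# Stand-in for the network output: the exact inverse of the head motion,
# i.e. R_pred = R_head^T and t_pred = -R_head^T @ t_head.
R_pred = transpose(R_head)
t_pred = tuple(-x for x in apply_rigid([t_head], R_pred, (0.0, 0.0, 0.0))[0])

# Stabilize the source and measure the residual error per vertex.
stabilized = apply_rigid(source, R_pred, t_pred)
errors = per_vertex_error(stabilized, target)
print("max residual (mm):", max(errors))
```

Because the stand-in transform is the exact inverse of the simulated head motion, the printed residual is zero up to floating-point rounding. With a learned predictor, the same per-vertex distances would quantify how far the stabilized result is from the reference.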

Paper Structure

This paper contains 16 sections, 17 equations, 10 figures, 3 tables, 1 algorithm.

Figures (10)

  • Figure 1: Our dataset includes $38\,360$ face meshes which were registered to multi-view images. These registrations are used for building our 3DMM and sampling random expressions.
  • Figure 2: The architecture of the rigid transformation predictor. The network takes as input a pair $(V_{s}, V_{t})$ of source and target skin vertices of the same subject and outputs the rotation $R$ and translation $\mathbf{t}$ that best align the input pair.
  • Figure 3: Our stabilization neural network is trained with synthetic data only. We synthesize realistic and diverse faces by mixing random identities (\ref{fig:dataset_samples_identity}) with random expressions (\ref{fig:dataset_samples_expression}).
  • Figure 4: Schematic view of the kinematic chain of our 3DMM (left) and skinning weights corresponding to the joints (right).
  • Figure 5: The stabilization error visualized as an error map in the range $[0, 3]$ mm.
  • ...and 5 more figures