3D Face Modeling via Weakly-supervised Disentanglement Network joint Identity-consistency Prior

Guohao Li; Hongyu Yang; Di Huang; Yunhong Wang

TL;DR

This work tackles the disentanglement of identity and expression in 3D face modeling under weak supervision. It introduces WSDF, a two-branch VAE with an identity-consistency prior. A Neutral Bank supplies pseudo-neutral scans for each subject, a label-free second-order loss regularizes the expression factor, and a tensor-based Re-coupler recombines the two factors before decoding. Because dense expression labels are not required, the model can be trained across multiple datasets to improve generalization. Experiments on FaceScape, CoMA, and D3DFACS show superior reconstruction, disentanglement, and neutralization, with notable gains when training on the combined datasets. This makes controllable 3D face models more practical in cross-domain scenarios.

Abstract

Generative 3D face models featuring disentangled controlling factors hold immense potential for diverse applications in computer vision and computer graphics. However, previous 3D face modeling methods demand specific labels to disentangle these factors effectively. This becomes particularly problematic when integrating multiple 3D face datasets to improve the generalization of the model. To address this issue, this paper introduces a Weakly-Supervised Disentanglement Framework, denoted as WSDF, to facilitate the training of controllable 3D face models without an overly stringent labeling requirement. Adhering to the paradigm of Variational Autoencoders (VAEs), the proposed model disentangles the identity and expression controlling factors through a two-branch encoder equipped with a dedicated identity-consistency prior. It then faithfully re-entangles these factors via a tensor-based combination mechanism. Notably, the introduction of the Neutral Bank allows precise acquisition of subject-specific information using only identity labels, thereby averting degeneration due to insufficient supervision. Additionally, the framework incorporates a label-free second-order loss function for the expression factor to regularize the deformation space and eliminate extraneous information, resulting in enhanced disentanglement. Extensive experiments have been conducted to substantiate the superior performance of WSDF. Our code is available at https://github.com/liguohao96/WSDF.
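To make the "tensor-based combination mechanism" concrete, the following is a minimal sketch of one way two latent codes could be fused bilinearly. It is not taken from the WSDF repository: the function names `make_core_tensor` and `recouple`, the randomly initialized core tensor (standing in for learned weights), the `tanh` non-linearity, and the constant 1 appended to each code are all assumptions made for illustration.

```python
import math
import random


def make_core_tensor(d_out, d_id, d_exp, seed=0):
    """Build a random core tensor of shape (d_out, d_id + 1, d_exp + 1).

    Hypothetical stand-in for learned weights. The extra row and column
    pair with the constant 1 appended to each code in `recouple`.
    """
    rng = random.Random(seed)
    return [[[rng.gauss(0.0, 0.1) for _ in range(d_exp + 1)]
             for _ in range(d_id + 1)]
            for _ in range(d_out)]


def recouple(core, z_id, z_exp):
    """Fuse identity and expression codes with a bilinear form.

    h[k] = tanh(sum_{i,j} core[k][i][j] * a[i] * b[j]),
    where a = z_id + [1] and b = z_exp + [1]. Appending 1 keeps the
    identity-only and expression-only terms (an assumption, not
    necessarily what the paper does).
    """
    a = list(z_id) + [1.0]
    b = list(z_exp) + [1.0]
    fused = []
    for slab in core:
        s = sum(slab[i][j] * a[i] * b[j]
                for i in range(len(a))
                for j in range(len(b)))
        fused.append(math.tanh(s))
    return fused


core = make_core_tensor(d_out=4, d_id=3, d_exp=2)
z_id = [0.5, -1.0, 0.25]          # one subject's identity code
z_exp_neutral = [0.0, 0.0]        # zero expression code
z_exp_smile = [1.0, -0.5]         # some non-neutral expression code

h_neutral = recouple(core, z_id, z_exp_neutral)
h_smile = recouple(core, z_id, z_exp_smile)
```

In this sketch the same identity code paired with different expression codes yields different fused features, which a decoder would then map to vertex positions. A real model would learn the core tensor jointly with the encoders and generator.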

Paper Structure (15 sections, 13 equations, 7 figures, 4 tables)

INTRODUCTION
RELATED WORK
METHOD
Framework
Loss Functions
EXPERIMENT
Datasets
Implementation Details
Evaluation Metrics
Quantitative Comparison
Qualitative Evaluation
Ablation Study
Application
CONCLUSIONS
ACKNOWLEDGMENTS

Figures (7)

Figure 1: Method overview. (a) Identity-aware sampling constructs a group of training samples belonging to the same ID. Two encoders, $\mathcal{E}_{id}$ and $\mathcal{E}_{exp}$, are built to disentangle identity and expression representations. The disentangled latent codes, $z^{id}$ and $z^{exp}$, are re-coupled by the fusion module $\mathcal{R}$ and fed into the generator $\mathcal{G}$ for decoding. Simultaneously, a neutral bank is constructed to obtain pseudo-neutral scans for each subject, imposing disentanglement through the $\mathcal{L}_{neu}$ loss. (b) The encoders $\mathcal{E}_{id}$ and $\mathcal{E}_{exp}$ use the Spiral++ architecture with the same setup, aiming to factor out the inductive bias of model design. (c) The decoder involves a tensor-based Recoupler $\mathcal{R}$ with non-linearity and an MLP-based $\mathcal{G}$ for mapping from the decoupled latent codes $z^{id}$ and $z^{exp}$ to 3D faces.
Figure 2: Qualitative comparison on FaceScape. Zoom in for a better view.
Figure 3: Interpolation results achieved on FaceScape.
Figure 4: Neutralization of unseen scans on FaceScape. Each row indicates one individual.
Figure 5: Visualization of Neutral Bank on FaceScape.
...and 2 more figures
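The identity-aware sampling mentioned in the Figure 1 caption, which groups training samples belonging to the same ID, can be sketched roughly as follows. This is an illustrative assumption rather than the authors' implementation: the function name `identity_aware_groups`, the fixed `group_size`, the choice to drop incomplete groups, and the toy subject labels are all hypothetical.

```python
import random
from collections import defaultdict


def identity_aware_groups(subject_ids, group_size, seed=0):
    """Yield (subject, indices) pairs where all indices share one subject.

    `subject_ids[i]` is the identity label of scan i. Only identity
    labels are used; no expression labels are needed.
    """
    rng = random.Random(seed)
    by_subject = defaultdict(list)
    for index, sid in enumerate(subject_ids):
        by_subject[sid].append(index)

    subjects = sorted(by_subject)
    rng.shuffle(subjects)
    for sid in subjects:
        indices = by_subject[sid][:]
        rng.shuffle(indices)
        # Drop the incomplete tail so every group has exactly group_size scans.
        for start in range(0, len(indices) - group_size + 1, group_size):
            yield sid, indices[start:start + group_size]


# Toy identity labels for eight scans (hypothetical data).
subject_ids = ["s1", "s2", "s1", "s1", "s2", "s3", "s1", "s2"]
groups = list(identity_aware_groups(subject_ids, group_size=2))
```

Each yielded group could then be encoded together, so that quantities meant to be constant per subject (such as the identity code) can be compared within the group.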
