StyleMorpheus: A Style-Based 3D-Aware Morphable Face Model
Peizhi Yan, Rabab K. Ward, Dan Wang, Qiang Tang, Shan Du
TL;DR
StyleMorpheus builds a 3D-aware morphable face model from in-the-wild 2D images, without explicit 3D geometry. It introduces a style-based auto-encoder that learns disentangled identity $z_{id}$, expression $z_{expr}$, texture $z_{tex}$, and lighting $z_{light}$ codes. These codes are mapped to $w$-spaces and decoded by a low-resolution, NeRF-inspired, style-conditioned decoder with modulated layers, which gives real-time, photorealistic rendering. Training proceeds in two stages and combines photometric, perceptual, segmentation, and code-regularization losses with adversarial learning. The model supports single-image reconstruction, style mixing, and facial-part color editing, and it outperforms prior 3D-aware parametric models in reconstruction quality. Real-time performance is demonstrated on consumer hardware, with applications in VR and interactive editing, and ablation analyses validate the necessity of adversarial training and encoder-based disentanglement. By removing the need for explicit 3D shapes and lab-collected data, the work enables scalable, editable, and fast 3D face synthesis from in-the-wild imagery.
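As a rough illustration of what a "modulated layer" can look like, the following minimal sketch scales a linear layer's weights by a style vector and then renormalises them. It assumes StyleGAN2-style weight modulation and demodulation, which the summary above does not confirm for StyleMorpheus; the function name `modulated_linear` and its parameters are hypothetical.

```python
import math
from typing import List


def modulated_linear(
    x: List[float],
    weight: List[List[float]],
    style: List[float],
    eps: float = 1e-8,
) -> List[float]:
    """Apply a style-modulated linear layer (assumed StyleGAN2-style).

    x:      input features, length n_in
    weight: n_out rows, each of length n_in
    style:  one scale per input channel, length n_in
    """
    out = []
    for row in weight:
        # Modulate: scale each input channel's weight by its style value.
        mod = [w * s for w, s in zip(row, style)]
        # Demodulate: rescale the row to (approximately) unit L2 norm.
        norm = math.sqrt(sum(m * m for m in mod) + eps)
        out.append(sum((m / norm) * xi for m, xi in zip(mod, x)))
    return out


if __name__ == "__main__":
    w = [[1.0, 0.0], [3.0, 4.0]]
    print(modulated_linear([1.0, 1.0], w, [1.0, 1.0]))
```

Because of the demodulation step, multiplying the whole style vector by a constant leaves the output essentially unchanged; only the relative weighting of the input channels matters.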
Abstract
For 3D face modeling, recently developed 3D-aware neural rendering methods can render photorealistic face images from arbitrary viewing directions. Training parametric, controllable 3D-aware face models, however, still relies on large-scale lab-collected datasets. To address this issue, this paper introduces "StyleMorpheus", the first style-based neural 3D Morphable Face Model (3DMM) that is trained on in-the-wild images. It inherits the 3DMM's disentangled controllability (over face identity, expression, and appearance) without requiring accurately reconstructed explicit 3D shapes. StyleMorpheus employs an auto-encoder structure. The encoder learns a representative, disentangled parametric code space, and the decoder improves the disentanglement by using shape- and appearance-related style codes in different sub-modules of the network. Furthermore, we fine-tune the decoder through style-based generative adversarial learning to achieve photorealistic 3D rendering quality. The proposed style-based design enables StyleMorpheus to achieve state-of-the-art 3D-aware face reconstruction results while also allowing disentangled control of the reconstructed face. Our model renders in real time, allowing its use in virtual reality applications. We also demonstrate the capability of the proposed style-based design in face editing applications such as style mixing and color editing. Project homepage: https://github.com/ubc-3d-vision-lab/StyleMorpheus.
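The style mixing mentioned above can be pictured as swapping groups of codes between two encoded faces. The sketch below is only illustrative: the `FaceCodes` container and `mix_styles` helper are hypothetical, and treating identity and expression as "shape" and texture and lighting as "appearance" is an assumption, not a grouping stated in the abstract.

```python
from dataclasses import dataclass, replace
from typing import Tuple


@dataclass(frozen=True)
class FaceCodes:
    """Hypothetical container for the four disentangled codes."""
    z_id: Tuple[float, ...]     # identity
    z_expr: Tuple[float, ...]   # expression
    z_tex: Tuple[float, ...]    # texture
    z_light: Tuple[float, ...]  # lighting


def mix_styles(shape_src: FaceCodes, appearance_src: FaceCodes) -> FaceCodes:
    """Keep identity and expression from shape_src; take texture and
    lighting from appearance_src (assumed shape/appearance split)."""
    return replace(
        shape_src,
        z_tex=appearance_src.z_tex,
        z_light=appearance_src.z_light,
    )


if __name__ == "__main__":
    face_a = FaceCodes((0.1,), (0.2,), (0.3,), (0.4,))
    face_b = FaceCodes((0.5,), (0.6,), (0.7,), (0.8,))
    print(mix_styles(face_a, face_b))
```

In a full pipeline the mixed codes would then be passed to the decoder to render the new face; that step is omitted here because it depends on the trained model.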
