Towards Native Generative Model for 3D Head Avatar
Yiyu Zhuang, Yuxiao He, Jiawei Zhang, Yanwen Wang, Jiahe Zhu, Yao Yao, Siyu Zhu, Xun Cao, Hao Zhu
TL;DR
This work tackles the challenge of producing native, 360$^\circ$ renderable 3D head avatars from limited, high-quality 3D data by exploring three complementary representations (volume-based NeRF, hex-plane hybrid, and point-based Gaussian splats) and by disentangling appearance, shape, and expression in a semantically constrained parametric space $(\alpha,\beta,\varepsilon)$. A new SynHead100 dataset and tailored single-image fitting, animation, and text-based editing pipelines enable random 3D head generation, full-view rendering, and editable motion while preserving identity. Across extensive experiments, the authors demonstrate state-of-the-art 3D geometry accuracy and rendering quality, with the point-based approach offering the best fidelity and efficiency and the hybrid approach providing robust animation control. The work advances practical 360$^\circ$ head synthesis from limited data, with potential impact on metaverse, film, and digital avatar applications, while acknowledging data costs and the need for lighting/material disentangling for further realism.
Abstract
Creating 3D head avatars is a significant yet challenging task for many application scenarios. Previous studies have set out to learn 3D human head generative models from massive 2D image data. Although these models generalize well across human appearance, the resulting models are not 360$^\circ$-renderable, and the predicted 3D geometry is unreliable. Such results therefore cannot be used in VR, game modeling, and other scenarios that require 360$^\circ$-renderable 3D head models. An intuitive idea is that 3D head models that are limited in number but high in 3D accuracy provide more reliable training data for a high-quality 3D generative model. In this vein, we delve into how to learn a native generative model for the 360$^\circ$ full head from a limited 3D head dataset. Specifically, three major problems are studied: 1) how to effectively utilize various representations for generating the 360$^\circ$-renderable human head; 2) how to disentangle the appearance, shape, and motion of human faces to generate a 3D head model that can be edited by appearance and driven by motion; and 3) how to extend the generalization capability of the generative model to support downstream tasks. Comprehensive experiments are conducted to verify the effectiveness of the proposed model. We hope the proposed models and artist-designed dataset can inspire future research on learning native generative 3D head models from limited 3D datasets.
