Bringing Diversity from Diffusion Models to Semantic-Guided Face Asset Generation
Yunxuan Cai, Sitao Xiang, Zongjian Li, Haiwei Chen, Yajie Zhao
TL;DR
The paper tackles the challenge of creating diverse, semantically controllable 3D face assets with realistic textures suitable for PBR rendering. It introduces a diffusion-driven data generation pipeline to build a large UV-texture/head-geometry dataset, a two-stage GAN-based generator to produce geometry and albedo conditioned on demographic attributes, and a texture-normalization mechanism to convert diffusion-derived textures into clean albedo maps. Asset refinement further adds high-frequency detail, specular/displacement maps, and secondary components, enabling production-ready assets with inversion and editing capabilities. An interactive web interface demonstrates practical usability, and extensive experiments show improved semantic control, texture quality, and faster generation times compared to prior methods. The approach offers a scalable path for diverse synthetic avatar creation in VFX, gaming, and data generation, while acknowledging limitations around geometry diversity, UV completion, and diffusion biases.
Abstract
Digital modeling and reconstruction of human faces serve various applications. However, their availability is often hindered by the requirements of data capturing devices, manual labor, and suitable actors. This situation restricts the diversity, expressiveness, and control over the resulting models. This work aims to demonstrate that a semantically controllable generative network can provide enhanced control over the digital face modeling process. To enhance diversity beyond the limited set of human faces scanned in a controlled setting, we introduce a novel data generation pipeline that creates a high-quality 3D face database using a pre-trained diffusion model. Our proposed normalization module converts synthesized data from the diffusion model into data comparable to high-quality scans. Using the 44,000 face models we obtained, we further develop an efficient GAN-based generator. This generator accepts semantic attributes as input and generates geometry and albedo. It also allows continuous post-editing of attributes in the latent space. Our asset refinement component subsequently creates physically-based facial assets. Together, these components form a comprehensive system for creating and editing high-quality face assets. Our proposed model has undergone extensive experiments, comparisons, and evaluation. We also integrate everything into a web-based interactive tool. We aim to make this tool publicly available with the release of the paper.
