AvatarTex: High-Fidelity Facial Texture Reconstruction from Single-Image Stylized Avatars
Yuda Qiu, Zitong Xiao, Yiwei Zuo, Zisheng Ye, Weikai Chen, Xiaoguang Han
TL;DR
AvatarTex addresses the challenge of reconstructing high-fidelity, topology-consistent facial textures from a single image for both stylized and photorealistic avatars. It introduces a novel three-stage diffusion-to-GAN pipeline that combines diffusion-based inpainting, StyleGAN2 latent optimization, and diffusion repainting, anchored by TexHub, a 20,000-texture multi-style UV dataset generated with LoRA-guided diffusion and ControlNet guidance. This framework achieves state-of-the-art texture topology alignment and detail across diverse artistic styles, validated via quantitative metrics and user studies, and demonstrates robust generalization to in-the-wild images. TexHub will be released to facilitate future research in multi-style facial texture synthesis and avatar creation.
Abstract
We present AvatarTex, a high-fidelity facial texture reconstruction framework capable of generating both stylized and photorealistic textures from a single image. Existing methods struggle with stylized avatars because diverse multi-style datasets are lacking and geometric consistency is hard to maintain in non-standard textures. To address these limitations, AvatarTex introduces a novel three-stage diffusion-to-GAN pipeline. Our key insight is that while diffusion models excel at generating diverse textures, they lack explicit UV constraints, whereas GANs provide a well-structured latent space that ensures style and topology consistency. By integrating these strengths, AvatarTex achieves high-quality, topology-aligned texture synthesis with both artistic and geometric coherence. Specifically, our three-stage pipeline first completes missing texture regions via diffusion-based inpainting, then refines style and structure consistency using GAN-based latent optimization, and finally enhances fine details through diffusion-based repainting. To address the need for a stylized texture dataset, we introduce TexHub, a high-resolution collection of 20,000 multi-style UV textures with precise UV-aligned layouts. By leveraging TexHub and our structured diffusion-to-GAN pipeline, AvatarTex establishes a new state of the art in multi-style facial texture reconstruction. TexHub will be released upon publication to facilitate future research in this field.
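The GAN-based latent optimization mentioned above can be illustrated with a minimal toy sketch. This is not the authors' implementation: the linear `generate` function is a hypothetical stand-in for a pretrained StyleGAN2 texture generator, the squared-error loss is an assumed objective, and all names and sizes are chosen for illustration only. It shows only the general idea of fitting a latent code so that the generator output matches a completed texture.

```python
# Toy illustration of latent optimization (GAN inversion) by gradient descent.
# ASSUMPTION: a fixed linear map stands in for a pretrained texture generator;
# a real pipeline would use a StyleGAN2 network and a perceptual/pixel loss.

N_PIXELS, N_LATENT = 16, 4

# Hypothetical generator weights: each latent dimension mainly drives every
# fourth "pixel", with weak coupling to the rest (keeps the problem well-posed).
A = [[1.0 if i % N_LATENT == j else 0.1 for j in range(N_LATENT)]
     for i in range(N_PIXELS)]
bias = [0.05 * i for i in range(N_PIXELS)]


def generate(w):
    """Map a latent code w to a flattened toy 'texture'."""
    return [sum(a * wj for a, wj in zip(row, w)) + b0
            for row, b0 in zip(A, bias)]


def optimize_latent(target, steps=500):
    """Minimize 0.5 * ||generate(w) - target||^2 over w with gradient descent."""
    # Step size 1 / ||A||_F^2 never exceeds 1 / sigma_max^2, so updates are stable.
    lr = 1.0 / sum(a * a for row in A for a in row)
    w = [0.0] * N_LATENT
    for _ in range(steps):
        residual = [g - t for g, t in zip(generate(w), target)]
        # Gradient with respect to w is A^T @ residual.
        grad = [sum(A[i][j] * residual[i] for i in range(N_PIXELS))
                for j in range(N_LATENT)]
        w = [wj - lr * gj for wj, gj in zip(w, grad)]
    return w


# The target plays the role of the texture completed by the inpainting stage.
w_true = [0.5, -1.0, 0.25, 2.0]
target = generate(w_true)
w_hat = optimize_latent(target)
```

In this toy setting the recovered code `w_hat` reproduces the target exactly because the target lies in the generator's range. With a real generator, the optimized output is instead the closest texture the latent space can express, which is the property the abstract relies on for style and topology consistency.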
