AvatarTex: High-Fidelity Facial Texture Reconstruction from Single-Image Stylized Avatars

Yuda Qiu, Zitong Xiao, Yiwei Zuo, Zisheng Ye, Weikai Chen, Xiaoguang Han

TL;DR

AvatarTex addresses the challenge of reconstructing high-fidelity, topology-consistent facial textures from a single image for both stylized and photorealistic avatars. It introduces a novel three-stage diffusion-to-GAN pipeline that combines diffusion-based inpainting, StyleGAN2 latent optimization, and diffusion repainting, anchored by TexHub, a 20,000-texture multi-style UV dataset generated with LoRA-guided diffusion and ControlNet guidance. This framework achieves state-of-the-art texture topology alignment and detail across diverse artistic styles, validated via quantitative metrics and user studies, and demonstrates robust generalization to in-the-wild images. TexHub will be released to facilitate future research in multi-style facial texture synthesis and avatar creation.

Abstract

We present AvatarTex, a high-fidelity facial texture reconstruction framework capable of generating both stylized and photorealistic textures from a single image. Existing methods struggle with stylized avatars due to the lack of diverse multi-style datasets and challenges in maintaining geometric consistency in non-standard textures. To address these limitations, AvatarTex introduces a novel three-stage diffusion-to-GAN pipeline. Our key insight is that while diffusion models excel at generating diversified textures, they lack explicit UV constraints, whereas GANs provide a well-structured latent space that ensures style and topology consistency. By integrating these strengths, AvatarTex achieves high-quality topology-aligned texture synthesis with both artistic and geometric coherence. Specifically, our three-stage pipeline first completes missing texture regions via diffusion-based inpainting, refines style and structure consistency using GAN-based latent optimization, and enhances fine details through diffusion-based repainting. To address the need for a stylized texture dataset, we introduce TexHub, a high-resolution collection of 20,000 multi-style UV textures with precise UV-aligned layouts. By leveraging TexHub and our structured diffusion-to-GAN pipeline, AvatarTex establishes a new state-of-the-art in multi-style facial texture reconstruction. TexHub will be released upon publication to facilitate future research in this field.
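The second stage, GAN-based latent optimization, can be illustrated with a deliberately small sketch. It is not the authors' implementation: a fixed random linear map (the hypothetical `generate`) stands in for a pretrained StyleGAN2, the loss is a plain masked L2 term, and all names and hyperparameters are assumptions chosen for illustration. The idea it shows is that fitting only the visible pixels through a generator's latent space also determines the hidden pixels, because every output must lie in the generator's range.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy "generator": a fixed linear map from an 8-D latent code
# to a flattened 16x16 texture. It only stands in for a pretrained StyleGAN2.
LATENT_DIM, TEX_PIXELS = 8, 16 * 16
A = rng.normal(size=(TEX_PIXELS, LATENT_DIM))


def generate(w):
    """Map a latent code to a flattened texture."""
    return A @ w


def fit_latent(target, mask, steps=500, lr=0.1):
    """Gradient descent on the masked mean squared error between
    generate(w) and target; pixels where mask == 0 are ignored."""
    w = np.zeros(LATENT_DIM)
    n_visible = mask.sum()
    for _ in range(steps):
        residual = mask * (generate(w) - target)
        grad = 2.0 * A.T @ residual / n_visible
        w -= lr * grad
    return w


# Assumed setup: the first half of the texture is observed, the rest is not.
w_true = rng.normal(size=LATENT_DIM)
target = generate(w_true)
mask = np.zeros(TEX_PIXELS)
mask[: TEX_PIXELS // 2] = 1.0

w_hat = fit_latent(target, mask)
full_texture = generate(w_hat)  # includes the pixels the loss never saw
```

In this toy setting the unobserved half of `full_texture` matches the target as well, since the latent code is fully determined by the visible half. With a real, nonlinear generator the recovery is only approximate, which is consistent with the pipeline following this stage with diffusion-based repainting to restore fine detail.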

Paper Structure

This paper contains 25 sections, 1 equation, 8 figures, 3 tables.

Figures (8)

  • Figure 1: Given an input facial image, AvatarTex generates the corresponding high-fidelity, topology-consistent texture with both artistic and geometric coherence. AvatarTex supports reconstruction from in-the-wild face images across diverse styles.
  • Figure 2: Illustration of (a) TexHub and (b, c, d) AvatarTex, comprising (b) texture initialization, (c) texture correction, and (d) texture enhancement.
  • Figure 3: Visualization of TexHub. We guide the FLUX diffusion model to generate UV facial textures using a LoRA trained on a limited set of handcrafted texture data.
  • Figure 4: Visualization of the optimization process. Optimization with a diffusion backbone struggles to capture the accurate shape of local structures such as brows and lips. In contrast, optimization with a StyleGAN backbone reconstructs the correct features but fails to achieve high quality. The StyleGAN case corresponds to the optimization process in panel (b) of the ablation figure (label fig:abl), and the diffusion case corresponds to panel (e) of the same figure.
  • Figure 5: Visual results of our comparisons. The results are (a) ours, (b) pix2pixHD inpainting [wang2018pix2pixHD], (c) Stable Diffusion 2.1 inpainting [Rombach_2022_CVPR], (d) FFHQ-UV [bai2023ffhq], and (e) UV-IDM [li2024uv], respectively. More examples can be found in the gallery of the supplementary material.
  • ...and 3 more figures