FreeUV: Ground-Truth-Free Realistic Facial UV Texture Recovery via Cross-Assembly Inference Strategy
Xingchao Yang, Takafumi Taketomi, Yuki Endo, Yoshihiro Kanamori
TL;DR
FreeUV generates high-quality 3D facial UV textures from a single 2D image without ground-truth UV data. It introduces a dual-network architecture that separates appearance (in-the-wild realism) from structure (3DMM-based geometry) and fuses the two at inference through Cross-Assembly inside a pre-trained diffusion model, aided by CLIP and ControlNet conditioning. The approach demonstrates superior texture fidelity and robustness to occlusions and makeup, and it enables practical applications such as local editing, feature interpolation, and multi-view texture recovery, all without annotated or synthetic UV data. By pairing Stable Diffusion with targeted structure guidance and appearance refinement, this data-efficient framework advances realistic UV texture reconstruction in real-world scenarios. The findings indicate strong potential for scalable, high-fidelity facial texture generation in graphics and vision applications.
Abstract
Recovering high-quality 3D facial textures from single-view 2D images is a challenging task, especially under constraints of limited data and complex facial details such as makeup, wrinkles, and occlusions. In this paper, we introduce FreeUV, a novel ground-truth-free UV texture recovery framework that eliminates the need for annotated or synthetic UV data. FreeUV leverages a pre-trained Stable Diffusion model alongside a Cross-Assembly inference strategy to fulfill this objective. In FreeUV, separate networks are trained independently to focus on realistic appearance and structural consistency, and these networks are combined during inference to generate coherent textures. Our approach accurately captures intricate facial features and demonstrates robust performance across diverse poses and occlusions. Extensive experiments validate FreeUV's effectiveness, with results surpassing state-of-the-art methods in both quantitative and qualitative evaluations. Additionally, FreeUV enables new applications, including local editing, facial feature interpolation, and multi-view texture recovery. By reducing data requirements, FreeUV offers a scalable solution for generating high-fidelity 3D facial textures suitable for real-world scenarios.
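The idea of training two branches independently and assembling them only at inference can be illustrated with a deliberately tiny toy. This is a sketch under stated assumptions, not the FreeUV implementation: the function names (`appearance_net`, `structure_net`, `frozen_backbone`, `cross_assembly_inference`), the list-of-floats "texture", and the multiplicative fusion are all hypothetical stand-ins chosen for illustration.

```python
from typing import List

Vec = List[float]


def appearance_net(image: Vec) -> Vec:
    """Hypothetical branch concerned only with realistic appearance.

    Toy behaviour: clamp pixel values into [0, 1].
    """
    return [min(1.0, max(0.0, v)) for v in image]


def structure_net(layout: Vec) -> Vec:
    """Hypothetical branch concerned only with structural layout.

    Toy behaviour: threshold the layout into a binary validity mask.
    """
    return [1.0 if v > 0.5 else 0.0 for v in layout]


def frozen_backbone(appearance: Vec, structure: Vec) -> Vec:
    """Stand-in for a frozen pre-trained generator that takes two conditions.

    Toy fusion (an assumption): structure decides where content is placed,
    appearance decides what value is placed there.
    """
    return [a * s for a, s in zip(appearance, structure)]


def cross_assembly_inference(image: Vec, layout: Vec) -> Vec:
    """Combine the two independently defined branches at inference time."""
    return frozen_backbone(appearance_net(image), structure_net(layout))


# Each branch never sees the other's input; they meet only in the backbone.
texture = cross_assembly_inference([0.2, 1.4, 0.7], [0.9, 0.1, 0.6])
print(texture)
```

The point of the toy is the separation of concerns: neither branch depends on the other, so each could in principle be trained on different data, and they are joined only when the shared backbone is called.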
