High-Fidelity Facial Albedo Estimation via Texture Quantization
Zimin Ran, Xingyu Ren, Xiang An, Kaicheng Yang, Xiangzi Dai, Ziyong Feng, Jia Guo, Linchao Zhu, Jiankang Deng
TL;DR
HiFiAlbedo tackles high-fidelity facial albedo reconstruction from monocular images without relying on captured albedo data. It builds a high-quality facial texture codebook from large-scale, high-resolution RGB faces. It then reconstructs UV textures under adversarial supervision from two discriminators, one in image space and one in UV space. Finally, a cross-attention module trained with a group identity loss maps textures to unbiased albedo latent representations. The approach supports multi-image inference to mitigate the ambiguity between illumination and albedo. It achieves competitive performance on the FAIR benchmark and generalizes well to real-world images. Because the pipeline needs no captured albedo data, it reduces data requirements while still enabling realistic rendering and robust albedo recovery across diverse identities.
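The facial texture codebook can be pictured as standard vector quantization in the style of VQ-VAE/VQGAN: each encoder feature vector is replaced by its nearest learned code. The sketch below assumes exactly that nearest-neighbour lookup. It is not taken from the HiFiAlbedo code. The names `quantize` and `squared_distance`, along with the toy two-dimensional codebook, are hypothetical. A real model would also learn the codebook and operate on batched tensors.

```python
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def squared_distance(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def quantize(features: List[Vector],
             codebook: List[Vector]) -> Tuple[List[int], List[Vector]]:
    """Snap each feature vector to its nearest codebook entry (hypothetical helper).

    Returns the chosen code indices and the corresponding quantized vectors.
    """
    indices: List[int] = []
    quantized: List[Vector] = []
    for f in features:
        # Choose the code with the smallest squared L2 distance to this feature.
        best = min(range(len(codebook)),
                   key=lambda k: squared_distance(f, codebook[k]))
        indices.append(best)
        quantized.append(codebook[best])
    return indices, quantized


if __name__ == "__main__":
    toy_codebook = [[0.0, 0.0], [1.0, 1.0], [0.0, 1.0]]
    toy_features = [[0.1, -0.2], [0.9, 1.2], [0.2, 0.8]]
    idx, _ = quantize(toy_features, toy_codebook)
    print(idx)  # → [0, 1, 2]
```

Storing only the code indices means that any decoded texture is assembled from entries learned on high-resolution faces. This is the intuition behind using a codebook to keep reconstructions high-fidelity.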
Abstract
Recent 3D face reconstruction methods have made significant progress in shape estimation, but high-fidelity facial albedo reconstruction remains challenging. Existing methods depend on expensive light-stage captured data to learn facial albedo maps, and the limited diversity of the captured subjects restricts their ability to recover high-fidelity results. In this paper, we present a novel facial albedo reconstruction model, HiFiAlbedo, which recovers the albedo map directly from a single image without the need for captured albedo data. Our key insight is that the albedo map is the illumination-invariant texture map. This insight lets us derive albedo estimates from inexpensive texture data by eliminating illumination. To achieve this, we first collect large-scale, ultra-high-resolution facial images and train a high-fidelity facial texture codebook. Using the FFHQ dataset and a limited set of UV textures, we then fine-tune the encoder to reconstruct textures from the input image, with adversarial supervision in both image and UV space. Finally, we train a cross-attention module with a group identity loss to learn the adaptation from the facial texture domain to the albedo domain. Extensive experiments demonstrate that our method generalizes well and achieves high-fidelity results for in-the-wild facial albedo recovery. Our code, pre-trained weights, and training data will be made publicly available at https://hifialbedo.github.io/.
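The abstract does not specify how the cross-attention module is wired. The sketch below therefore assumes one plausible arrangement: texture latent tokens act as queries and attend over a set of albedo-domain tokens, which serve as keys and values. The result is a set of albedo-domain latents. The code implements single-head scaled dot-product cross-attention in pure Python. The names `cross_attention`, `matmul`, and `softmax` and the projection matrices `w_q`, `w_k`, `w_v` are hypothetical illustrations, not the authors' implementation. The group identity loss is not modelled here.

```python
import math
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Naive product of an (n x k) matrix and a (k x m) matrix."""
    return [[sum(a[i][t] * b[t][j] for t in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def softmax(row: List[float]) -> List[float]:
    """Numerically stable softmax over one row of attention scores."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries: Matrix, context: Matrix,
                    w_q: Matrix, w_k: Matrix, w_v: Matrix) -> Matrix:
    """Single-head scaled dot-product cross-attention (hypothetical sketch).

    queries: (n x d_in) tokens that attend, assumed here to be texture latents.
    context: (m x d_in) tokens attended over, assumed here to be albedo-domain tokens.
    w_q, w_k: (d_in x d_k) projections; w_v: (d_in x d_v) projection.
    Returns an (n x d_v) matrix with one output vector per query.
    """
    q = matmul(queries, w_q)
    k = matmul(context, w_k)
    v = matmul(context, w_v)
    d_k = len(w_q[0])
    out: Matrix = []
    for qi in q:
        # Similarity of this query to every key, scaled by sqrt(d_k).
        scores = [sum(a * b for a, b in zip(qi, kj)) / math.sqrt(d_k) for kj in k]
        weights = softmax(scores)
        # Output is the attention-weighted sum of the value vectors.
        out.append([sum(w * vj[c] for w, vj in zip(weights, v))
                    for c in range(len(v[0]))])
    return out


if __name__ == "__main__":
    identity = [[1.0, 0.0], [0.0, 1.0]]
    texture_tokens = [[1.0, 0.0]]
    albedo_tokens = [[1.0, 0.0], [0.0, 1.0]]
    print(cross_attention(texture_tokens, albedo_tokens, identity, identity, identity))
```

In a trained model, the projections would be learned, and attention would typically be multi-head and batched. The identity matrices in the demo only exercise the mechanics: each output is a convex combination of the value vectors, weighted towards the context tokens most similar to the query.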
