Monocular Identity-Conditioned Facial Reflectance Reconstruction

Xingyu Ren, Jiankang Deng, Yuhao Cheng, Jia Guo, Chao Ma, Yichao Yan, Wenhan Zhu, Xiaokang Yang

TL;DR

The paper tackles monocular facial reflectance reconstruction under limited data by learning a multi-domain, image-space reflectance prior using VQGAN codebooks that align RGB and reflectance domains. An AdaIN-based identity-swapping module injects target identity into a pre-trained decoder, enabling high-fidelity, identity-preserving reflectance outputs across multiple domains. During inference, the method synthesizes multi-view reflectance images from templates and stitches them into UV-space maps for rendering, achieving state-of-the-art results and robust generalization to in-the-wild faces. The approach reduces reliance on costly light-stage captures and supports scalable, realistic avatar creation with plausible relighting and rendering capabilities.
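
The AdaIN-based identity injection mentioned above can be illustrated with a minimal NumPy sketch. It shows only the generic adaptive instance normalization operation; the helper `identity_to_style`, its linear weights, and all tensor sizes are hypothetical and are not taken from the paper.

```python
import numpy as np

def adain(features, scale, shift, eps=1e-5):
    """Adaptive instance normalization on a (C, H, W) feature map.

    Each channel is normalized to zero mean and unit variance over its
    spatial positions, then re-scaled and shifted per channel.
    """
    mean = features.mean(axis=(1, 2), keepdims=True)
    std = features.std(axis=(1, 2), keepdims=True)
    normalized = (features - mean) / (std + eps)
    return scale[:, None, None] * normalized + shift[:, None, None]

def identity_to_style(identity, weight, bias):
    """Hypothetical linear layer mapping an identity vector to (scale, shift)."""
    params = weight @ identity + bias
    scale, shift = np.split(params, 2)
    return 1.0 + scale, shift  # scale is centered at 1 (assumption)

rng = np.random.default_rng(0)
channels, id_dim = 4, 8
features = rng.normal(size=(channels, 16, 16))   # stand-in decoder features
identity = rng.normal(size=id_dim)               # stand-in identity embedding
identity /= np.linalg.norm(identity)
weight = rng.normal(scale=0.1, size=(2 * channels, id_dim))
bias = np.zeros(2 * channels)

scale, shift = identity_to_style(identity, weight, bias)
stylized = adain(features, scale, shift)
```

After this operation each channel's spatial statistics are set by the identity-derived parameters rather than by the source features, which is the general reason AdaIN is used for this kind of conditioning.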

Abstract

Recent 3D face reconstruction methods have made remarkable advancements, yet high-quality monocular facial reflectance reconstruction remains highly challenging. Existing methods rely on a large amount of light-stage captured data to learn facial reflectance models. However, the lack of subject diversity in such data makes it hard to achieve good generalization and widespread applicability. In this paper, we learn the reflectance prior in image space rather than UV space and present a framework named ID2Reflectance. Our framework can directly estimate the reflectance maps of a single image while using limited reflectance data for training. Our key insight is that reflectance data shares facial structures with RGB faces, which makes it possible to obtain an expressive facial prior from inexpensive RGB data, thus reducing the dependency on reflectance data. We first learn a high-quality prior for facial reflectance. Specifically, we pretrain multi-domain facial feature codebooks and design a codebook fusion method to align the reflectance and RGB domains. Then, we propose an identity-conditioned swapping module that injects facial identity from the target image into the pre-trained autoencoder to modify the identity of the source reflectance image. Finally, we stitch multi-view swapped reflectance images to obtain renderable assets. Extensive experiments demonstrate that our method exhibits excellent generalization capability and achieves state-of-the-art facial reflectance reconstruction results for in-the-wild faces. Our project page is https://xingyuren.github.io/id2reflectance/.
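
To make the codebook idea concrete, the following sketch blends several basis codebooks into one and then performs a standard nearest-neighbor vector-quantization lookup. The abstract does not give the fusion formula, so the softmax-weighted sum used here is an assumption, as are the function names, the codebook sizes, and the choice of three basis codebooks.

```python
import numpy as np

def softmax(x):
    z = np.exp(x - x.max())
    return z / z.sum()

def fuse_codebooks(basis_codebooks, fusion_logits):
    """Blend K basis codebooks of shape (K, N, D) into one (N, D) codebook.

    Assumption: fusion is a softmax-weighted sum over the K bases.
    """
    weights = softmax(fusion_logits)                         # (K,)
    fused = np.tensordot(weights, basis_codebooks, axes=1)   # (N, D)
    return fused, weights

def quantize(latents, codebook):
    """Replace each latent row (M, D) with its nearest codebook entry."""
    diff = latents[:, None, :] - codebook[None, :, :]        # (M, N, D)
    sq_dist = (diff ** 2).sum(axis=-1)                       # (M, N)
    indices = sq_dist.argmin(axis=1)
    return codebook[indices], indices

rng = np.random.default_rng(0)
basis = rng.normal(size=(3, 32, 8))     # 3 hypothetical basis codebooks
logits = np.array([2.0, 0.5, 0.5])      # first basis stands in for RGB texture
fused, weights = fuse_codebooks(basis, logits)

latents = rng.normal(size=(10, 8))      # stand-in encoder outputs
quantized, indices = quantize(latents, fused)
_, self_indices = quantize(fused, fused)
```

In this toy setup a larger fusion weight on the first basis means the fused entries stay close to that codebook, which is one simple way a reflectance domain could reuse structure learned from RGB data.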

Paper Structure

This paper contains 19 sections, 8 equations, 21 figures, and 5 tables.

Figures (21)

  • Figure 1: Overview of the proposed method. Our core insight is to build a facial reflectance prior in image space by using limited captures and to recover the reflectance maps for any unconstrained face. We first train multi-domain facial codebooks using a large amount of RGB data and limited reflectance data. Then, given an input unconstrained face, we extract the identity feature from the pre-trained ArcFace [deng2019arcface] model. This feature is fed into the swapper module, which guides the decoder to perform identity injection for all domains. We finally stitch three-view identity-conditioned reflectance images to acquire high-quality rendering assets and renderable 3D faces.
  • Figure 2: Visualization of codebook fusion weights. Our method uses multiple basis codebooks (especially the RGB texture codebook) for discrete representations, indicating the cross-domain correlation learned by our model.
  • Figure 3: Detailed architecture of our swapper module. The yellow boxes represent the original multi-scale features from the decoder $\mathcal{G}$, and the green boxes represent the residual features generated by each identity injection branch. We use the small-scale feature map as input to generate identity-conditioned residual features, which will be added to the up-sampled feature map.
  • Figure 4: Comparison of diffuse and specular albedo reconstruction on the Digital Emily project. From left to right: input image, Dib et al. [dib2021towards], AvatarMe++ [lattas2021avatarme++], Relightify [papantoniou2023relightify], ReflectanceMM [han2023ReflectanceMM], ours, and ground truth.
  • Figure 5: Comparison with recent single-image reflectance prediction methods. From left to right: input image, AlbedoMM [smith2020AlbedoMM], AvatarMe [Lattas20], Dib et al. [dib2021towards], FitMe [lattas2023fitme], ReflectanceMM [han2023ReflectanceMM], and ours.
  • ...and 16 more figures
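
The residual injection described in the Figure 3 caption (a small-scale feature map produces identity-conditioned residual features that are added to the up-sampled feature map) can be sketched as follows. This is only an illustration of the data flow: the channel-gain branch in `identity_residual`, the projection matrix `proj`, and the nearest-neighbor upsampling are assumptions, not the paper's architecture.

```python
import numpy as np

def upsample_nearest(x, factor=2):
    """Nearest-neighbor upsampling of a (C, H, W) feature map."""
    return x.repeat(factor, axis=1).repeat(factor, axis=2)

def identity_residual(small_features, identity, proj):
    """Hypothetical injection branch.

    Projects the identity vector to one gain per channel, modulates the
    small-scale features with it, and upsamples the result.
    """
    gain = np.tanh(proj @ identity)                  # (C,)
    residual = gain[:, None, None] * small_features  # (C, H, W)
    return upsample_nearest(residual)

def inject(small_features, identity, proj):
    """Add the identity-conditioned residual to the up-sampled features."""
    upsampled = upsample_nearest(small_features)
    return upsampled + identity_residual(small_features, identity, proj)

rng = np.random.default_rng(0)
channels, id_dim = 4, 8
small = rng.normal(size=(channels, 8, 8))       # stand-in decoder features
identity = rng.normal(size=id_dim)              # stand-in identity embedding
proj = rng.normal(scale=0.1, size=(channels, id_dim))

out = inject(small, identity, proj)
unchanged = inject(small, identity, np.zeros((channels, id_dim)))
```

Because the identity signal enters only through an additive residual, a zero residual leaves the decoder's original features untouched, which is one plausible reason to prefer this design when the decoder is pre-trained.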