
Learning to Decouple the Lights for 3D Face Texture Modeling

Tianxin Huang, Zhenyu Zhang, Ying Tai, Gim Hee Lee

TL;DR

This work tackles recovering faithful 3D face textures when illumination is distorted by external occlusions. It introduces Light Decoupling, a framework that represents illumination as multiple learnable light conditions predicted by spatial-temporal neural masks, combined via Adaptive Condition Estimation with strong global/local/human priors to enforce realism. The approach outperforms baselines on single images and video sequences across diverse datasets, improving texture clarity and relighting realism under challenging occlusions. By decoupling complex lighting and leveraging perceptual and identity-based priors, the method advances robust texture recovery for realistic digital humans in unconstrained scenes.

Abstract

Existing research has made impressive strides in reconstructing human facial shapes and textures from images with well-illuminated faces and minimal external occlusions. Nevertheless, it remains challenging to recover accurate facial textures under complicated illumination affected by external occlusions, e.g., a face partially shadowed by an item such as a hat. Existing works that assume a single, uniform illumination cannot correctly process such data. In this work, we introduce a novel approach to model 3D facial textures under such unnatural illumination. Instead of assuming a single illumination, our framework, named Light Decoupling, learns to imitate the unnatural illumination as a composition of multiple separate light conditions combined through learned neural representations. Experiments on both single images and video sequences demonstrate the effectiveness of our approach in modeling facial textures under challenging illumination affected by occlusions. Please visit https://tianxinhuang.github.io/projects/Deface for our videos and code.

Paper Structure

This paper contains 43 sections, 8 equations, 19 figures, 12 tables, 1 algorithm.

Figures (19)

  • Figure 1: Blue and red rectangles mark regions affected by self-occlusions and external occlusions, respectively. (a) Texture modeling with a diffuse-only texture map. (b) Texture modeling based on the diffuse and specular albedos and roughness of the local reflectance model from dib2021practical, optimized with ray-tracing rendering. (c) Our method learns neural representations that decouple the original illumination into multiple light conditions, so that the influence of external occlusions can be modeled as one of the conditions. White and black regions in the masks denote 1 and 0, respectively.
  • Figure 2: Illustration of our framework. The pipeline recovers the texture $T$ and the 3DMM statistical coefficients $\alpha, \beta, \delta, p, \gamma$ from the input image $I_{in}$. The statistical coefficient $\delta$ is used to initialize $T$. The render mask $M_R$ and the faces $I_{Rn}$ under $n$ light conditions $\gamma=\{\gamma_1, \dots, \gamma_n\}$ are obtained through ray-tracing rendering. $f(\cdot)$ and $g(\cdot)$ are neural representations that predict the light masks $M_N$ and the facial region mask $M_o$, respectively. Adaptive Condition Estimation (ACE) selects the effective masks $M_L$ and rendered faces $I_{Rs}$. The faces $I_{Rs}$ are combined into $I_R$ using $M_L$, and $I_R$ is then merged with the surroundings of $I_{in}$ using $M_o$ to construct the output image $I_{out}$. $L_{pho}$ and $L_{lan}$ denote the photometric loss and the landmark loss, respectively.
  • Figure 3: Comparison on VoxCeleb2 images. The diffuse albedo is visualized as the texture because it contains most of the color information. Textures from the source images are used to synthesize the target images. NextFace* denotes results optimized within regions selected by face parsing lin2019face. No textures are shown for CPEM mo2022towards or D3DFR deng2019accurate, as these methods predict vertex colors rather than UV textures.
  • Figure 4: Comparison results on the CelebAMask-HQ dataset. Ours and Ours+ denote, respectively, our rendered results $I_R$ directly overlaid onto the original images, and results combined with the surroundings: $I_{out}=M_o \odot I_R + (1-M_o) \odot I_{in}$.
  • Figure 5: Ablation study of the losses. GP, LP, and HP denote $L_{GP}$, $L_{LP}$, and $L_{HP}$, respectively, while NA denotes removing all three.
  • ...and 14 more figures
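The Figure 2 and Figure 4 captions describe two compositing steps: the faces rendered under the selected light conditions are blended into $I_R$ with the masks $M_L$, and $I_R$ is then merged with the input surroundings via $I_{out}=M_o \odot I_R + (1-M_o) \odot I_{in}$. The NumPy sketch below illustrates only those two steps, under stated assumptions: the masks are treated as soft per-pixel weights that sum to one across conditions, and the softmax used to produce such weights is a hypothetical stand-in. All function names are illustrative, not taken from the paper or its released code. Ray tracing, ACE, and the training losses are not modeled.

```python
import numpy as np


def softmax_masks(logits):
    """Hypothetical helper: turn (n, H, W) logits into masks that sum to 1 per pixel."""
    z = np.asarray(logits, dtype=np.float64)
    z = z - z.max(axis=0, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=0, keepdims=True)


def combine_light_conditions(rendered, light_masks):
    """Blend renders under n light conditions into one image I_R.

    rendered:    (n, H, W, 3) faces rendered under gamma_1..gamma_n
    light_masks: (n, H, W) per-pixel weights (assumed to play the role of M_L)
    """
    rendered = np.asarray(rendered, dtype=np.float64)
    masks = np.asarray(light_masks, dtype=np.float64)
    if rendered.shape[:3] != masks.shape:
        raise ValueError("light_masks must match rendered in (n, H, W)")
    # Broadcast each mask over the RGB channels, then sum over the condition axis.
    return np.sum(masks[..., None] * rendered, axis=0)


def merge_with_surroundings(i_r, i_in, m_o):
    """I_out = M_o * I_R + (1 - M_o) * I_in, with m_o of shape (H, W)."""
    m = np.asarray(m_o, dtype=np.float64)[..., None]  # add a channel axis
    return m * np.asarray(i_r) + (1.0 - m) * np.asarray(i_in)


if __name__ == "__main__":
    H, W = 2, 3
    # Two hypothetical conditions: an unoccluded light and a darker "hat shadow" light.
    rendered = np.stack([np.full((H, W, 3), 0.8), np.full((H, W, 3), 0.2)])
    masks = np.zeros((2, H, W))
    masks[1, 0, :] = 1.0  # top row lit by the shadow condition
    masks[0, 1, :] = 1.0  # bottom row lit by the unoccluded condition
    i_r = combine_light_conditions(rendered, masks)
    i_out = merge_with_surroundings(i_r, np.zeros((H, W, 3)), np.ones((H, W)))
    print(i_out[..., 0])
```

Keeping the conditions as separate renders blended by masks, rather than as one lighting model, is what lets an occluder's shadow occupy its own condition. In the paper, the masks come from the learned representation $f(\cdot)$ followed by ACE selection rather than from a fixed softmax.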