Deep Inverse Shading: Consistent Albedo and Surface Detail Recovery via Generative Refinement

Jiacheng Wu, Ruiqi Zhang, Jie Chen

TL;DR

DIS presents a mesh-based framework for relightable avatar reconstruction that integrates generative priors through a normal conversion module and a de-shading module within a differentiable PBR loop, enabling joint optimization of geometry and appearance from sparse views. By projecting 2D normal predictions into 3D surface offsets on a SMPL-based mesh via differentiable rasterization, DIS achieves fine surface detail without large vertex counts and improves material disentanglement with inverse shading priors. Empirical results show state-of-the-art relighting quality, lower memory usage, and higher rendering speed compared to both volumetric and existing surface-based baselines. The approach advances practical, scalable relightable avatar synthesis with improved geometry fidelity and physically plausible appearance.

Abstract

Reconstructing human avatars using generative priors is essential for achieving versatile and realistic avatar models. Traditional approaches often rely on volumetric representations guided by generative models, but these methods require extensive volumetric rendering queries, leading to slow training. Alternatively, surface-based representations offer faster optimization through differentiable rasterization, yet they are typically limited by vertex count, restricting mesh resolution and scalability when combined with generative priors. Moreover, integrating generative priors into physically based human avatar modeling remains largely unexplored. To address these challenges, we introduce DIS (Deep Inverse Shading), a unified framework for high-fidelity, relightable avatar reconstruction that incorporates generative priors into a coherent surface representation. DIS centers on a mesh-based model that serves as the target for optimizing both surface and material details. The framework fuses multi-view 2D generative surface normal predictions, rich in detail but often inconsistent, into the central mesh using a normal conversion module. This module converts generative normal outputs into per-triangle surface offsets via differentiable rasterization, enabling the capture of fine geometric details beyond sparse vertex limitations. Additionally, DIS integrates a de-shading module to recover accurate material properties. This module refines albedo predictions by removing baked-in shading and back-propagates reconstruction errors to optimize the geometry. Through joint optimization of geometry and material appearance, DIS achieves physically consistent, high-quality reconstructions suitable for accurate relighting. Our experiments show that DIS delivers SOTA relighting quality, enhanced rendering efficiency, lower memory consumption, and detailed surface reconstruction.

Paper Structure

This paper contains 19 sections, 4 equations, 8 figures, 4 tables.

Figures (8)

  • Figure 1: System diagram: (a) Prior-driven mesh optimization. The pipeline starts by estimating vertex offsets with the surface offset network $\mathcal{M}_\text{offset}$ to deform a coarse mesh $G_\text{coarse}$. Differentiable rasterization provides pixel UV coordinates, which are converted to surface normals $N_\text{surf}$ via the normal conversion module $O2N$. The color network $\mathcal{M}_\text{color}$ predicts pixel color $c$ and image $I_\text{RGB}$, while the normal enhancement model $\mathcal{M}_\text{enhance}$ refines $N_\text{surf}$ to enhanced normals $N_\text{enhance}$, producing a detailed dynamic mesh $G_\text{refined}$. (b) Surface-based physically-based rendering. Learnable light probes $\{L_i\}_i$ are placed around $G_\text{refined}$ to estimate visibility. With rasterized surface attributes, we compute albedo $\alpha_s$ via $\mathcal{M}_{\text{alb}}$, roughness $\gamma_s$ via $\mathcal{M}_\text{rgh}$ (both conditioned on shared features from $\mathcal{M}_\text{color}$), and $N_\text{surf}$. These are used in BRDF-based rendering to synthesize the image $I_\text{PBR}$. (c) Inverse shading and refinement. To remove baked-in lighting from $\alpha_s$, a de-shading module estimates a clean albedo $\hat{\alpha}_s$ conditioned on $\alpha_s$ and $N_\text{surf}$, and computes an improved relit image $\hat{I}_\text{PBR}$. (d) Joint optimization. Gradients are propagated through the entire pipeline: (1) $\mathcal{M}_\text{color}$ is supervised by $I_\text{RGB}$; (2) $\mathcal{M}_\text{offset}$ and $G_\text{coarse}$ by $N_\text{enhance}$; (3) $\mathcal{M}_{\text{alb}}$, $\mathcal{M}_\text{rgh}$, and $\{L_i\}_i$ by $I_\text{PBR}$; (4) the same modules plus $G_\text{refined}$ by $\hat{I}_\text{PBR}$; and (5) $\mathcal{M}_{\text{alb}}$ is further refined using $\hat{\alpha}_s$.
  • Figure 2: Normal Conversion Module. This module first queries rasterized pixel-level information, including pixel UV coordinates, normals, and world coordinates. Pixel offsets, predicted by an offset MLP, are integrated with the pixel data to compute 3D surface coordinates via Eq. (\ref{eq:surfWorldCoord}), capturing fine-grained surface variations within each triangle. Finally, point normals are calculated by constructing four surrounding triangles, as described in Eq. (\ref{eq:normalConversion}).
  • Figure 3: Qualitative comparison on SyntheticHuman++ across DIS, Relighting4D, and RA.
  • Figure 4: Qualitative comparison on People Snapshot (real-captured outdoor humans) between DIS and Relighting4D.
  • Figure 5: Qualitative comparison on MobileStage (real-captured indoor human) between DIS and RA.
  • ...and 3 more figures
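
The Figure 2 caption describes turning per-pixel offsets into 3D surface points and then reading normals off four surrounding triangles. The referenced equations are not reproduced in this summary, so the following NumPy sketch is only an illustration under stated assumptions: each rasterized point is displaced by a scalar offset along its unit normal, and the four triangles are formed by a point and its right, down, left, and up neighbors on the pixel grid. The function names `displaced_points` and `normals_from_four_triangles` are hypothetical and are not taken from the paper.

```python
import numpy as np


def displaced_points(world, normals, offsets):
    """Displace rasterized surface points along their unit normals.

    world:   (H, W, 3) per-pixel world coordinates from rasterization
    normals: (H, W, 3) per-pixel interpolated normals (need not be unit length)
    offsets: (H, W)    per-pixel scalar offsets (assumed output of an offset MLP)
    """
    length = np.linalg.norm(normals, axis=-1, keepdims=True)
    unit = normals / np.maximum(length, 1e-12)
    return world + offsets[..., None] * unit


def normals_from_four_triangles(points):
    """Estimate a normal at every interior pixel of an (H, W, 3) point grid.

    The center point and its right/down/left/up neighbors form four
    triangles; their face normals (cross products taken with a consistent
    winding) are summed and normalized. Returns an (H-2, W-2, 3) array.
    """
    center = points[1:-1, 1:-1]
    right = points[1:-1, 2:] - center
    left = points[1:-1, :-2] - center
    up = points[:-2, 1:-1] - center
    down = points[2:, 1:-1] - center

    # Assumed winding: right -> down -> left -> up, so that a grid with
    # x along columns and y along rows yields +z normals for a flat plane.
    summed = (
        np.cross(right, down)
        + np.cross(down, left)
        + np.cross(left, up)
        + np.cross(up, right)
    )
    length = np.linalg.norm(summed, axis=-1, keepdims=True)
    return summed / np.maximum(length, 1e-12)
```

Because the normals are computed from the displaced points with differentiable operations only, the same arithmetic written in an autodiff framework would let a normal loss propagate back to the offsets, which is the role the caption assigns to this module.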