IntrinsicAnything: Learning Diffusion Priors for Inverse Rendering Under Unknown Illumination

Xi Chen, Sida Peng, Dongchen Yang, Yuan Liu, Bowen Pan, Chengfei Lv, Xiaowei Zhou

TL;DR

This work tackles inverse rendering under unknown static illumination by introducing conditional diffusion priors for albedo and specular shading, which regularize the inherent ambiguity between materials and lighting. The authors use a two-stage coarse-to-fine optimization: the first stage obtains a rough material and lighting estimate under the diffusion priors, and the second uses that estimate to guide diffusion sampling so that the samples are consistent across views. By separating diffuse and specular components and training the priors on large 3D-object datasets, the method achieves state-of-the-art material recovery and relighting on synthetic and real data, and generalizes well to internet images. The approach offers a practical, data-driven way to resolve one of inverse rendering’s fundamental ambiguities and a scalable framework for material and lighting estimation under unknown illumination.

Abstract

This paper aims to recover object materials from posed images captured under an unknown static lighting condition. Recent methods solve this task by optimizing material parameters through differentiable physically based rendering. However, due to the coupling between object geometry, materials, and environment lighting, there is inherent ambiguity during the inverse rendering process, preventing previous methods from obtaining accurate results. To overcome this ill-posed problem, our key idea is to learn the material prior with a generative model for regularizing the optimization process. We observe that the general rendering equation can be split into diffuse and specular shading terms, and thus formulate the material prior as diffusion models of albedo and specular. Thanks to this design, our model can be trained using the existing abundant 3D object data, and naturally acts as a versatile tool to resolve the ambiguity when recovering material representations from RGB images. In addition, we develop a coarse-to-fine training strategy that leverages estimated materials to guide diffusion models to satisfy multi-view consistent constraints, leading to more stable and accurate results. Extensive experiments on real-world and synthetic datasets demonstrate that our approach achieves state-of-the-art performance on material recovery. The code will be available at https://zju3dv.github.io/IntrinsicAnything.
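
The diffuse/specular split mentioned in the abstract can be sketched with the standard rendering equation. The notation below is an assumption for illustration and is not taken from the paper's own equations: $L_o$ is outgoing radiance at surface point $\mathbf{x}$, $L_i$ is incident radiance, $\mathbf{n}$ is the surface normal, $\mathbf{a}$ is the albedo, and $f_s$ is a specular BRDF lobe. The diffuse lobe is assumed to be Lambertian.

```latex
\begin{aligned}
L_o(\mathbf{x}, \omega_o)
  &= \int_{\Omega} f_r(\mathbf{x}, \omega_i, \omega_o)\, L_i(\mathbf{x}, \omega_i)\, (\mathbf{n} \cdot \omega_i)\, d\omega_i,
  \qquad f_r = \frac{\mathbf{a}(\mathbf{x})}{\pi} + f_s(\mathbf{x}, \omega_i, \omega_o) \\
  &= \underbrace{\frac{\mathbf{a}(\mathbf{x})}{\pi} \int_{\Omega} L_i(\mathbf{x}, \omega_i)\, (\mathbf{n} \cdot \omega_i)\, d\omega_i}_{\text{diffuse term: albedo} \times \text{diffuse shading}}
   \;+\; \underbrace{\int_{\Omega} f_s(\mathbf{x}, \omega_i, \omega_o)\, L_i(\mathbf{x}, \omega_i)\, (\mathbf{n} \cdot \omega_i)\, d\omega_i}_{\text{specular term}}
\end{aligned}
```

Under this reading, the albedo factors out of the diffuse integral, which is why an albedo image and a specular image are natural targets for two separate priors.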

Paper Structure (29 sections, 7 equations, 11 figures, 1 table)

Figures (11)

  • Figure 1: Two types of ambiguity in inverse rendering. (a) Ambiguity between diffuse shading and albedo. For example, the Xbox is lit by a yellow light, and the albedo decomposed by NVdiffrecMC [2022nvdiffrecmc] tends to be yellow. (b) Ambiguity between shadow and albedo. For example, the porcelain toy exhibits self-occlusion, and TensoIR [Jin2023TensoIR] bakes the shadow into the recovered albedo. Our method handles both types of ambiguity well.
  • Figure 2: Single-view intrinsic image decomposition results. Compared with the object-level method of Yi et al. [yi2023weaklysupervised] and the scene-level methods IIR [zhu2022learning] and IID [kocsis2023iid], our approach recovers more accurate and detailed intrinsic images and generalizes well across various objects and scenes.
  • Figure 3: Overview of our pipeline. (a) Based on physically based rendering, our model combines lighting, geometry, roughness, and albedo into RGB and specular images, and optimizes the lighting and materials in two stages. In the first stage, our model is supervised by images and diffusion priors to output coarse albedo and roughness. In the second stage, the coarse materials are used to guide the diffusion models so that they provide more multi-view consistent constraints. (b) The guided sampling first calculates the L2 loss between the guidance and the one-step denoised signal, and then adds the gradient of that loss to the output of the noise predictor.
  • Figure 4: Effect of the guided sampling. We visualize samples generated by the albedo prior model from different viewpoints. Without the guided sampling, the materials are inconsistent across views and do not align with the material decomposition from the observed lighting.
  • Figure 5: Qualitative comparison in terms of relighting on the synthetic dataset. Zoom in for details.
  • ...and 6 more figures
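
The guided sampling step described in the Figure 3(b) caption can be sketched as follows. This is a minimal illustration, not the authors' implementation. It assumes a DDPM-style parameterization with cumulative noise level `alpha_bar`, and it treats the predicted noise as constant when differentiating the loss. The function names, the `scale` parameter, and the `sqrt(1 - alpha_bar)` weighting (borrowed from classifier-guidance conventions) are all hypothetical.

```python
import numpy as np


def one_step_denoise(x_t, eps, alpha_bar):
    """Estimate the clean signal x0 from noisy x_t and predicted noise eps.

    Assumes the DDPM forward model x_t = sqrt(ab) * x0 + sqrt(1 - ab) * eps.
    """
    return (x_t - np.sqrt(1.0 - alpha_bar) * eps) / np.sqrt(alpha_bar)


def guided_eps(x_t, eps, guidance, alpha_bar, scale=0.1):
    """Add the gradient of an L2 guidance loss to the noise prediction.

    loss = ||x0_hat - guidance||^2, differentiated w.r.t. x_t while treating
    eps as a constant (a simplifying assumption; an autograd-based version
    would also backpropagate through the noise predictor).
    """
    x0_hat = one_step_denoise(x_t, eps, alpha_bar)
    grad = 2.0 * (x0_hat - guidance) / np.sqrt(alpha_bar)
    # Hypothetical weighting, following the classifier-guidance convention.
    return eps + scale * np.sqrt(1.0 - alpha_bar) * grad


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x_t = rng.normal(size=(4, 4, 3))       # noisy albedo sample
    eps = rng.normal(size=(4, 4, 3))       # stand-in for the noise predictor output
    guidance = rng.uniform(size=(4, 4, 3))  # stand-in for the coarse albedo render
    before = one_step_denoise(x_t, eps, 0.5)
    after = one_step_denoise(x_t, guided_eps(x_t, eps, guidance, 0.5), 0.5)
    print(np.linalg.norm(before - guidance), np.linalg.norm(after - guidance))
```

With a small positive `scale`, the denoised estimate computed from the guided noise lies closer to the guidance than the unguided one, which is the mechanism that pulls samples from different views toward a shared coarse material estimate.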