
Diffusion Reflectance Map: Single-Image Stochastic Inverse Rendering of Illumination and Reflectance

Yuto Enyo, Ko Nishino

TL;DR

The paper tackles the ill-posed task of recovering both illumination and material reflectance from a single image of an object of known geometry by formulating radiometric image formation as a stochastic forward process on a geometry-invariant reflectance map. It introduces DRMNet, a diffusion-based framework whose two subnetworks, IllNet and RefNet, iteratively reverse this process: IllNet transforms the observed reflectance map into one corresponding to a perfect mirror while RefNet jointly estimates the object's reflectance. A separate diffusion model, ObsNet, completes the sparse reflectance map obtained from the input image before it is passed to DRMNet. Trained on a large synthetic dataset, the method demonstrates state-of-the-art accuracy on synthetic and real datasets, enabling realistic relighting and object insertion under arbitrary reflectance. The approach eliminates the need for differentiable rendering by leveraging a principled stochastic inversion, opening practical applications in embodied perception and scene understanding with single-view inputs.

Abstract

Reflectance bounds the frequency spectrum of illumination in the object appearance. In this paper, we introduce the first stochastic inverse rendering method, which recovers the attenuated frequency spectrum of an illumination jointly with the reflectance of an object of known geometry from a single image. Our key idea is to solve this blind inverse problem in the reflectance map, an appearance representation invariant to the underlying geometry, by learning to reverse the image formation with a novel diffusion model which we refer to as the Diffusion Reflectance Map Network (DRMNet). Given an observed reflectance map converted and completed from the single input image, DRMNet generates a reflectance map corresponding to a perfect mirror sphere while jointly estimating the reflectance. The forward process can be understood as gradually filtering a natural illumination with lower and lower frequency reflectance and additive Gaussian noise. DRMNet learns to invert this process with two subnetworks, IllNet and RefNet, which work in concert towards this joint estimation. The network is trained on an extensive synthetic dataset and is demonstrated to generalize to real images, showing state-of-the-art accuracy on established datasets.
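
To make the forward process described in the abstract concrete, the following is a minimal 1D sketch, not the authors' implementation. It assumes a Gaussian low-pass filter as a stand-in for the reflectance (a real BRDF acts on the sphere and is not Gaussian), and the names gaussian_lowpass, forward_process, and high_frequency_energy, as well as the parameters sigmas and noise_std, are hypothetical.

```python
import numpy as np


def gaussian_lowpass(signal, sigma):
    """Circularly low-pass filter a 1D signal with a Gaussian of std `sigma` (in samples).

    ASSUMPTION: the Gaussian is only a stand-in for reflectance filtering.
    """
    freqs = np.fft.rfftfreq(signal.size)  # frequencies in cycles per sample
    # Fourier transform of a unit-area Gaussian kernel with std sigma.
    response = np.exp(-2.0 * (np.pi * freqs * sigma) ** 2)
    return np.fft.irfft(np.fft.rfft(signal) * response, n=signal.size)


def forward_process(illumination, sigmas, noise_std, rng):
    """Filter the illumination with wider and wider kernels and add Gaussian noise.

    Returns one noisy, filtered signal per entry of `sigmas`.
    """
    states = []
    for sigma in sigmas:
        filtered = gaussian_lowpass(illumination, sigma)
        noisy = filtered + rng.normal(0.0, noise_std, size=filtered.shape)
        states.append(noisy)
    return states


def high_frequency_energy(signal, cutoff=0.1):
    """Sum of spectral power above `cutoff` cycles per sample."""
    spectrum = np.abs(np.fft.rfft(signal)) ** 2
    freqs = np.fft.rfftfreq(signal.size)
    return float(spectrum[freqs > cutoff].sum())
```

Calling forward_process on a random signal with increasing sigmas and inspecting high_frequency_energy for each state shows the high-frequency content shrinking step by step. Once the surviving signal falls below the noise level, those frequencies cannot be recovered deterministically, which is why the reverse process has to generate them.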

Paper Structure (30 sections, 27 equations, 13 figures, 5 tables)

This paper contains 30 sections, 27 equations, 13 figures, 5 tables.

Figures (13)

  • Figure 1: We introduce the first single-image stochastic inverse rendering method, a principled approach for recovering the attenuated frequency spectrum of the illumination and reflectance by seamlessly integrating a neural generative process in inverse rendering. Our key idea is to recover the illumination as a reflectance map of a perfect mirror. The results enable arbitrary object insertion and relighting.
  • Figure 2: Overall architecture of Diffusion Reflectance Map Network (DRMNet). DRMNet consists of two subnetworks, IllNet for stochastic reverse diffusion to recursively transform the observed reflectance map into a reflectance map of a perfect mirror, and RefNet for jointly and iteratively estimating the reflectance.
  • Figure 3: Additive Gaussian observation noise makes each image formation process stochastic, and reflectance attenuates high-frequencies of the illumination whose residues are gradually washed out by the noise. The reverse process is thus necessarily stochastic, effectively generating the lost higher frequency components of the illumination by sampling along learned scores.
  • Figure 4: We convert the single input image into an observation reflectance map by "completing" the sparsely mapped reflectance map with another diffusion model, ObsNet.
  • Figure 5: Qualitative results on iBRDF synthetic dataset. For each input, the top row is the illumination estimate shown as a spherical panorama and the bottom row is the reflectance estimate rendered as a sphere under a point source.
  • ...and 8 more figures
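
The conversion mentioned in the Figure 4 caption, from an input image to a sparsely filled reflectance map, can be sketched as below. This is an illustration under stated assumptions rather than the paper's procedure: it assumes per-pixel unit normals in camera space are available from the known geometry, indexes the map by an orthographic projection of the normal onto a square grid, and averages pixels that land in the same cell. The function name observed_reflectance_map and the size parameter are hypothetical, and the completion step performed by ObsNet is not modeled.

```python
import numpy as np


def observed_reflectance_map(image, normals, mask, size=32):
    """Scatter pixel colors into a size x size grid indexed by surface normal.

    image:   (H, W, 3) float array of pixel radiance
    normals: (H, W, 3) unit normals in camera space, z pointing toward the viewer
    mask:    (H, W) boolean array, True on object pixels
    Returns (refmap, filled): the averaged map and a boolean grid of filled cells.

    ASSUMPTION: orthographic normal-to-grid mapping and per-cell averaging.
    """
    n = normals[mask]  # (K, 3) normals of object pixels
    c = image[mask]    # (K, 3) colors of object pixels
    # Map nx in [-1, 1] to columns and ny in [-1, 1] to rows (row 0 at ny = +1).
    cols = np.clip(((n[:, 0] + 1.0) * 0.5 * size).astype(int), 0, size - 1)
    rows = np.clip(((1.0 - n[:, 1]) * 0.5 * size).astype(int), 0, size - 1)
    sums = np.zeros((size, size, 3))
    counts = np.zeros((size, size))
    # np.add.at accumulates correctly when several pixels hit the same cell.
    np.add.at(sums, (rows, cols), c)
    np.add.at(counts, (rows, cols), 1.0)
    filled = counts > 0
    refmap = np.zeros_like(sums)
    refmap[filled] = sums[filled] / counts[filled][:, None]
    return refmap, filled
```

Cells where filled is False correspond to normals the object never exhibits in the image. Those are the gaps a completion model would need to fill.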