Diffusion Reflectance Map: Single-Image Stochastic Inverse Rendering of Illumination and Reflectance
Yuto Enyo, Ko Nishino
TL;DR
The paper tackles the ill-posed task of recovering both illumination and material reflectance from a single image of an object with known geometry. It formulates radiometric image formation as a stochastic forward process on the reflectance map, an appearance representation that is invariant to the underlying geometry. It introduces DRMNet, a diffusion-based framework whose two subnetworks, IllNet and RefNet, iteratively reverse this process to produce the reflectance map of a perfect mirror while jointly estimating the object's reflectance. A third network, ObsNet, completes the observed reflectance map derived from the input image before it is fed to the diffusion model. Trained on a large synthetic dataset, the method achieves state-of-the-art accuracy on synthetic and real datasets and enables realistic relighting and object replacement under arbitrary reflectance. Because it relies on a principled stochastic inversion, the approach does not need differentiable rendering, which opens up practical applications in embodied perception and scene understanding from single-view inputs.
Abstract
Reflectance bounds the frequency spectrum of illumination in the object appearance. In this paper, we introduce the first stochastic inverse rendering method, which recovers the attenuated frequency spectrum of an illumination jointly with the reflectance of an object of known geometry from a single image. Our key idea is to solve this blind inverse problem in the reflectance map, an appearance representation invariant to the underlying geometry, by learning to reverse the image formation with a novel diffusion model which we refer to as the Diffusion Reflectance Map Network (DRMNet). Given an observed reflectance map converted and completed from the single input image, DRMNet generates a reflectance map corresponding to a perfect mirror sphere while jointly estimating the reflectance. The forward process can be understood as gradually filtering a natural illumination with lower and lower frequency reflectance and additive Gaussian noise. DRMNet learns to invert this process with two subnetworks, IllNet and RefNet, which work in concert towards this joint estimation. The network is trained on an extensive synthetic dataset and is demonstrated to generalize to real images, showing state-of-the-art accuracy on established datasets.
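The forward process described in the abstract (filtering the illumination with progressively lower-frequency reflectance and adding Gaussian noise) can be illustrated with a toy 1-D sketch. This is not the authors' implementation: a circular Gaussian blur stands in for the reflectance model, a list of floats stands in for the reflectance map, and the names `low_pass`, `toy_forward_process`, and `roughness` are hypothetical helpers introduced only for this example.

```python
import math
import random


def gaussian_kernel(sigma):
    """Normalized 1-D Gaussian kernel truncated at 3 sigma (sigma in pixels)."""
    radius = max(1, math.ceil(3 * sigma))
    weights = [math.exp(-0.5 * (i / sigma) ** 2) for i in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def low_pass(signal, sigma):
    """Circular Gaussian blur. ASSUMPTION: a stand-in for filtering the
    illumination by a reflectance lobe; the paper uses a reflectance model."""
    kernel = gaussian_kernel(sigma)
    radius = len(kernel) // 2
    n = len(signal)
    return [
        sum(kernel[j + radius] * signal[(i + j) % n] for j in range(-radius, radius + 1))
        for i in range(n)
    ]


def toy_forward_process(illumination, sigmas, noise_std, rng):
    """Return [x_0, x_1, ...]. x_0 is the unfiltered (mirror-like) signal;
    each later state is the illumination blurred with the next sigma in
    `sigmas` plus additive Gaussian noise of standard deviation `noise_std`."""
    states = [list(illumination)]
    for sigma in sigmas:
        filtered = low_pass(illumination, sigma)
        states.append([v + rng.gauss(0.0, noise_std) for v in filtered])
    return states


def roughness(signal):
    """Sum of squared neighbor differences: a crude proxy for high-frequency energy."""
    n = len(signal)
    return sum((signal[(i + 1) % n] - signal[i]) ** 2 for i in range(n))


if __name__ == "__main__":
    rng = random.Random(0)
    # Toy stand-in for one row of an environment map.
    illumination = [rng.random() for _ in range(128)]
    states = toy_forward_process(illumination, sigmas=[1.0, 2.0, 4.0], noise_std=0.01, rng=rng)
    print([round(roughness(s), 4) for s in states])
```

With `noise_std` set to 0, the roughness should shrink as the blur widens, which mirrors the idea that reflectance bounds the frequency content of the illumination visible in the appearance. DRMNet learns the reverse direction, from a filtered, noisy reflectance map back toward the mirror-like one; that learned inversion is not sketched here.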
