
IllumiNeRF: 3D Relighting Without Inverse Rendering

Xiaoming Zhao, Pratul P. Srinivasan, Dor Verbin, Keunhong Park, Ricardo Martin Brualla, Philipp Henzler

TL;DR

This work first relights each input image using an image diffusion model conditioned on the target environment lighting and the estimated object geometry, then reconstructs a Neural Radiance Field from these relit images and renders novel views from it under the target lighting.

Abstract

Existing methods for relightable view synthesis -- using a set of images of an object under unknown lighting to recover a 3D representation that can be rendered from novel viewpoints under a target illumination -- are based on inverse rendering, and attempt to disentangle the object geometry, materials, and lighting that explain the input images. Furthermore, this typically involves optimization through differentiable Monte Carlo rendering, which is brittle and computationally expensive. In this work, we propose a simpler approach: we first relight each input image using an image diffusion model conditioned on target environment lighting and estimated object geometry. We then reconstruct a Neural Radiance Field (NeRF) with these relit images, from which we render novel views under the target lighting. We demonstrate that this strategy is surprisingly competitive and achieves state-of-the-art results on multiple relighting benchmarks. Please see our project page at https://illuminerf.github.io/.
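The data flow of the approach described in the abstract can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the authors' implementation: `estimate_geometry`, `render_cues`, and `relight` are hypothetical callables standing in for the geometry-extraction NeRF, the radiance-cue renderer, and the Relighting Diffusion Model; the target environment lighting is assumed to be bound inside `render_cues`; and the final step (fitting a latent NeRF to the relit images) is omitted, so the function simply returns the relit samples keyed by (view, sample).

```python
import numpy as np
from typing import Callable, Dict, List, Tuple

# Hypothetical type aliases for this sketch; not taken from the paper's code.
Image = np.ndarray
RelitSet = Dict[Tuple[int, int], Image]  # (view index, sample index) -> relit image


def illuminerf_style_pipeline(
    images: List[Image],
    estimate_geometry: Callable[[List[Image]], object],
    render_cues: Callable[[object, int], Image],
    relight: Callable[[Image, Image, np.random.Generator], Image],
    num_samples: int,
    seed: int = 0,
) -> RelitSet:
    """Data flow only: geometry -> per-view cues -> S relit samples per view.

    The latent NeRF fit to the returned set is intentionally left out.
    """
    rng = np.random.default_rng(seed)
    geometry = estimate_geometry(images)  # geometry estimated from the input views
    relit: RelitSet = {}
    for v, img in enumerate(images):
        cues = render_cues(geometry, v)  # cues for view v under the (captured) target light
        for s in range(num_samples):  # S independent samples, each view relit on its own
            relit[(v, s)] = relight(img, cues, rng)
    return relit


# Toy usage with placeholder callables that only exercise the shapes.
imgs = [np.full((4, 4, 3), 0.5) for _ in range(3)]
out = illuminerf_style_pipeline(
    imgs,
    estimate_geometry=lambda ims: None,
    render_cues=lambda geo, v: np.zeros((4, 4, 3)),
    relight=lambda img, cues, rng: np.clip(img + 0.1 * rng.standard_normal(img.shape), 0.0, 1.0),
    num_samples=2,
)
print(len(out))  # 6
```

Keeping several samples per view, instead of collapsing them to one, is what lets a later latent NeRF treat each sample as a different explanation of the scene; the toy lambdas here carry no relighting logic.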

Paper Structure (29 sections, 9 equations, 12 figures, 5 tables)

Figures (12)

  • Figure 1: Given a set of posed input images under unknown lighting (four example images from the set are shown on top), IllumiNeRF produces high-quality novel views (bottom) relit under a target lighting (illustrated as chrome balls). Inputs are from the Stanford-ORB dataset [kuang2024stanford].
  • Figure 2: Overview. Given a set of images $I$ and camera poses $\pi$ in (a), we run NeRF to extract the 3D geometry as in (b). Based on this geometry and a target light shown in (c), we create radiance cues for each given input view as in (d). Next, we independently relight each input image using a single-image Relighting Diffusion Model illustrated in (e) and sample $S$ possible solutions for each given view displayed in (f). Finally, we distill the relit set of images into a 3D representation through a Latent NeRF optimization as in (g) and (h).
  • Figure 3: Relit samples vs. latent NeRF. (a) Samples of the Relighting Diffusion Model (Sec. \ref{sec:diffusion}) for the same target environment map, and (b) renderings from the optimized Latent NeRF (Sec. \ref{sec:nerf}) for a fixed value of the latent. The diffusion samples correspond to different latent explanations of the scene, and our latent NeRF optimization effectively optimizes these latent variables together with the NeRF model's parameters to produce consistent renderings for each latent explanation.
  • Figure 4: Example radiance cues for a view of the 'hotdog' scene.
  • Figure 5: Qualitative results on TensoIR. Renderings from all approaches have been rescaled with respect to the ground truth as described in Sec. \ref{sec:exp setup}. Unlike TensoIR, our method faithfully recovers specular highlights and colors, as indicated in red.
  • ...and 7 more figures
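Figure 5 notes that renderings are rescaled with respect to the ground truth before comparison. A common motivation for this in relighting benchmarks is the scale ambiguity between surface albedo and light intensity. The sketch below assumes one common choice, a per-channel least-squares scale over foreground pixels, $s_c = \sum_k p_{k,c}\, g_{k,c} / \sum_k p_{k,c}^2$, followed by PSNR; the exact rule the paper uses is given in its experimental-setup section and may differ. `scale_align` and `psnr` are hypothetical helper names.

```python
import numpy as np


def scale_align(pred, gt, mask=None, eps=1e-8):
    """Rescale each color channel of `pred` to best match `gt` in least squares.

    Assumed rule (not confirmed by the paper): s_c = sum(p_c * g_c) / sum(p_c ** 2)
    over the masked pixels.
    """
    if mask is None:
        mask = np.ones(pred.shape[:2], dtype=bool)
    p, g = pred[mask], gt[mask]  # (K, C) foreground pixels
    scale = (p * g).sum(axis=0) / np.maximum((p * p).sum(axis=0), eps)
    return pred * scale  # the (C,) scale broadcasts over H x W


def psnr(pred, gt, mask=None, max_val=1.0):
    """Peak signal-to-noise ratio in dB over the masked pixels."""
    if mask is None:
        mask = np.ones(pred.shape[:2], dtype=bool)
    mse = float(np.mean((pred[mask] - gt[mask]) ** 2))
    if mse == 0.0:
        return float("inf")
    return 10.0 * np.log10(max_val ** 2 / mse)


# Toy check: a prediction that is correct up to a per-channel scale.
rng = np.random.default_rng(0)
gt = rng.uniform(0.1, 0.9, size=(8, 8, 3))
pred = gt * np.array([2.0, 0.5, 4.0])
aligned = scale_align(pred, gt)
print(psnr(pred, gt), psnr(aligned, gt))
```

Without alignment the toy prediction scores poorly even though it differs from the ground truth only by a per-channel scale; after alignment the error vanishes, which is the effect such rescaling is meant to have on the reported metrics.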