Generative Multiview Relighting for 3D Reconstruction under Extreme Illumination Variation

Hadi Alzayer, Philipp Henzler, Jonathan T. Barron, Jia-Bin Huang, Pratul P. Srinivasan, Dor Verbin

TL;DR

The paper tackles 3D reconstruction from images captured under extreme illumination variation, including strong view-dependent specularities. It introduces a diffusion-based multiview relighting model that jointly relights all input views to a single reference illumination, followed by a NeRF-Casting–style radiance field optimization augmented with per-image shading embeddings to absorb residual relighting errors. This combination yields superior reconstruction quality and faithful specular highlights on both synthetic and real datasets, without requiring environment maps. The approach broadens the applicability of neural radiance fields to unconstrained image collections and demonstrates robust handling of illumination-induced ambiguities in 3D appearance.
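The two-stage data flow described above can be sketched as follows. This is only an outline under stated assumptions, not the authors' implementation: the function name `reconstruct_under_reference` and the two callables `relight_jointly` and `fit_radiance_field` are hypothetical placeholders standing in for the relighting diffusion model and the radiance field optimization.

```python
from typing import Any, Callable, List, Sequence

Image = Any  # placeholder type, e.g. an H x W x 3 array
Pose = Any   # placeholder type, e.g. a 4 x 4 camera matrix


def reconstruct_under_reference(
    images: Sequence[Image],
    poses: Sequence[Pose],
    relight_jointly: Callable[[Sequence[Image], Sequence[Pose], int], List[Image]],
    fit_radiance_field: Callable[[Sequence[Image], Sequence[Pose]], Any],
    reference_index: int = 0,
) -> Any:
    """Two-stage data flow: joint relighting, then radiance field fitting.

    Both callables are hypothetical stand-ins supplied by the caller.
    """
    if len(images) != len(poses):
        raise ValueError("each image needs exactly one camera pose")
    if not 0 <= reference_index < len(images):
        raise ValueError("reference_index must select one of the input images")

    # Stage 1: relight all views together (not one at a time) so that they
    # match the illumination of the chosen reference image.
    relit = relight_jointly(images, poses, reference_index)
    if len(relit) != len(images):
        raise ValueError("relighting must return one image per input view")

    # Stage 2: fit a single 3D representation to the relit views. In the paper
    # this stage also learns per-image shading embeddings; here it is opaque.
    return fit_radiance_field(relit, poses)
```

Passing the two stages in as callables keeps the sketch independent of any particular model. The one structural point it encodes is that relighting receives the whole image set at once rather than a single view.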

Abstract

Reconstructing the geometry and appearance of objects from photographs taken in different environments is difficult as the illumination and therefore the object appearance vary across captured images. This is particularly challenging for more specular objects whose appearance strongly depends on the viewing direction. Some prior approaches model appearance variation across images using a per-image embedding vector, while others use physically-based rendering to recover the materials and per-image illumination. Such approaches fail at faithfully recovering view-dependent appearance given significant variation in input illumination and tend to produce mostly diffuse results. We present an approach that reconstructs objects from images taken under different illuminations by first relighting the images under a single reference illumination with a multiview relighting diffusion model and then reconstructing the object's geometry and appearance with a radiance field architecture that is robust to the small remaining inconsistencies among the relit images. We validate our proposed approach on both synthetic and real datasets and demonstrate that it greatly outperforms existing techniques at reconstructing high-fidelity appearance from images taken under extreme illumination variation. Moreover, our approach is particularly effective at recovering view-dependent "shiny" appearance which cannot be reconstructed by prior methods.
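To make the idea of a per-image appearance parameter concrete, here is a deliberately simplified toy, not the paper's architecture. It assumes each image differs from a shared per-point intensity only by one scalar gain, whereas the embeddings discussed in the abstract are learned vectors fed to a network. The function name `fit_shared_albedo_with_gains` and the alternating least-squares fit are illustrative assumptions.

```python
def fit_shared_albedo_with_gains(observations, num_iters=20):
    """Fit observations[i][p] ~= gains[i] * albedo[p] by alternating least squares.

    observations: list of rows, one per image, each holding positive
    intensities for the same set of surface points.
    The gain of image 0 (the reference) is pinned to 1 to fix the overall scale.
    """
    obs = [[float(v) for v in row] for row in observations]
    albedo = list(obs[0])          # initialize shared values from the reference image
    gains = [1.0] * len(obs)
    for _ in range(num_iters):
        # Least-squares update of each per-image gain with the albedo held fixed.
        denom = sum(a * a for a in albedo)
        gains = [sum(o * a for o, a in zip(row, albedo)) / denom for row in obs]
        # Pin the reference image's gain to 1.
        ref_gain = gains[0]
        gains = [g / ref_gain for g in gains]
        # Least-squares update of the shared albedo with the gains held fixed.
        denom = sum(g * g for g in gains)
        albedo = [
            sum(g * row[p] for g, row in zip(gains, obs)) / denom
            for p in range(len(albedo))
        ]
    return gains, albedo


if __name__ == "__main__":
    true_albedo = [0.2, 0.4, 0.8, 1.0]
    true_gains = [1.0, 0.5, 2.0]
    data = [[g * a for a in true_albedo] for g in true_gains]
    gains, albedo = fit_shared_albedo_with_gains(data)
    print(gains, albedo)
```

A single gain per image can only absorb a global brightness change. It cannot describe highlights that move or change shape between environments, which gives some intuition for why per-image parameters alone struggle when illumination varies strongly.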

Paper Structure

This paper contains 15 sections, 5 equations, 7 figures, 5 tables.

Figures (7)

  • Figure 1: 3D reconstruction under extreme illumination variation. We propose a method for 3D reconstruction from a set of images captured under strongly varying illumination. Our method recovers high-fidelity appearance details including specular highlights that prior state-of-the-art approaches cannot recover (top baseline: NeRF-Casting [Verbin et al. 2024] with appearance embeddings; bottom baseline: NeROIC [Kuang et al. 2022]).
  • Figure 2: Method overview. We first apply a relighting diffusion model that converts $N$ images $I_1$, ..., $I_N$ with known camera poses $\pi_1$, ..., $\pi_N$, captured under extremely different illuminations, to a set of images with the same poses, but rendered under the illumination of the reference image $I_1$ (highlighted in orange). We then optimize a neural radiance field to obtain a consistent 3D representation with a novel per-image shading embedding, which can be used to render new views of the scene from unobserved poses.
  • Figure 3: A comparison of our multiview relighting with prior work on single-image relighting. Our method first relights a set of inconsistently-lit images (one of which is shown in (a)) to match the illumination of a selected reference image (b) in that set. Single-image relighting techniques such as IllumiNeRF [Zhao et al. 2024] (c) struggle to disambiguate geometry, lighting, and materials, leading to an inaccurate relighting. In contrast, our model jointly relights a set of inconsistently-lit frames, which reduces ambiguities and yields a significantly more accurate result (d) when compared with the ground truth (e).
  • Figure 4: Visual comparison of novel view renderings on the Objaverse dataset. (b) We show sample input images under extreme illumination variation. (c) Adding a per-image latent code to NeRF-Casting [Verbin et al. 2024] ("NeRFCast + AE") cannot accurately explain away the variations, leading to erroneous reconstruction. (d) Due to the ill-posed nature of the problem, inverse rendering-based methods such as NeROIC [Kuang et al. 2022] tend to produce lower-quality renderings with mostly diffuse appearance. (e) IllumiNeRF [Zhao et al. 2024] leverages a diffusion prior for single-image relighting but produces inconsistent output samples, resulting in excessive blur in rendered novel views. Note that IllumiNeRF requires the target illumination's environment map as input. We provide IllumiNeRF with the ground truth environment map corresponding to the reference image (a), while other methods only have access to the reference image itself. (f) Our method renders accurate appearance with specular highlights close to those in the ground truth images (g).
  • Figure 5: Comparison on real-world photos. We use our method to reconstruct objects from in-the-wild photos taken in different environments. Our method can render novel views under the illumination conditions of any input image we select as the reference. Unlike prior work, our technique accurately preserves the shadows (e.g. the bunny's ear shadow in the first row) and reflections (e.g. the box's handle in the second row and the car's specularities in the last row) that appear in the reference image we would like to match.
  • ...and 2 more figures