ROGR: Relightable 3D Objects using Generative Relighting

Jiapeng Tang, Matthew Levine, Dor Verbin, Stephan J. Garbin, Matthias Nießner, Ricardo Martin Brualla, Pratul P. Srinivasan, Philipp Henzler

TL;DR

ROGR tackles the challenge of relighting 3D objects by distilling a multi-view generative relighting diffusion model into a single, lighting-conditioned NeRF. The method first uses the multi-view diffusion model to generate a diverse, view-consistent relit dataset across many environment maps, then trains a NeRF-Casting-based model conditioned on both a global environment embedding and a specular cue, enabling fast feed-forward rendering under novel illuminations. Key contributions include the dual lighting conditioning scheme, a pipeline that yields consistent multi-view relighting without per-illumination optimization, and strong empirical results on synthetic and real-world benchmarks at interactive speeds. The approach targets object relighting for AR/VR, visual effects, and product visualization by providing realistic, controllable appearance changes under unseen lighting conditions. Potential impact includes more accurate digital insertions and lighting-aware data augmentation; acknowledged limitations concern complex light-material phenomena and extension to scene scale.

Abstract

We introduce ROGR, a novel approach that reconstructs a relightable 3D model of an object captured from multiple views, driven by a generative relighting model that simulates the effects of placing the object under novel environment illuminations. Our method samples the appearance of the object under multiple lighting environments, creating a dataset that is used to train a lighting-conditioned Neural Radiance Field (NeRF) that outputs the object's appearance under any input environmental lighting. The lighting-conditioned NeRF uses a novel dual-branch architecture to encode the general lighting effects and specularities separately. The optimized lighting-conditioned NeRF enables efficient feed-forward relighting under arbitrary environment maps without requiring per-illumination optimization or light transport simulation. We evaluate our approach on the established TensoIR and Stanford-ORB datasets, where it improves upon the state-of-the-art on most metrics, and showcase our approach on real-world object captures.

Paper Structure

This paper contains 45 sections, 4 equations, 8 figures, 4 tables.

Figures (8)

  • Figure 1: Given a set of posed images under unknown illumination (top), our method reconstructs a relightable neural radiance field (bottom) that can be rendered under any novel environment map without further optimization, enabling on-the-fly relighting and novel view synthesis.
  • Figure 2: Multi-view Relight Diffusion. Our multi-view relighting diffusion model takes as input $N$ posed images illuminated with a consistent, but unknown illumination, represented by camera raymaps and the source pixel values, and an environment map per image that has been rotated to the camera pose. The diffusion model generates images of the same object from the same poses, but lit by an input environment map. To generate our multi-illumination dataset, we repeat this relighting process $M$ times with $M$ environment maps.
  • Figure 3: Lighting conditioning signals. We use a combination of two lighting conditioning signals to train the NeRF on our generated multi-illumination dataset. The general lighting encoding $\mathbf{f}^{\text{general}}$ is used for encoding the full environment map in a single embedding, and is obtained using a transformer encoder applied to the entire sphere of incident radiance. The specular encoding $\mathbf{f}^{\text{specular}}$ is composed of the environment map value, as well as prefiltered versions of the environment map, queried at the reflection direction $\boldsymbol{\omega}_r$, which is the direction of the camera ray reflected about the surface normal vector. Combining these two conditioning signals provides the NeRF with all the information necessary for relighting diffuse materials as well as shiny ones, which exhibit strong reflections.
  • Figure 4: Qualitative comparisons on TensoIR [jin2023tensoir]. All renderings are rescaled to the image resolution of the ground truth. Compared to previous works, our method recovers more plausible specular highlights and more accurate colors, as indicated by the red boxes.
  • Figure 5: Qualitative comparisons on Stanford-ORB [kuang2024stanford]. Renderings from all methods are rescaled to the image resolution of the ground truth. Compared to previous work, our method produces high-fidelity renderings with more faithful specular reflections, highlighted in the red boxes.
  • ...and 3 more figures
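
The specular cue described in the Figure 3 caption (the environment map and its prefiltered versions, queried at the reflection direction $\boldsymbol{\omega}_r$) can be sketched roughly as below. This is a minimal illustration, not the authors' implementation. The function names, the equirectangular layout with +y up, the nearest-neighbor lookup, and the representation of prefiltered maps as a caller-supplied list are all assumptions made for this example. The reflection uses the standard formula $\boldsymbol{\omega}_r = \mathbf{d} - 2(\mathbf{d}\cdot\mathbf{n})\mathbf{n}$, assuming a unit camera ray direction $\mathbf{d}$ pointing toward the surface and a unit normal $\mathbf{n}$.

```python
import math


def reflect(d, n):
    """Reflect unit ray direction d about unit normal n: w_r = d - 2 (d . n) n."""
    dot = sum(di * ni for di, ni in zip(d, n))
    return tuple(di - 2.0 * dot * ni for di, ni in zip(d, n))


def direction_to_equirect_uv(w):
    """Map a unit direction to (u, v) in [0, 1) x [0, 1].

    Assumed convention (hypothetical, not from the paper): +y is up,
    +z maps to the horizontal center of the image, v = 0 is the top row.
    """
    x, y, z = w
    u = (math.atan2(x, z) / (2.0 * math.pi) + 0.5) % 1.0
    v = math.acos(max(-1.0, min(1.0, y))) / math.pi
    return u, v


def sample_nearest(env, w):
    """Nearest-neighbor lookup in an equirectangular map.

    env is a list of rows, each row a list of (r, g, b) tuples.
    """
    height, width = len(env), len(env[0])
    u, v = direction_to_equirect_uv(w)
    col = min(int(u * width), width - 1)
    row = min(int(v * height), height - 1)
    return env[row][col]


def specular_encoding(env_levels, d, n):
    """Concatenate the radiance at w_r from each map in env_levels.

    env_levels[0] stands in for the original environment map and the
    remaining entries for its prefiltered (blurred) versions; how those
    are produced is outside the scope of this sketch.
    """
    w_r = reflect(d, n)
    features = []
    for env in env_levels:
        features.extend(sample_nearest(env, w_r))
    return features
```

For example, a camera ray `d = (0, 0, -1)` hitting a surface with normal `n = (0, 0, 1)` reflects to `(0, 0, 1)`, and `specular_encoding` then returns one RGB triple per supplied map, concatenated into a flat list. A real pipeline would likely use bilinear or mip-mapped sampling rather than nearest-neighbor lookup.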