Neural Gaffer: Relighting Any Object via Diffusion

Haian Jin, Yuan Li, Fujun Luan, Yuanbo Xiangli, Sai Bi, Kai Zhang, Zexiang Xu, Jin Sun, Noah Snavely

TL;DR

Single-image relighting of arbitrary objects is ill-posed because geometry, materials, and lighting interact in ways a single image cannot disambiguate. Neural Gaffer fine-tunes a pretrained image-conditioned latent diffusion model on RelitObjaverse, a large synthetic dataset, so that it relights an object conditioned on rotated LDR and normalized HDR environment maps, without explicit scene decomposition. It extends to relighting 3D radiance fields through a two-stage diffusion-guided pipeline. The key contributions are a category-agnostic 2D relighting diffusion model, the RelitObjaverse dataset, and a practical two-stage 3D relighting workflow that leverages the diffusion prior. Together, these improve generalization and fidelity on both synthetic and real data, enabling robust 2D editing and faster, higher-quality 3D relighting.
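As a rough illustration of what "rotated LDR and normalized HDR environment maps" could look like in practice, the sketch below prepares both conditioning maps from one equirectangular HDR panorama. It is a minimal sketch under stated assumptions, not the authors' preprocessing: the function names, the gamma tone-mapping curve, the log-based normalization, and the sign convention of the rotation are all illustrative choices that do not come from the text above.

```python
import numpy as np

def rotate_envmap(env, yaw_deg):
    """Rotate an equirectangular map (H, W, 3) about the vertical axis.

    In an equirectangular layout, a yaw rotation is a circular shift of
    the columns. The shift direction is an assumed convention.
    """
    width = env.shape[1]
    shift = int(round(yaw_deg / 360.0 * width)) % width
    return np.roll(env, shift, axis=1)

def to_ldr(env_hdr, gamma=2.2):
    """Assumed LDR variant: clip to [0, 1], then apply a gamma curve."""
    return np.clip(env_hdr, 0.0, 1.0) ** (1.0 / gamma)

def to_normalized_hdr(env_hdr):
    """Assumed HDR variant: log-compress, then divide by the maximum."""
    log_env = np.log1p(np.maximum(env_hdr, 0.0))
    return log_env / max(float(log_env.max()), 1e-8)

def make_lighting_condition(env_hdr, yaw_deg):
    """Return the (LDR, normalized HDR) pair for one target rotation."""
    rotated = rotate_envmap(env_hdr, yaw_deg)
    return to_ldr(rotated), to_normalized_hdr(rotated)
```

Under these assumptions the LDR map keeps detail in ordinary-brightness regions, while the normalized HDR map keeps the relative intensity of very bright sources such as the sun, which a clipped LDR map would lose.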

Abstract

Single-image relighting is a challenging task that involves reasoning about the complex interplay between geometry, materials, and lighting. Many prior methods either support only specific categories of images, such as portraits, or require special capture conditions, like using a flashlight. Alternatively, some methods explicitly decompose a scene into intrinsic components, such as normals and BRDFs, which can be inaccurate or under-expressive. In this work, we propose a novel end-to-end 2D relighting diffusion model, called Neural Gaffer, that takes a single image of any object and can synthesize an accurate, high-quality relit image under any novel environmental lighting condition, simply by conditioning an image generator on a target environment map, without an explicit scene decomposition. Our method builds on a pre-trained diffusion model, and fine-tunes it on a synthetic relighting dataset, revealing and harnessing the inherent understanding of lighting present in the diffusion model. We evaluate our model on both synthetic and in-the-wild Internet imagery and demonstrate its advantages in terms of generalization and accuracy. Moreover, by combining with other generative methods, our model enables many downstream 2D tasks, such as text-based relighting and object insertion. Our model can also operate as a strong relighting prior for 3D tasks, such as relighting a radiance field.

Paper Structure

This paper contains 17 sections, 4 equations, 13 figures, 3 tables.

Figures (13)

  • Figure 1: Single-image relighting results on real data. Neural Gaffer supports single-image relighting for various input images under diverse lighting conditions, using either image-conditioned input (i.e., an environment map) or text-conditioned input (i.e., a description of the target lighting). These results demonstrate our model's capability to adapt to diverse lighting scenarios while preserving the visual fidelity of the original objects. Our relighting results remain consistent as the lighting rotates. Please see the supplementary webpage for additional video results produced from real input images.
  • Figure 2: Model architecture. Neural Gaffer is an img2img latent diffusion model conditioned on the input image and rotated lighting maps.
  • Figure 3: Relighting a 3D neural radiance field. Given an input NeRF and a target environmental lighting, in Stage 1, we use Neural Gaffer to predict relit images at each predefined camera viewpoint. We then tune the appearance field to overfit the multi-view relighting predictions with a reconstruction loss. In Stage 2, we further refine the appearance of the coarsely relit radiance field via the diffusion guidance loss. Using this pipeline, we can relight a NeRF model in minutes.
  • Figure 4: Single-image relighting comparison with DiLightNet (Zeng et al., 2024) under diverse lighting. Our method demonstrates superior fidelity to the target lighting, maintains more consistent color and detail, and generates more accurate highlights, shadows, and high-frequency reflections.
  • Figure 5: Object insertion. Our diffusion model can be applied to object insertion. Compared with AnyDoor (Chen et al., 2024), our method better preserves the identity of the inserted object and achieves higher-quality results.
  • ...and 8 more figures
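
The two-stage procedure in the Figure 3 caption (fit the appearance to per-view relit predictions, then refine it with a guidance signal) can be sketched as a toy optimization. This is only a structural sketch under heavy assumptions: the "radiance field" is a small parameter vector, the "renderer" is a fixed linear map per view, and `guidance_fn` is a hypothetical stand-in for the gradient of the diffusion guidance loss. None of these names or simplifications come from the paper.

```python
import numpy as np

def render(appearance, view_weights):
    """Toy renderer: each pixel is a fixed linear mix of appearance parameters."""
    return view_weights @ appearance

def reconstruction_loss(appearance, views, targets):
    """Mean squared error between rendered views and relit target images."""
    losses = [np.mean((render(appearance, w) - t) ** 2)
              for w, t in zip(views, targets)]
    return float(np.mean(losses))

def stage1_fit(appearance, views, relit_targets, lr=0.5, steps=500):
    """Stage 1: fit the appearance to the multi-view relit predictions."""
    for _ in range(steps):
        grad = np.zeros_like(appearance)
        for w, target in zip(views, relit_targets):
            residual = render(appearance, w) - target
            grad += 2.0 * (w.T @ residual) / residual.size
        appearance = appearance - lr * grad / len(views)
    return appearance

def stage2_refine(appearance, views, guidance_fn, lr=0.05, steps=50):
    """Stage 2: refine using per-image gradients from a guidance function."""
    for _ in range(steps):
        grad = np.zeros_like(appearance)
        for w in views:
            image = render(appearance, w)
            grad += w.T @ guidance_fn(image)  # chain rule through the renderer
        appearance = appearance - lr * grad / len(views)
    return appearance

def make_toy_views(rng, n_views=3, n_pixels=8, n_params=4):
    """Random non-negative mixing weights with rows normalized to sum to 1."""
    views = []
    for _ in range(n_views):
        w = rng.uniform(0.1, 1.0, size=(n_pixels, n_params))
        views.append(w / w.sum(axis=1, keepdims=True))
    return views

def clip_guidance(image):
    """Stand-in guidance: gradient of 0.5 * ||image - clip(image, 0, 1)||^2."""
    return image - np.clip(image, 0.0, 1.0)
```

In this toy setting the stand-in guidance only pulls rendered pixels back into the valid range. In the pipeline the caption describes, that role is played by the diffusion guidance loss, which refines the coarse result left by the Stage 1 reconstruction fit.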