LightIt: Illumination Modeling and Control for Diffusion Models

Peter Kocsis, Julien Philip, Kalyan Sunkavalli, Matthias Nießner, Yannick Hold-Geoffroy

TL;DR

This work introduces LightIt, a method for explicit illumination control in image generation. It is the first method to generate images with controllable, consistent lighting, and it performs on par with specialized state-of-the-art relighting methods.

Abstract

We introduce LightIt, a method for explicit illumination control for image generation. Recent generative methods lack lighting control, which is crucial to numerous artistic aspects of image generation such as setting the overall mood or cinematic appearance. To overcome these limitations, we propose to condition the generation on shading and normal maps. We model the lighting with single-bounce shading, which includes cast shadows. We first train a shading estimation module to generate a dataset of real-world images and shading pairs. Then, we train a control network using the estimated shading and normals as input. Our method demonstrates high-quality image generation and lighting control in numerous scenes. Additionally, we use our generated dataset to train an identity-preserving relighting model, conditioned on an image and a target shading. Our method is the first that enables the generation of images with controllable, consistent lighting and performs on par with specialized state-of-the-art relighting methods.
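As a rough illustration of what "single-bounce shading, which includes cast shadows" can mean for one pixel, the sketch below multiplies a clamped N-dot-L term by a shadow visibility factor. It is a simple analytic Lambertian stand-in, not the paper's learned shading estimator; the function names `normalize` and `direct_shading` and the `visibility` parameter are assumptions made for this example.

```python
import math


def normalize(v):
    """Return v scaled to unit length."""
    length = math.sqrt(sum(c * c for c in v))
    return tuple(c / length for c in v)


def direct_shading(normal, light_dir, visibility=1.0):
    """Direct (single-bounce) shading for one pixel.

    normal:     surface normal (any length, normalized here)
    light_dir:  direction from the surface toward the light
    visibility: 1.0 if fully lit, 0.0 if fully in cast shadow (assumed input)
    """
    n = normalize(normal)
    l = normalize(light_dir)
    n_dot_l = sum(a * b for a, b in zip(n, l))
    # Surfaces facing away from the light receive no direct light.
    return visibility * max(0.0, n_dot_l)


if __name__ == "__main__":
    print(direct_shading((0, 0, 1), (0, 0, 1)))        # surface facing the light
    print(direct_shading((0, 0, 1), (1, 0, 1)))        # light at 45 degrees
    print(direct_shading((0, 0, 1), (0, 0, 1), 0.0))   # same surface, in shadow
```

Evaluating this per pixel over a normal map gives a shading image of the kind the abstract describes as the conditioning signal, under the simplifying assumption of a single directional light.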

Paper Structure (22 sections, 1 equation, 13 figures, 4 tables)

This paper contains 22 sections, 1 equation, 13 figures, 4 tables.

Figures (13)

  • Figure 1: Shading Estimation. We estimate the direct shading of a single image. (i) We predict image features (FeatureNet) and unproject them to a 3D feature grid in NDC space. (ii) We predict a density field from the features (DensityNet). (iii) Given the sun's direction and solid angle, we trace rays toward the light source to obtain a coarse shadow map. (iv) Using the shadows and N-dot-L shading information, we predict a coarse shading map (ShadingNet). (v) We refine the shading map to get our direct shading (RefinementNet).
  • Figure 2: Model Overview. To generate lighting-controlled images, we train a light control module similar to ControlNet (zhang2023controlnet), conditioned on normal and shading estimation. We use a custom Residual Control Encoder to encode the control signal for the ControlNet. Adding a Residual Control Decoder with a reconstruction loss ensures the full control signal is present in the encoded signal.
  • Figure 3: Dataset Generation Pipeline. We generate a dataset using the Outdoor Laval dataset (hold2019deep). We randomly crop images from the panoramas and automatically predict normals, shading, and a caption (\ref{sec:method:dataset}). For our relighting experiments (\ref{sec:method:relighting}), we extend the dataset with relit images using OutCast (griffiths2022outcast).
  • Figure 4: Image Synthesis with Consistent Lighting. Our generated images feature consistent lighting aligned with the target shading for diverse text prompts.
  • Figure 5: In-Domain Image Synthesis with Controllable Lighting. We can synthesize images under various lighting conditions.
  • ...and 8 more figures
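
Step (iii) of Figure 1, tracing rays through a density field toward the light to obtain a shadow map, can be sketched as a simple ray march that accumulates optical depth. This is a minimal illustration under stated assumptions, not the paper's implementation: it uses a plain voxel grid indexed `density[z][y][x]` instead of an NDC-space grid predicted by DensityNet, nearest-voxel lookup, and a single ray per point (so it ignores the sun's solid angle). The function name `shadow_transmittance` and its parameters are hypothetical.

```python
import math


def shadow_transmittance(density, origin, light_dir, step=0.5, n_steps=64):
    """March from `origin` toward the light and return transmittance in [0, 1].

    density:   nested lists indexed density[z][y][x] (assumed layout)
    origin:    (x, y, z) start point in voxel coordinates
    light_dir: direction toward the light (any length, normalized here)
    """
    length = math.sqrt(sum(c * c for c in light_dir))
    d = [c / length for c in light_dir]
    nz, ny, nx = len(density), len(density[0]), len(density[0][0])

    optical_depth = 0.0
    for i in range(1, n_steps + 1):
        x, y, z = (origin[k] + d[k] * step * i for k in range(3))
        ix, iy, iz = math.floor(x), math.floor(y), math.floor(z)
        # Stop once the ray leaves the grid; nothing outside can occlude.
        if not (0 <= ix < nx and 0 <= iy < ny and 0 <= iz < nz):
            break
        optical_depth += density[iz][iy][ix] * step
    # Beer-Lambert attenuation: 1.0 means fully lit, near 0.0 means shadowed.
    return math.exp(-optical_depth)


if __name__ == "__main__":
    grid = [[[0.0] * 4 for _ in range(4)] for _ in range(4)]
    grid[2][1][1] = 10.0  # one dense voxel above the query point
    print(shadow_transmittance(grid, (1.5, 1.5, 0.5), (0, 0, 1)))  # occluded
    print(shadow_transmittance(grid, (1.5, 1.5, 0.5), (1, 0, 0)))  # unoccluded
```

The returned transmittance could serve as the visibility factor that scales an N-dot-L term, which loosely mirrors how step (iv) combines shadows with N-dot-L information before the learned ShadingNet and RefinementNet stages.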