Latent Intrinsics Emerge from Training to Relight

Xiao Zhang, William Gao, Seemandhar Jain, Michael Maire, David A. Forsyth, Anand Bhattad

TL;DR

The paper tackles image relighting by learning latent intrinsic and extrinsic representations directly from data, avoiding explicit physical models. It proposes a fully data-driven autoencoder that encodes intrinsic scene properties $S^l_{s,i}$ and lighting $L^l_s$ from paired images and decodes to relit images, with a constrained-scaling fusion to prevent leakage. The key findings are that albedo-like maps emerge from latent intrinsics without supervision, relighting achieves state-of-the-art results on real scenes, and the model generalizes to zero-shot relighting and to StyleGAN-generated images. This approach reduces reliance on detailed geometry and surface models and offers a flexible, scalable pathway for relighting and intrinsic estimation in diverse scenes.
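To make the constrained-scaling idea concrete, here is a minimal NumPy sketch. This summary does not give the exact fusion formula, so the tanh-bounded multiplicative gain below is an assumption, as are the names `constrained_scale` and `weight` and the array shapes. The only detail taken from the text is that a small factor $0<\alpha\ll 1$ limits how much information the lighting code can pass into the intrinsic features.

```python
import numpy as np


def constrained_scale(intrinsic, extrinsic, weight, alpha=0.05):
    """Modulate an intrinsic feature map by a bounded, lighting-dependent gain.

    intrinsic: (C, H, W) array, a stand-in for one intrinsic feature map.
    extrinsic: (D,) array, a stand-in for the low-dimensional lighting code.
    weight:    (C, D) array, a hypothetical projection from lighting code
               to one gain per channel (not taken from the paper).
    alpha:     small positive factor that caps the lighting's influence.
    """
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie in (0, 1)")
    # ASSUMPTION: tanh keeps the projected code in [-1, 1], so every
    # per-channel gain stays within [1 - alpha, 1 + alpha].
    gain = 1.0 + alpha * np.tanh(weight @ extrinsic)
    # Broadcast the per-channel gain over the spatial dimensions.
    return intrinsic * gain[:, None, None]


rng = np.random.default_rng(0)
S = rng.normal(size=(8, 4, 4))    # intrinsic features from the source image
L_ref = rng.normal(size=16)       # lighting code from the reference image
W = rng.normal(size=(8, 16))      # hypothetical projection weights
fused = constrained_scale(S, L_ref, W, alpha=0.05)
```

With a gain bounded this tightly, the lighting code can only nudge each channel by a few percent, which is one plausible way to discourage scene content from leaking through the lighting pathway.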

Abstract

Image relighting is the task of showing what a scene from a source image would look like if illuminated differently. Inverse graphics schemes recover an explicit representation of geometry and a set of chosen intrinsics, then relight with some form of renderer. However, error control for inverse graphics is difficult, and inverse graphics methods can represent only the effects of the chosen intrinsics. This paper describes a relighting method that is entirely data-driven, where intrinsics and lighting are each represented as latent variables. Our approach produces SOTA relightings of real scenes, as measured by standard metrics. We show that albedo can be recovered from our latent intrinsics without using any example albedos, and that the albedos recovered are competitive with SOTA methods.

Paper Structure

This paper contains 12 sections, 10 equations, 9 figures, and 4 tables.

Figures (9)

  • Figure 1: We describe a purely data-driven image relighting model. Our model recovers latent variables representing scene intrinsic properties from one image, latent variables representing lighting from another, then applies the lighting to the intrinsics to produce a relighted scene (top row). There is no physical model of intrinsics, extrinsics or their interaction. Our model relights images of real scenes with SOTA accuracy and is more accurate than current supervised methods. Note how, for the chrome ball detail in top center, the specular reflections on the chrome ball (which give an approximate environment map) change when the extrinsics are changed. Note how our model ascribes lighting to visible luminaires when it can (top right), despite the absence of any physical model. A physical model accounts only for effects in that model, and most physical models of surfaces are approximate; in contrast, a latent intrinsic model accounts for whatever produces substantial effects in training data. Latent intrinsics yield albedo in a natural fashion (light the scene with an appropriate illuminant). Bottom row shows SOTA albedo estimates recovered from our latent intrinsics.
  • Figure 2: The network diagram of our relighting model. The model functions as an autoencoder, comprising an encoder ${\bm{E}}$ and a decoder ${\bm{D}}$. Left Half: The encoder ${\bm{E}}$ maps an input image ${\bm{I}}_s^l$, captured in scene $s$ under lighting $l$, to low-dimensional extrinsic features ${\bm{L}}_s^l$ and a set of intrinsic feature maps $\{S_{s,i}^l\}_i$. The decoder ${\bm{D}}$ then generates new images based on these intrinsic and extrinsic representations. Right Half: We employ constrained scaling for the injection of ${\bm{L}}_{s}^l$, utilizing $0<\alpha\ll 1$ to regularize the information passed from ${\bm{L}}_s^l$, thereby enforcing a low-dimensional parameterization of the extrinsic features. We train our system to relight target images given inputs paired with images captured in the same scene $s$. During inference, our model generalizes to arbitrary reference images for relighting and can estimate albedo for free.
  • Figure 3: Our method outperforms all other approaches in estimating light and rendering the scene. The unsupervised SA-AE method [hu2020sa] fails by incorporating intrinsic elements from reference images. The S3Net approach [yang2021s3net] struggles with rendering when using unpaired reference images. Right: A zoomed-in view of the chrome ball, used as a probe to evaluate detail preservation in the environment map. Our method effectively retains the intricate room layout and accurately renders the appropriate lighting effects.
  • Figure 4: Latent extrinsics can be interpolated successfully; leftmost and rightmost columns are images from the multi-illumination dataset, and intermediate images are obtained by linear interpolation on the latent extrinsics (light-dependent representations), then decoding. Note how the light seems to "move" across space.
  • Figure 5: Qualitative results for relighting interior scenes using our relighter trained on images obtained from StyLitGAN (which produces multiple illuminations of a generated scene). StyLitGAN has a strong tendency to increase or decrease illumination by adjusting luminaires, typically bedside lights but also light coming through French windows, etc. On the left, where the reference lighting tends to be brighter and more concentrated, notice how for the two top images, our relighter has identified and "turned up" the bedside lights; for the third, it has resisted StyLitGAN's tendency to invent helpful luminaires (there isn't a bedside light where StyLitGAN imputed one, as close inspection shows). On the right, where the reference lighting is much more uniform, our relighter has achieved this by "turning down" bedside lights. This is an emergent phenomenon; the method is not supplied with any explicit luminaire model or labeled data.
  • ...and 4 more figures
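
The linear interpolation of latent extrinsics described in the Figure 4 caption can be sketched in a few lines of NumPy. The function name `interpolate_extrinsics` and the toy three-dimensional codes are illustrative assumptions. In the actual model, each interpolated code would be passed to the decoder together with fixed intrinsic features, and that decoding step is not shown here.

```python
import numpy as np


def interpolate_extrinsics(l_start, l_end, steps):
    """Return `steps` lighting codes linearly spaced from l_start to l_end.

    Both endpoints are included, so the first and last rows correspond to
    the two lighting codes being interpolated between.
    """
    if steps < 2:
        raise ValueError("steps must be at least 2 to include both endpoints")
    l_start = np.asarray(l_start, dtype=float)
    l_end = np.asarray(l_end, dtype=float)
    # Column vector of blend weights in [0, 1], one per output code.
    t = np.linspace(0.0, 1.0, steps)[:, None]
    return (1.0 - t) * l_start + t * l_end


# Toy lighting codes (hypothetical values, far smaller than a real latent).
l_a = np.array([0.0, 1.0, -2.0])
l_b = np.array([4.0, 1.0, 2.0])
codes = interpolate_extrinsics(l_a, l_b, steps=5)
```

Each row of `codes` is one intermediate lighting condition. Decoding the rows in order with the same intrinsics would give a sequence like the one in the figure, where the light appears to move across the scene.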