SynthLight: Portrait Relighting with Diffusion Model by Learning to Re-render Synthetic Faces

Sumit Chaturvedi, Mengwei Ren, Yannick Hold-Geoffroy, Jingyuan Liu, Julie Dorsey, Zhixin Shu

TL;DR

SynthLight addresses portrait relighting by training a diffusion model to re-render synthetic 3D heads under new environment maps. It bridges the synthetic-real gap via multi-task training with real images and an inference-time guidance scheme that preserves facial details, achieving realistic lighting effects without requiring real relighting labels. Quantitative results on synthetic and Light Stage data, supplemented by a user study, show competitive or superior performance to state-of-the-art methods, with strong generalization to in-the-wild portraits. This work demonstrates the viability of synthetic data and diffusion-based re-rendering for high-quality portrait relighting, enabling complex illumination effects and practical applicability beyond controlled capture setups.

Abstract

We introduce SynthLight, a diffusion model for portrait relighting. Our approach frames image relighting as a re-rendering problem, where pixels are transformed in response to changes in environmental lighting conditions. Using a physically-based rendering engine, we synthesize a dataset to simulate this lighting-conditioned transformation with 3D head assets under varying lighting. We propose two training and inference strategies to bridge the gap between the synthetic and real image domains: (1) multi-task training that takes advantage of real human portraits without lighting labels; (2) an inference-time diffusion sampling procedure based on classifier-free guidance that leverages the input portrait to better preserve details. Our method generalizes to diverse real photographs and produces realistic illumination effects, including specular highlights and cast shadows, while preserving the subject's identity. Our quantitative experiments on Light Stage data demonstrate results comparable to state-of-the-art relighting methods. Our qualitative results on in-the-wild images showcase rich and unprecedented illumination effects. Project Page: https://vrroom.github.io/synthlight/

Paper Structure

This paper contains 37 sections, 2 equations, 25 figures, 4 tables.

Figures (25)

  • Figure 1: SynthLight relights portraits using environment map lighting. By learning to re-render synthetic human faces, our diffusion model produces realistic illumination effects on real portrait photographs, including distinct cast shadows on the neck and natural specular highlights on the skin. Despite being trained exclusively on synthetic headshot images for relighting, the model demonstrates remarkable generalization to diverse scenarios, successfully handling half-body portraits and even full-body figurines.
  • Figure 2: Synthetic Faces: Subjects are rendered under various lighting conditions (details in Sec. \ref{sec:data}). We show two examples, where each pair consists of a subject rendered using two different environment maps. The network is trained to re-render synthetic faces by transforming a subject rendered with one environment map into its counterpart rendered with the other environment map.
  • Figure 3: Training pipeline of SynthLight. We first enable relighting by training the diffusion backbone with synthetic relighting tuples (Task 1, top row), detailed in Sec. \ref{sec:model}. To further reduce the domain gap between synthetic and real images, we jointly train on a text-to-image task (Task 2, bottom row), detailed in Sec. \ref{sec:train}. Our model is based on LDM \cite{rombach2021highresolution} and is composed of a VAE and a UNet. For simplicity, the VAE is omitted in the diagram.
  • Figure 4: We employ image-conditioning classifier-free guidance during inference to balance identity preservation against relighting effects. The final score estimate is computed as per Eq. \ref{eq:cfg}.
  • Figure 5: Effect of the input portrait guidance parameter $\lambda_I$: We show (a) the input portrait, (b) the lighting condition and a reference image rendered in Blender with the same lighting, and (c) outputs with varying $\lambda_I$. (d) highlights that $\lambda_I=1$, equivalent to removing inference-time adaptation, alters the eye shape (in red rectangle). (e) shows that higher $\lambda_I$ introduces undesired lighting artifacts, such as shadow artifacts from the input portrait (in yellow rectangle).
  • ...and 20 more figures
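
The image-conditioning classifier-free guidance named in the Figure 4 and Figure 5 captions can be illustrated with the generic classifier-free guidance blend. This is only a minimal sketch: the paper's own equation is not reproduced in this excerpt, so the formula below is the standard form, and the function name, argument names, and toy tensor shapes are all hypothetical. It is consistent with the Figure 5 remark that $\lambda_I=1$ amounts to no guidance, since the blend then returns the fully conditioned estimate unchanged.

```python
import numpy as np


def image_guided_estimate(eps_without_image, eps_with_image, lambda_i):
    """Generic classifier-free guidance blend of two noise (score) estimates.

    Assumption: this is the standard CFG formula, not necessarily the exact
    equation used in the paper.
      eps_without_image: estimate with the input-portrait condition dropped
      eps_with_image:    estimate with the input portrait supplied
      lambda_i:          guidance scale (1.0 means no guidance)
    """
    eps_without_image = np.asarray(eps_without_image, dtype=np.float64)
    eps_with_image = np.asarray(eps_with_image, dtype=np.float64)
    # Move from the estimate without the image towards (and, for
    # lambda_i > 1, past) the estimate that sees the input portrait.
    return eps_without_image + lambda_i * (eps_with_image - eps_without_image)


# Toy stand-ins for the two denoiser outputs (hypothetical latent shape).
rng = np.random.default_rng(0)
eps_drop = rng.standard_normal((4, 8, 8))
eps_full = rng.standard_normal((4, 8, 8))

no_guidance = image_guided_estimate(eps_drop, eps_full, 1.0)
stronger = image_guided_estimate(eps_drop, eps_full, 2.0)
```

With a scale of 1.0 the result equals the estimate that sees the input portrait; larger scales extrapolate further in that direction, which matches the trade-off described in the Figure 5 caption between preserving the input and carrying over its lighting artifacts.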