
Relightful Harmonization: Lighting-aware Portrait Background Replacement

Mengwei Ren, Wei Xiong, Jae Shin Yoon, Zhixin Shu, Jianming Zhang, HyunJoon Jung, Guido Gerig, He Zhang

TL;DR

The paper tackles realistic portrait background replacement by incorporating lighting cues into harmonization. It introduces Relightful Harmonization, a diffusion-based model that conditions on a background's lighting through a dedicated lighting representation, aligns this representation with environment-map lighting, and then fine-tunes on training pairs synthesized from real images by a novel data-synthesis pipeline. The three-stage approach improves lighting coherence and photorealism across light-stage, synthetic natural-image, and real-world tests, without requiring HDR environment maps at inference. This work advances practical portrait compositing by delivering lighting-aware, flexible background replacement for casual photography. It also outlines limitations such as resolution constraints and occasional identity shifts, guiding future refinements.

Abstract

Portrait harmonization aims to composite a subject into a new background, adjusting its lighting and color to ensure harmony with the background scene. Existing harmonization techniques often focus only on adjusting the global color and brightness of the foreground and ignore crucial illumination cues from the background, such as the apparent lighting direction, leading to unrealistic compositions. We introduce Relightful Harmonization, a lighting-aware diffusion model designed to seamlessly harmonize sophisticated lighting effects on the foreground portrait for any background image. Our approach unfolds in three stages. First, we introduce a lighting representation module that allows our diffusion model to encode lighting information from the target background image. Second, we introduce an alignment network that aligns lighting features learned from the image background with lighting features learned from panoramic environment maps, which provide a complete representation of scene illumination. Last, to further boost the photorealism of the proposed method, we introduce a novel data simulation pipeline that generates synthetic training pairs from a diverse range of natural images, which are used to refine the model. Our method outperforms existing benchmark methods in visual fidelity and lighting coherence and shows superior generalization in real-world testing scenarios, highlighting its versatility and practicality.
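The Stage II alignment idea can be illustrated as feature regression: a mapping is trained so that lighting features computed from a background image match the lighting features computed from the paired environment map. The sketch below is a minimal illustration under stated assumptions, not the paper's alignment network: the linear map `W`, the MSE loss, the toy feature vectors, and the names `train_alignment` and `make_toy_pairs` are all hypothetical, and environment-map features are treated as fixed targets.

```python
import random
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]

def matvec(W: Matrix, f: Vector) -> Vector:
    """Apply a linear map: returns W @ f."""
    return [sum(w * x for w, x in zip(row, f)) for row in W]

def mse(a: Vector, b: Vector) -> float:
    """Mean squared error between two equal-length feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

def train_alignment(pairs: List[Tuple[Vector, Vector]], dim: int,
                    lr: float = 0.05, epochs: int = 200, seed: int = 0):
    """Fit a hypothetical linear alignment head W so that W @ f_bg ~ f_env.

    f_env (environment-map feature) is a fixed target; only W is updated,
    using per-pair stochastic gradient descent on the MSE loss.
    """
    rng = random.Random(seed)
    W = [[rng.uniform(-0.1, 0.1) for _ in range(dim)] for _ in range(dim)]

    def avg_loss() -> float:
        return sum(mse(matvec(W, fb), fe) for fb, fe in pairs) / len(pairs)

    start = avg_loss()
    for _ in range(epochs):
        for f_bg, f_env in pairs:
            pred = matvec(W, f_bg)
            for i in range(dim):
                # d/dW[i][j] of mse = (2 / dim) * (pred[i] - f_env[i]) * f_bg[j]
                g = 2.0 * (pred[i] - f_env[i]) / dim
                for j in range(dim):
                    W[i][j] -= lr * g * f_bg[j]
    return W, start, avg_loss()

def make_toy_pairs(n: int, dim: int, seed: int = 1):
    """Toy data (an assumption): f_env is an unknown linear function of f_bg."""
    rng = random.Random(seed)
    A = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(dim)]
    pairs = []
    for _ in range(n):
        f_bg = [rng.uniform(-1, 1) for _ in range(dim)]
        pairs.append((f_bg, matvec(A, f_bg)))
    return pairs

pairs = make_toy_pairs(n=16, dim=4)
W, start, end = train_alignment(pairs, dim=4)
print(f"alignment loss: {start:.4f} -> {end:.6f}")
```

Once such a mapping is trained, only the background branch (`matvec(W, f_bg)`) is needed to produce environment-map-like lighting features, which is consistent with the summary's point that HDR environment maps are not required at inference.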

Paper Structure

This paper contains 19 sections, 2 equations, 20 figures, and 3 tables.

Figures (20)

  • Figure 1: Relightful Harmonization on four real-world images. Each set shows a direct composition (upper left) of the foreground subject onto a new background (lower left), and our harmonized result (right) that accounts for both lighting and color.
  • Figure 2: The pipeline of Relightful Harmonization. In Stage I, a lighting representation module is integrated into the diffusion model to condition generation on lighting information encoded from the background image; this stage is trained on a light stage relighting dataset (lower left). Stage II aligns lighting features derived from the background with those derived from the environment map for enhanced physical accuracy. Finally, Stage III refines the model on a real image dataset (lower right) obtained via a novel data simulation pipeline.
  • Figure 3: Data synthesis pipeline. Given a real image, we inpaint the subject region to obtain a synthetic background. The foreground lighting is then altered with our model trained in Stages I/II to create an input image with distinct lighting. Two example pairs are shown on the lower right.
  • Figure 4: Visual comparisons with benchmark methods.
  • Figure 5: Real-world testing results under different scenarios to examine lighting and shadow effects. For each pair of results in rows (a)-(c), we display the composite image (left) and the harmonized image (right). In row (d), we omit the composite for better visibility. Full visualization is provided in the Appendix.
  • ...and 15 more figures
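The data synthesis pipeline described for Figure 3 can be sketched as a pair generator: inpaint the subject out of a real photo to get a background, relight the subject, composite the relit subject back over that background as the model input, and keep the untouched photo as the target. This is a minimal sketch under assumptions: images are grayscale nested lists, the subject mask is binary, and each pair is taken to be (relit composite, original photo). `mean_fill_inpaint` and `gain_relight` are hypothetical toy stand-ins for the inpainting model and the Stage I/II relighting model, not the paper's components.

```python
from typing import Callable, List, Tuple

Image = List[List[float]]  # grayscale intensities in [0, 1]
Mask = List[List[int]]     # 1 = subject pixel, 0 = background pixel

def composite(fg: Image, bg: Image, mask: Mask) -> Image:
    """Binary-mask composite: take fg where mask == 1, bg elsewhere."""
    return [[f if m else b for f, b, m in zip(fr, br, mr)]
            for fr, br, mr in zip(fg, bg, mask)]

def mean_fill_inpaint(img: Image, mask: Mask) -> Image:
    """Toy stand-in for an inpainting model: fill the subject region
    with the mean intensity of the remaining background pixels."""
    vals = [p for row, mrow in zip(img, mask) for p, m in zip(row, mrow) if not m]
    fill = sum(vals) / len(vals)
    return [[fill if m else p for p, m in zip(row, mrow)]
            for row, mrow in zip(img, mask)]

def gain_relight(img: Image, gain: float) -> Image:
    """Toy stand-in for the Stage I/II relighting model: scale intensities
    by a global gain and clip to [0, 1]."""
    return [[min(1.0, max(0.0, p * gain)) for p in row] for row in img]

def synthesize_pair(real: Image, mask: Mask,
                    inpaint: Callable[[Image, Mask], Image],
                    relight: Callable[[Image], Image]) -> Tuple[Image, Image, Image]:
    """Return (model_input, background, target) for one real photo."""
    background = inpaint(real, mask)                  # subject removed
    relit = relight(real)                             # foreground lighting altered
    model_input = composite(relit, background, mask)  # relit subject on background
    target = [row[:] for row in real]                 # original photo is the target
    return model_input, background, target

real = [[0.2, 0.4, 0.6],
        [0.3, 0.5, 0.7]]
mask = [[0, 1, 0],
        [0, 1, 1]]
model_input, background, target = synthesize_pair(
    real, mask, mean_fill_inpaint, lambda im: gain_relight(im, 0.5))
```

A plausible reason for this design, under the assumption above: because the target is an unmodified photograph, the supervision carries physically consistent real-world lighting, while the relit foreground supplies the lighting mismatch the model learns to undo.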