Relightful Harmonization: Lighting-aware Portrait Background Replacement
Mengwei Ren, Wei Xiong, Jae Shin Yoon, Zhixin Shu, Jianming Zhang, HyunJoon Jung, Guido Gerig, He Zhang
TL;DR
The paper tackles realistic portrait background replacement by incorporating lighting cues into harmonization. It introduces Relightful Harmonization, a diffusion-based model that conditions on a background's lighting using a dedicated lighting representation, aligns this representation with environment-map lighting, and then finetunes on real imagery via a novel data-synthesis pipeline. The three-stage approach yields improved lighting coherence and photorealism across light-stage, synthetic natural-image, and real-world tests, without requiring HDR maps at inference. This work advances practical portrait compositing by delivering lighting-aware, flexible background replacement for casual photography. It also outlines limitations such as resolution constraints and occasional identity shifts, guiding future refinements.
Abstract
Portrait harmonization aims to composite a subject into a new background, adjusting its lighting and color to ensure harmony with the background scene. Existing harmonization techniques often focus only on adjusting the global color and brightness of the foreground and ignore crucial illumination cues from the background, such as apparent lighting direction, leading to unrealistic compositions. We introduce Relightful Harmonization, a lighting-aware diffusion model designed to seamlessly harmonize sophisticated lighting effects for the foreground portrait using any background image. Our approach unfolds in three stages. First, we introduce a lighting representation module that allows our diffusion model to encode lighting information from the target image background. Second, we introduce an alignment network that aligns lighting features learned from the image background with lighting features learned from panorama environment maps, which are a complete representation of scene illumination. Last, to further boost the photorealism of the proposed method, we introduce a novel data simulation pipeline that generates synthetic training pairs from a diverse range of natural images, which are used to refine the model. Our method outperforms existing benchmarks in visual fidelity and lighting coherence and generalizes better in real-world testing scenarios, highlighting its versatility and practicality.
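The second stage, aligning background-derived lighting features with environment-map lighting features, can be illustrated with a toy sketch. This is not the paper's alignment network: the abstract does not specify the loss or architecture, so the mean-squared-error objective, the single linear map, and the names `alignment_loss` and `fit_linear_alignment` are all assumptions chosen only to show the idea of pulling one feature space toward another.

```python
import numpy as np

def alignment_loss(bg_feat, env_feat):
    """Mean squared error between two sets of lighting features.

    Assumption: MSE stands in for whatever objective the paper uses.
    """
    bg_feat = np.asarray(bg_feat, dtype=np.float64)
    env_feat = np.asarray(env_feat, dtype=np.float64)
    if bg_feat.shape != env_feat.shape:
        raise ValueError("feature shapes must match")
    return float(np.mean((bg_feat - env_feat) ** 2))

def fit_linear_alignment(bg_feats, env_feats, lr=0.1, steps=500):
    """Fit W so that bg_feats @ W approximates env_feats.

    Assumption: a single linear map trained by plain gradient descent,
    a hypothetical stand-in for a learned alignment network.
    bg_feats has shape (n, d_in); env_feats has shape (n, d_out).
    """
    bg = np.asarray(bg_feats, dtype=np.float64)
    env = np.asarray(env_feats, dtype=np.float64)
    W = np.zeros((bg.shape[1], env.shape[1]))
    for _ in range(steps):
        residual = bg @ W - env
        # Gradient of the mean squared residual with respect to W.
        grad = 2.0 * bg.T @ residual / residual.size
        W -= lr * grad
    return W

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    bg = rng.normal(size=(64, 4))          # toy background lighting features
    env = bg @ rng.normal(size=(4, 3))     # toy environment-map features
    W = fit_linear_alignment(bg, env)
    print("loss before:", alignment_loss(bg @ np.zeros((4, 3)), env))
    print("loss after: ", alignment_loss(bg @ W, env))
```

In this toy setting the environment-map features are an exact linear function of the background features, so the loss can be driven close to zero; with real features the alignment would only be approximate.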
