Generative Portrait Shadow Removal

Jae Shin Yoon, Zhixin Shu, Mengwei Ren, Xuaner Zhang, Yannick Hold-Geoffroy, Krishna Kumar Singh, He Zhang

TL;DR

This work tackles the ill-posed problem of portrait shadow removal by reframing it as a generative diffusion task that rebuilds shadow-free portraits from scratch, guided by the input image. A compositional repurposing framework fine-tunes a pre-trained text-to-image diffusion model in two stages, light-aware background harmonization followed by shadow-free portrait generation, so that the output preserves the input's lighting distribution and looks natural. A guided upsampling module recovers high-frequency details to maintain identity and texture, while a large-scale dataset built from lightstage captures, synthetic humans, and augmented real-world data supports robust training. Experiments, ablation studies, and applications show stronger shadow removal quality, identity preservation, and cross-scene generalization than the compared baselines, enabling high-fidelity portrait relighting and clean appearance modeling in real-world settings.

Abstract

We introduce a high-fidelity portrait shadow removal model that can effectively enhance the image of a portrait by predicting its appearance under disturbing shadows and highlights. Portrait shadow removal is a highly ill-posed problem: many plausible solutions exist for a single image. Existing works address this problem by predicting appearance residuals that propagate the local shadow distribution, but such methods are often incomplete and lead to unnatural predictions, especially for portraits with hard shadows. We overcome the limitations of these local propagation methods by formulating shadow removal as a generation task in which a diffusion model learns to globally rebuild the human appearance from scratch, conditioned on the input portrait image. For robust and natural shadow removal, we propose to train the diffusion model with a compositional repurposing framework: a pre-trained text-guided image generation model is first fine-tuned on a background harmonization dataset to harmonize the lighting and color of the foreground with a background scene, and is then further fine-tuned on a shadow-paired dataset to generate a shadow-free portrait image. Because the latent diffusion model loses fine details, we propose a guided-upsampling network that restores the original high-frequency details (wrinkles and dots) from the input image. To enable our compositional training framework, we construct a high-fidelity, large-scale dataset using a lightstage capturing system and synthetic graphics simulation. Our generative framework effectively removes shadows caused by both self-occlusion and external occlusion while maintaining the original lighting distribution and high-frequency details. Our method is also robust to diverse subjects captured in real environments.
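
The guided-upsampling idea above (keep the low frequencies of the generated result, take the high frequencies from the input) can be illustrated with a fixed frequency split. This is only a minimal sketch under stated assumptions, not the authors' learned network: the box blur, the nearest-neighbour upsampling, the `radius` value, and all function names are hypothetical choices made for illustration. A fixed split like this would also copy hard shadow edges back from the input, which is presumably one reason a learned, guided module is used instead.

```python
import numpy as np


def box_blur(img, radius):
    """Separable box blur with edge padding; img is (H, W) or (H, W, C)."""
    k = 2 * radius + 1
    kernel = np.ones(k) / k
    out = img.astype(np.float64)
    for axis in (0, 1):
        pad = [(0, 0)] * out.ndim
        pad[axis] = (radius, radius)
        padded = np.pad(out, pad, mode="edge")
        # "valid" convolution on the padded signal keeps the original length.
        out = np.apply_along_axis(
            lambda v: np.convolve(v, kernel, mode="valid"), axis, padded
        )
    return out


def upsample_nearest(img, factor):
    """Nearest-neighbour upsampling along the two spatial axes."""
    return np.repeat(np.repeat(img, factor, axis=0), factor, axis=1)


def recombine_frequencies(generated_lowres, original_highres, factor, radius=2):
    """Low frequencies from the generated image, high frequencies from the input.

    Hypothetical, non-learned stand-in for a guided upsampling step.
    Both images are expected to hold values in [0, 1].
    """
    up = upsample_nearest(generated_lowres, factor)
    if up.shape != original_highres.shape:
        raise ValueError("upsampled generation and input must have equal shapes")
    low = box_blur(up, radius)
    high = original_highres - box_blur(original_highres, radius)
    return np.clip(low + high, 0.0, 1.0)
```

With a flat generated image and an input that contains a single bright pixel, the output keeps the flat base tone and restores the pixel-level detail on top of it.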

Paper Structure (33 sections, 6 equations, 14 figures, 6 tables)

Figures (14)

  • Figure 1: Motivation. (a) Input image of a portrait under an external shadow. (b) Appearance samples on the skin, any of which can be a plausible solution. (c) The result of a residual appearance prediction model (e.g., U-Net, Ronneberger et al., 2015), which is designed to propagate the local residual appearance. Such a model is often trapped in local minima (e.g., at a strong shadow boundary). (d) We remove the shadow generatively by globally rebuilding the shadow-free portrait image from scratch.
  • Figure 2: Overview of our compositional repurposing framework. (a) A denoising diffusion model is trained on massive image-text pairs to generate an image from text, which forms a large image prior. We repurpose this prior in a series of steps: (b) The diffusion model learns to generate portrait images harmonized with the background scene, conditioned on a downsampled background lighting map. (c) The diffusion model is further fine-tuned to generate a shadow-free portrait, with the downsampled input used as the lighting map. Here, Enc. and Dec. denote the encoder and decoder, respectively.
  • Figure 3: Guided upsampling module that combines the low-frequency components of the generation and high-frequency details from the original input.
  • Figure 4: Training dataset for compositional repurposing. The HDR rendering data for background harmonization and the portrait shadows caused by self-occlusion are captured with a lightstage; the remaining data are based on graphics simulation.
  • Figure 5: Comparison with other baselines on the validation data.
  • ...and 9 more figures
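
The Figure 2 caption describes conditioning on a downsampled lighting map. One simple way to build such a map is block averaging, sketched below. The function name `lighting_map`, the default `size=8`, and the use of average pooling are assumptions for illustration; the text here does not specify the map resolution or the downsampling operator.

```python
import numpy as np


def lighting_map(image, size=8):
    """Average-pool an (H, W, C) image down to (size, size, C).

    Hypothetical helper: H and W must be divisible by `size`.
    """
    h, w, c = image.shape
    if h % size or w % size:
        raise ValueError("image height and width must be divisible by size")
    # Split rows and columns into size x size blocks, then average each block.
    blocks = image.reshape(size, h // size, size, w // size, c)
    return blocks.mean(axis=(1, 3))
```

Averaging over large blocks discards texture and identity cues while keeping the coarse distribution of brightness and color, which is the kind of information a lighting map is meant to carry.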