Shadow Generation for Composite Image Using Diffusion model

Qingyang Liu, Junqi You, Jianting Wang, Xinhao Tao, Bo Zhang, Li Niu

TL;DR

This paper first adapts ControlNet to the shadow generation task, then proposes intensity modulation modules to improve the shadow intensity, and extends the small-scale DESOBA dataset to DESOBAv2 using a novel data acquisition pipeline.

Abstract

In the realm of image composition, generating a realistic shadow for the inserted foreground remains a formidable challenge. Previous works have developed image-to-image translation models trained on paired training data. However, they struggle to generate shadows with accurate shapes and intensities, hindered by data scarcity and inherent task complexity. In this paper, we resort to a foundation model with rich prior knowledge of natural shadow images. Specifically, we first adapt ControlNet to our task and then propose intensity modulation modules to improve the shadow intensity. Moreover, we extend the small-scale DESOBA dataset to DESOBAv2 using a novel data acquisition pipeline. Experimental results on both the DESOBA and DESOBAv2 datasets, as well as on real composite images, demonstrate the superior capability of our model for the shadow generation task. The dataset, code, and model are released at https://github.com/bcmi/Object-Shadow-Generation-Dataset-DESOBAv2.

Paper Structure (22 sections, 6 equations, 5 figures, 2 tables)

Figures (5)

  • Figure 1: A composite image can be obtained by pasting the foreground on the background. Shadow generation aims to generate plausible shadow for the inserted foreground in the composite image to produce a more realistic image.
  • Figure 2: The pipeline of dataset construction. We use the object-shadow detection model detect4 to predict pairs of object and shadow masks in the real image $\bm{I}_r$. Then we obtain the union $\bm{M}_s$ of all shadow masks as the inpainting mask and apply the inpainting model Rombach_2022_CVPR to get a deshadowed image $\bm{I}_d$. After designating a foreground object, we replace the background shadow regions $\bm{M}_{bs}$ in $\bm{I}_d$ with their counterparts in $\bm{I}_r$ to synthesize a composite image $\bm{I}_c$, and replace all the shadow regions $\bm{M}_{s}$ in $\bm{I}_d$ with their counterparts in $\bm{I}_r$ to obtain the ground-truth target image $\bm{I}_g$.
  • Figure 3: The framework of our SGDiffusion. We adapt ControlNet (Control Encoder and Denoising U-Net) to the shadow generation task. We also introduce an intensity encoder to modulate the foreground shadow region in the noise map $\tilde{\bm{\epsilon}}$, leading to $\hat{\bm{\epsilon}}$. The output noise $\hat{\bm{\epsilon}}$ is supervised by the weighted noise loss $\mathcal{L}_{mwsg}$ based on the expanded foreground shadow mask $\hat{\bm{M}}_{fs}$.
  • Figure 4: Visual comparison of different methods on the DESOBAv2 dataset. From left to right: input composite image (a), foreground object mask (b), results of ShadowGAN zhang2019shadowgan (c), MaskshadowGAN hu2019mask (d), ARShadowGAN liu2020arshadowgan (e), SGRNet hong2021shadow (f), our SGDiffusion (g), and ground truth (h).
  • Figure 5: Visual comparison of different methods on real composite images. From left to right: input composite image (a), foreground object mask (b), results of ShadowGAN zhang2019shadowgan (c), MaskshadowGAN hu2019mask (d), ARShadowGAN liu2020arshadowgan (e), SGRNet hong2021shadow (f), and our SGDiffusion (g).
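
The mask-based compositing step described in the Figure 2 caption can be illustrated with a minimal NumPy sketch. This is not the authors' released code: the function names `blend` and `synthesize_pair` are hypothetical, the detection and inpainting models are assumed to have already produced the shadow masks and the deshadowed image, and the background shadow mask $\bm{M}_{bs}$ is assumed here to be the union of the shadow masks of all objects other than the designated foreground.

```python
import numpy as np

def blend(base, source, mask):
    """Copy `source` pixels into `base` wherever the (H, W) boolean `mask` is True."""
    mask3 = mask[..., None]  # add a channel axis so the mask broadcasts over RGB
    return np.where(mask3, source, base)

def synthesize_pair(I_r, I_d, shadow_masks, fg_index):
    """Build the composite image I_c and the target image I_g.

    I_r: real image, shape (H, W, 3).
    I_d: deshadowed (inpainted) image, same shape as I_r.
    shadow_masks: list of boolean (H, W) arrays, one per detected object.
    fg_index: index of the object designated as the foreground.
    """
    # M_s: union of all detected shadow masks.
    M_s = np.any(np.stack(shadow_masks), axis=0)

    # M_bs: shadows of every object except the foreground (an assumption
    # about how the background shadow regions are defined).
    M_bs = np.zeros_like(M_s)
    for i, mask in enumerate(shadow_masks):
        if i != fg_index:
            M_bs |= mask

    I_c = blend(I_d, I_r, M_bs)  # foreground shadow stays removed
    I_g = blend(I_d, I_r, M_s)   # every real shadow is restored
    return I_c, I_g
```

Under these assumptions, $\bm{I}_c$ and $\bm{I}_g$ differ only inside the foreground object's shadow region, which is the region a shadow generation model has to fill in.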