Shadow Generation with Decomposed Mask Prediction and Attentive Shadow Filling
Xinhao Tao, Junyan Cao, Yan Hong, Li Niu
TL;DR
This work addresses a realism gap in image composition: generating plausible shadows for inserted foreground objects. It introduces RdSOBA, a large-scale rendered dataset that supplements the existing small-scale real dataset, and DMASNet, a two-stage network that first predicts the shadow mask in decomposed form (box and shape) and then fills the shadow by attending to background shadow pixels. The approach produces more realistic shadows and transfers well to real composite images, outperforming baselines on multiple metrics and in human studies. Together, the dataset and method are useful for image editing and synthesis tasks that need shadows consistent with the rest of the scene.
Abstract
Image composition refers to inserting a foreground object into a background image to obtain a composite image. In this work, we focus on generating plausible shadows for the inserted foreground object to make the composite image more realistic. To supplement the existing small-scale dataset, we create a large-scale dataset called RdSOBA using rendering techniques. Moreover, we design a two-stage network named DMASNet with decomposed mask prediction and attentive shadow filling. Specifically, in the first stage, we decompose shadow mask prediction into box prediction and shape prediction. In the second stage, we attend to reference background shadow pixels to fill the foreground shadow. Extensive experiments show that DMASNet achieves better visual results and generalizes well to real composite images.
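To make the two-stage idea concrete, the following is a minimal toy sketch, not the DMASNet implementation. It assumes a grayscale image stored as nested lists with values in [0, 1]. The function names `compose_mask` and `attentive_fill`, the nearest-neighbour pasting, and the distance-based attention score are all illustrative assumptions; in the actual network, the box, the shape, and the attention weights would be predicted from learned features.

```python
import math


def compose_mask(box, shape, height, width):
    """Stage 1 (toy): paste a low-resolution shape mask into a predicted box.

    `box` is (x0, y0, x1, y1) in pixels, with x1 and y1 exclusive.
    `shape` is a small 2D grid of values in [0, 1] (hypothetical network output).
    """
    x0, y0, x1, y1 = box
    sh, sw = len(shape), len(shape[0])
    mask = [[0.0] * width for _ in range(height)]
    for y in range(max(y0, 0), min(y1, height)):
        for x in range(max(x0, 0), min(x1, width)):
            # Nearest-neighbour lookup from box coordinates into the shape grid.
            sy = (y - y0) * sh // (y1 - y0)
            sx = (x - x0) * sw // (x1 - x0)
            mask[y][x] = shape[sy][sx]
    return mask


def attentive_fill(image, fg_mask, bg_mask, temperature=4.0):
    """Stage 2 (toy): fill the foreground shadow by attending to background shadow pixels.

    Assumption: the attention score is the negative squared spatial distance,
    standing in for a learned feature similarity.
    """
    height, width = len(image), len(image[0])
    refs = [(y, x) for y in range(height) for x in range(width)
            if bg_mask[y][x] > 0.5]
    out = [row[:] for row in image]
    if not refs:
        return out  # no reference shadow pixels to attend to
    for y in range(height):
        for x in range(width):
            m = fg_mask[y][x]
            if m <= 0.0:
                continue
            scores = [-((y - ry) ** 2 + (x - rx) ** 2) / temperature
                      for ry, rx in refs]
            top = max(scores)  # subtract the max for a numerically stable softmax
            weights = [math.exp(s - top) for s in scores]
            total = sum(weights)
            fill = sum(w * image[ry][rx]
                       for w, (ry, rx) in zip(weights, refs)) / total
            # Blend the attended shadow intensity into the image under the mask.
            out[y][x] = (1.0 - m) * image[y][x] + m * fill
    return out
```

For example, `compose_mask((1, 1, 3, 3), [[1.0]], 4, 4)` gives a 4x4 mask that is 1 inside the 2x2 box and 0 elsewhere; passing it to `attentive_fill` together with a background shadow mask darkens those pixels toward the intensity of the background shadow. Predicting a box and a shape separately keeps the shape prediction at a fixed, small resolution that does not depend on where the shadow falls in the image.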
