
Reflection Generation for Composite Image Using Diffusion Model

Haonan Zhao, Qingyang Liu, Jiaxuan Chen, Li Niu

Abstract

Image composition involves inserting a foreground object into a background while synthesizing environment-consistent effects such as shadows and reflections. Although shadow generation has been extensively studied, reflection generation remains largely underexplored. In this work, we focus on reflection generation. We inject prior information about reflection placement and appearance into a foundation diffusion model. We also divide reflections into two types and adopt a type-aware model design. To support training, we construct DEROBA, the first large-scale object reflection dataset. Experiments demonstrate that our method generates reflections that are physically coherent and visually realistic, establishing a new benchmark for reflection generation.


Figures (6)

  • Figure 1: A composite image can be obtained by placing a foreground object onto the background. Reflection generation seeks to synthesize a plausible reflection for the inserted object, thereby enhancing the overall realism and perceptual consistency of the composite image.
  • Figure 2: The pipeline of DEROBA dataset construction. Foreground objects are automatically detected and segmented, and their associated reflection masks are manually annotated. Inpainting is performed on the source image $I_{src}$ within both the foreground and reflection regions to generate a background image $I_{b}$. To correct color discrepancies, inpainting is performed again on $I_{src}$ with a black mask, yielding the ground-truth image $I_{g}$. Finally, $I_{b}$ and $I_{g}$ are combined to obtain the composite image $I_{c}$ (a minimal compositing sketch follows this list). Details are given in Section \ref{sec:dataset}.
  • Figure 3: Illustration of our method. Besides the denoising U-Net and the ControlNet encoder $E_{c}$, we introduce an auxiliary encoder $E_{b}$ to predict the bounding-box regression coefficients $\tilde{l}_r$ and the reflection type $\tilde{t}_r$. According to the predicted reflection type, we extract the corresponding reference features and use the corresponding reflection-type embedding (an illustrative sketch of such a prediction head follows this list).
  • Figure 4: Visual comparison of different methods on DEROBA dataset. From left to right are input composite image, foreground mask, results of SD-ControlNet, FLUX-ControlNet, ICEdit, FLUX-Kontext, Qwen-Edit, our method, and ground-truth.
  • Figure 5: Multiple results for one test image on DEROBA test set. From left to right are input composite image (a), foreground object mask (b), results of our method using different random seeds (c)-(g) and ground-truth (h).
  • ...and 1 more figure
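
The final compositing step described in the Figure 2 caption reduces to a masked blend of the ground-truth and background images. Below is a minimal sketch, assuming the images are float arrays in [0, 1] and that $I_{c}$ is formed by pasting the foreground region of $I_{g}$ onto $I_{b}$ using the foreground mask; the function name and mask convention are illustrative, not taken from the paper.

```python
import numpy as np

def composite(i_g: np.ndarray, i_b: np.ndarray, fg_mask: np.ndarray) -> np.ndarray:
    """Blend the color-corrected ground truth I_g onto the inpainted
    background I_b to form the composite image I_c.

    i_g, i_b: float arrays in [0, 1] with shape (H, W, 3).
    fg_mask:  float array in [0, 1] with shape (H, W), 1 inside the object.
    """
    m = fg_mask[..., None]          # broadcast the mask over color channels
    return m * i_g + (1.0 - m) * i_b  # I_c = M * I_g + (1 - M) * I_b
```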
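The auxiliary encoder $E_{b}$ in the Figure 3 caption predicts bounding-box regression coefficients $\tilde{l}_r$ and a reflection type $\tilde{t}_r$. The sketch below shows one plausible form of such a head; the feature dimension, pooling choice, and layer shapes are assumptions for illustration, not the paper's actual architecture.

```python
import torch
import torch.nn as nn

class AuxiliaryHead(nn.Module):
    """Illustrative prediction head for the auxiliary encoder E_b:
    outputs 4 bounding-box regression coefficients and logits over
    the two reflection types. Sizes here are assumed, not from the paper."""

    def __init__(self, feat_dim: int = 1280, num_types: int = 2):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)              # collapse spatial dims
        self.box_head = nn.Linear(feat_dim, 4)           # box regression coefficients
        self.type_head = nn.Linear(feat_dim, num_types)  # two reflection types

    def forward(self, feats: torch.Tensor):
        # feats: (B, C, H, W) feature map from the auxiliary encoder
        z = self.pool(feats).flatten(1)                  # (B, C)
        return self.box_head(z), self.type_head(z)
```

In the pipeline of Figure 3, the argmax of the type logits would then select which reference features and which reflection-type embedding are fed to the diffusion model.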