Generative Image Layer Decomposition with Visual Effects

Jinrui Yang, Qing Liu, Yijun Li, Soo Ye Kim, Daniil Pakhomov, Mengwei Ren, Jianming Zhang, Zhe Lin, Cihang Xie, Yuyin Zhou

TL;DR

LayerDecomp is a generative framework for image layer decomposition that outputs photorealistic clean backgrounds and high-quality transparent foregrounds with faithfully preserved visual effects. The paper also proposes a consistency loss that pushes the model to learn accurate representations for the transparent foreground layer when ground-truth annotations are not available.

Abstract

Recent advancements in large generative models, particularly diffusion-based methods, have significantly enhanced the capabilities of image editing. However, achieving precise control over image composition tasks remains a challenge. Layered representations, which allow for independent editing of image components, are essential for user-driven content creation, yet existing approaches often struggle to decompose an image into plausible layers with accurately retained transparent visual effects such as shadows and reflections. We propose $\textbf{LayerDecomp}$, a generative framework for image layer decomposition that outputs photorealistic clean backgrounds and high-quality transparent foregrounds with faithfully preserved visual effects. To enable effective training, we first introduce a dataset preparation pipeline that automatically scales up simulated multi-layer data with synthesized visual effects. To further enhance real-world applicability, we supplement this simulated dataset with camera-captured images containing natural visual effects. Additionally, we propose a consistency loss that encourages the model to learn accurate representations for the transparent foreground layer when ground-truth annotations are not available. Our method achieves superior quality in layer decomposition, outperforming existing approaches in object removal and spatial editing tasks across several benchmarks and multiple user studies, unlocking various creative possibilities for layer-wise image editing. The project page is https://rayjryang.github.io/LayerDecomp.
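
The abstract does not spell out the form of the consistency loss. One plausible reading, offered only as a sketch, is that the predicted transparent foreground is blended back over the predicted background and the result is compared with the input composite in pixel space. The example below assumes standard alpha compositing and a mean squared error; the function names `alpha_composite` and `consistency_loss`, the flat pixel-list representation, and the choice of squared error are all illustrative assumptions, not the authors' implementation.

```python
from typing import List, Tuple

RGB = Tuple[float, float, float]
RGBA = Tuple[float, float, float, float]


def alpha_composite(fg: List[RGBA], bg: List[RGB]) -> List[RGB]:
    """Blend a transparent foreground over a background, pixel by pixel.

    Assumes straight (non-premultiplied) alpha with all values in [0, 1].
    """
    if len(fg) != len(bg):
        raise ValueError("layers must have the same number of pixels")
    blended = []
    for (r, g, b, a), (bg_r, bg_g, bg_b) in zip(fg, bg):
        blended.append((
            a * r + (1.0 - a) * bg_r,
            a * g + (1.0 - a) * bg_g,
            a * b + (1.0 - a) * bg_b,
        ))
    return blended


def consistency_loss(fg: List[RGBA], bg: List[RGB], composite: List[RGB]) -> float:
    """Mean squared error between the re-composited layers and the input image.

    Hypothetical loss form: the source only states that a pixel-space
    consistency loss is used, not how it is computed.
    """
    if not composite or len(composite) != len(bg):
        raise ValueError("composite must be non-empty and match the layer size")
    recomposed = alpha_composite(fg, bg)
    total = 0.0
    count = 0
    for predicted, target in zip(recomposed, composite):
        for p, t in zip(predicted, target):
            total += (p - t) ** 2
            count += 1
    return total / count


# One-pixel example: a light-gray floor darkened by a half-transparent shadow.
background = [(0.8, 0.8, 0.8)]
observed = [(0.4, 0.4, 0.4)]

# A foreground that carries the shadow (black at 50% opacity) reproduces the input.
with_shadow = [(0.0, 0.0, 0.0, 0.5)]
# A foreground that drops the shadow (fully transparent) does not.
without_shadow = [(0.0, 0.0, 0.0, 0.0)]

loss_with_shadow = consistency_loss(with_shadow, background, observed)
loss_without_shadow = consistency_loss(without_shadow, background, observed)
```

Under these assumptions the loss needs only the input composite and the predicted layers, so it can be evaluated without a ground-truth foreground. In the one-pixel example, the foreground that keeps the shadow yields a lower loss than the one that omits it, which is the behavior such a term is meant to reward.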

Paper Structure

This paper contains 18 sections, 3 equations, 14 figures, 6 tables.

Figures (14)

  • Figure 1: (a) Given an input image and a binary object mask, our model is able to decompose the image into a clean background layer and a transparent foreground layer with preserved visual effects such as shadows and reflections. (b) Subsequently, our decomposition empowers complex and controllable layer-wise editing such as spatial, color and/or style editing.
  • Figure 2: The framework of LayerDecomp. The model takes four inputs: two conditional inputs, including a composite image and an object mask, and two noisy latent representations of the background and foreground layers. During training, we use simulated image triplets alongside camera-captured background-composite image pairs. We also introduce a pixel-space consistency loss to ensure that natural visual effects such as shadows and reflections are faithfully preserved in the transparent foreground layer.
  • Figure 3: Object removal - comparison with mask-based methods. Our model, using tight input masks, generates more visually plausible results with fewer artifacts compared to ControlNet Inpainting zhang2023adding, SD-XL Inpainting rombach2022high, and PowerPaint zhuang2023task, which all require loose mask input. In addition, our model delivers coherent foreground layers and supports more advanced downstream editing tasks.
  • Figure 4: Object removal - comparison with ObjectDrop liu2024object. Based on their released examples, our model demonstrates comparable quality in photorealistic object removal in the background layer, while decomposing the foreground with intact visual effects.
  • Figure 5: Object removal - comparison with instruction-driven methods. Combined with a text-based grounding method, our model can effectively remove target objects and preserve background integrity, while existing instruction-based editing methods, such as Emu-Edit sheynin2024emu, MGIE fu2024mgie, and OmniGen xiao2024omnigen, may struggle to fully remove the target or maintain background consistency.
  • ...and 9 more figures