LayeringDiff: Layered Image Synthesis via Generation, then Disassembly with Generative Knowledge
Kyoungkook Kang, Gyujin Sim, Geonung Kim, Donguk Kim, Seungho Nam, Sunghyun Cho
TL;DR
LayeringDiff reframes layered image synthesis as a decomposition problem: it first generates a composite image with a pretrained generator, then recovers the foreground and background layers using a diffusion-based Foreground-Background Diffusion Decomposition (FBDD) module and a High-Frequency Alignment (HFA) module. Because layers are extracted from a composite rather than generated from scratch, the approach avoids large-scale training for layer-specific generation and relies on pretrained generative priors to produce layers with diverse content and object scales, refined textures, and seamless composition. Experiments, including a user study, show improved foreground and background quality, natural blending, and applicability to multi-layer synthesis and real-world image decomposition. Limitations in alpha accuracy and shadow handling are discussed further in the supplementary material.
Abstract
Layers have become indispensable tools for professional artists, allowing them to build a hierarchical structure that enables independent control over individual visual elements. In this paper, we propose LayeringDiff, a novel pipeline for the synthesis of layered images, which begins by generating a composite image using an off-the-shelf image generative model, followed by disassembling the image into its constituent foreground and background layers. By extracting layers from a composite image, rather than generating them from scratch, LayeringDiff bypasses the need for large-scale training to develop generative capabilities for individual layers. Furthermore, by utilizing a pretrained off-the-shelf generative model, our method can produce diverse content and object scales in synthesized layers. For effective layer decomposition, we adapt a large-scale pretrained generative prior to estimate foreground and background layers. We also propose high-frequency alignment modules to refine the fine details of the estimated layers. Our comprehensive experiments demonstrate that our approach effectively synthesizes layered images and supports various practical applications.
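To make the "generation, then disassembly" idea concrete, the sketch below shows the standard alpha ("over") blending relation that ties a foreground layer, its alpha matte, and a background layer to a composite image. It is an illustration only, not the authors' implementation: the function name `composite`, the toy pixel values, and the list-based image representation are all assumptions introduced here. The second half shows that two different layer sets can explain the same composite, which is one plausible reading of why the decomposition step leans on a generative prior.

```python
from typing import List, Tuple

Pixel = Tuple[float, float, float]  # RGB values in [0, 1]


def composite(foreground: List[Pixel],
              alpha: List[float],
              background: List[Pixel]) -> List[Pixel]:
    """Standard alpha blending per pixel: C = alpha * F + (1 - alpha) * B.

    Hypothetical helper for illustration; not taken from the paper.
    """
    if not (len(foreground) == len(alpha) == len(background)):
        raise ValueError("all layers must have the same number of pixels")
    blended = []
    for f, a, b in zip(foreground, alpha, background):
        blended.append(tuple(a * fc + (1.0 - a) * bc for fc, bc in zip(f, b)))
    return blended


# Toy 3-pixel image: opaque red object, half-transparent edge, bare background.
bg = [(0.0, 0.0, 1.0)] * 3
fg = [(1.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 0.0, 0.0)]
al = [1.0, 0.5, 0.0]
image = composite(fg, al, bg)

# A different, fully opaque layer set that yields the very same composite.
alt_fg = [(1.0, 0.0, 0.0), (0.5, 0.0, 0.5), (0.0, 0.0, 1.0)]
alt_al = [1.0, 1.0, 1.0]
alt_image = composite(alt_fg, alt_al, bg)

print(image == alt_image)
```

Since the composite alone does not determine a unique foreground, alpha, and background, any decomposition method has to bring in extra knowledge about what plausible layers look like. Disassembly runs this relation in reverse, starting from the composite and estimating the layers.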
