LayerDiffusion: Layered Controlled Image Editing with Diffusion Models

Pengzhi Li, Qinxuan Huang, Yikang Ding, Zhiheng Li

TL;DR

LayerDiffusion tackles the challenge of text-guided image editing that requires simultaneous background replacement and subject attribute changes while preserving the subject's identity. It introduces a layered framework that decouples foreground and background editing via layered controlled optimization of text embeddings and a layered diffusion training regime, complemented by an iterative guidance strategy to tightly enforce textual constraints. The method achieves high fidelity to input subject features and coherent integration into new backgrounds, outperforming existing editing approaches on multitask scenarios. User studies corroborate the quantitative gains, highlighting LayerDiffusion's potential to enable versatile, controllable image edits with single-image inputs.

Abstract

Text-guided image editing has recently experienced rapid development. However, simultaneously performing multiple editing actions on a single image, such as background replacement and specific subject attribute changes, while maintaining consistency between the subject and the background remains challenging. In this paper, we propose LayerDiffusion, a semantic-based layered controlled image editing method. Our method enables non-rigid editing and attribute modification of specific subjects while preserving their unique characteristics and seamlessly integrating them into new backgrounds. We leverage a large-scale text-to-image model and employ a layered controlled optimization strategy combined with layered diffusion training. During the diffusion process, an iterative guidance strategy is used to generate a final image that aligns with the textual description. Experimental results demonstrate the effectiveness of our method in generating highly coherent images that closely align with the given textual description. The edited images maintain a high similarity to the features of the input image and surpass the performance of current leading image editing methods. LayerDiffusion opens up new possibilities for controllable image editing.

Paper Structure

This paper contains 20 sections, 9 equations, 14 figures, and 1 table.

Figures (14)

  • Figure 1: Our method achieves layered image editing through text descriptions, enabling simultaneous modifications of backgrounds and specific subjects, such as background replacement, object resizing, and complex non-rigid changes.
  • Figure 2: Our method utilizes a layered controlled optimization strategy to refine text embeddings and a layered diffusion strategy to fine-tune the diffusion model. During inference, an iterative guidance strategy is employed to directly generate images aligning with the multiple editing actions described in the input text.
  • Figure 3: Given a complex text description, our method can perform multiple editing actions on the original image (left) while preserving the characteristics of the specific subject. Note that the mask in the bottom left corner is used to change the size of the selected object.
  • Figure 4: We present several edited images and compare them with similar image editing algorithms, such as SDEdit (Meng et al., 2021), Imagic (Kawar et al., 2022), and PnP (Tumanyan et al., 2022). Our method generates the best results.
  • Figure 5: We present the edited images with different settings. For each setting, we show two generated images using different random seeds. (f) illustrates the final edited results.
  • ...and 9 more figures