Streamlining Image Editing with Layered Diffusion Brushes

Peyman Gholami, Robert Xiao

TL;DR

Layered Diffusion Brushes (LDB) addresses the need for real-time, localized diffusion-based image editing by introducing a training-free, layer-based editing framework. It leverages latent caching to store a Regeneration Latent $Z_r$ and a Blending Latent $Z_b$, enabling independent, non-destructive edits with minimal recomputation. A formal layer formulation $\mathcal{L}^{(k)}$ and an overlapping-region strategy allow sequential edits to accumulate while preserving background content, achieving per-edit latencies of around $140\,\text{ms}$ on consumer GPUs. Quantitative benchmarks and a user study show that LDB outperforms baselines in speed and often matches or surpasses them in image quality and edit fidelity, with strong usability and creativity support. The approach has the potential to extend to video editing and other diffusion-model applications, suggesting practical value for professional creative workflows.

Abstract

Denoising diffusion models have emerged as powerful tools for image manipulation, yet interactive, localized editing workflows remain underdeveloped. We introduce Layered Diffusion Brushes (LDB), a novel training-free framework that enables interactive, layer-based editing using standard diffusion models. LDB defines each "layer" as a self-contained set of parameters guiding the generative process, enabling independent, non-destructive, and fine-grained prompt-guided edits, even in overlapping regions. LDB leverages a unique intermediate latent caching approach to reduce each edit to only a few denoising steps, achieving 140 ms per edit on consumer GPUs. An editor implementing LDB, incorporating familiar layer concepts, was evaluated via user study and quantitative metrics. Results demonstrate LDB's superior speed alongside comparable or improved image quality, background preservation, and edit fidelity relative to state-of-the-art methods across various sequential image manipulation tasks. The findings highlight LDB's ability to significantly enhance creative workflows by providing an intuitive and efficient approach to diffusion-based image editing and its potential for expansion into related subdomains, such as video editing.

Paper Structure (35 sections, 2 equations, 55 figures, 1 table, 1 algorithm)

This paper contains 35 sections, 2 equations, 55 figures, 1 table, 1 algorithm.

Figures (55)

  • Figure 1: Hierarchical image editing with Layered Diffusion Brushes: LDB is capable of creating and stacking a wide range of independent edits, including object addition, removal, or replacement, colour and style changes/combining, and object attribute modification. Each edit is performed independently, and users are able to switch between the edits seamlessly.
  • Figure 2: Overview of the Proposed Method: The top box shows standard DM-based image generation from noisy latent $Z_0$ and prompt $\mathcal{P}$. The middle section depicts the latent caching module, storing and retrieving intermediate latents for different layers. The bottom box illustrates the editing process: a new noise sample $S'$ merges with the original latent at step $r$ using mask $m$ and strength control $\alpha$. Diffusion continues until step $b$, where modified and cached latents blend to generate the final edited image.
  • Figure 3: Overlapping edit regions in LDB: overlapping edits enable complex, interacting modifications. For example, one layer can adjust color while another changes shape, with the final result combining both.
  • Figure 4: Box and Custom Mask Options: In box mode, users click the target region's center to generate edits within the specified area and can drag the box to explore variations instantly. In custom mask mode, users draw a mask over the desired region and adjust the seed using the mouse wheel or scrolling gestures to generate new variations.
  • Figure 6: Ablation study on regeneration latent step $r$ (increasing left to right). Small $r$ results in strong prompt adherence ("cat") but introduces artifacts. Large $r$ (near $N$) leads to insufficient modification, retaining the original "dog". An intermediate $r$ achieves the best balance of edit fidelity and background preservation.
  • ...and 50 more figures
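
The editing process in the Figure 2 caption can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the caption does not give the exact rule for merging the new noise sample $S'$ into the cached latent, so the additive, mask-weighted form below is an assumption, and `denoise_r_to_b` is a hypothetical stand-in for the few real denoising steps between step $r$ and step $b$. The function names `merge_noise`, `blend`, and `apply_layer` are likewise illustrative.

```python
import numpy as np


def merge_noise(z_r, noise, mask, alpha):
    """Inject new noise into the cached regeneration latent, inside the mask only.

    ASSUMPTION: additive, mask-weighted injection scaled by alpha. The caption
    only says that S' "merges" with the latent using mask m and strength alpha.
    """
    return z_r + alpha * mask * noise


def blend(z_edit_b, z_b, mask):
    """At step b, keep edited content inside the mask and cached content outside."""
    return mask * z_edit_b + (1.0 - mask) * z_b


def apply_layer(z_r, z_b, mask, alpha, seed, denoise_r_to_b):
    """Run one layer edit starting from the cached latents Z_r and Z_b.

    denoise_r_to_b is a hypothetical callable standing in for the diffusion
    model's denoising steps from step r to step b.
    """
    rng = np.random.default_rng(seed)
    noise = rng.standard_normal(z_r.shape)   # new noise sample S'
    z_edit = merge_noise(z_r, noise, mask, alpha)
    z_edit_b = denoise_r_to_b(z_edit)        # only a few steps are recomputed
    return blend(z_edit_b, z_b, mask)


# Toy usage with an identity "denoiser" and a square mask.
shape = (4, 8, 8)
z_r = np.zeros(shape)                        # cached regeneration latent
z_b = np.ones(shape)                         # cached blending latent
mask = np.zeros(shape)
mask[:, 2:5, 2:5] = 1.0
out = apply_layer(z_r, z_b, mask, alpha=0.5, seed=0,
                  denoise_r_to_b=lambda z: z)
```

Because the blend at step $b$ copies the cached latent everywhere outside the mask, the background is untouched by construction in this sketch, and changing only `seed` yields a new variation of the masked region without rerunning the earlier steps.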