Drag Your Noise: Interactive Point-based Editing via Diffusion Semantic Propagation

Haofeng Liu, Chenshu Xu, Yifei Yang, Lihua Zeng, Shengfeng He

TL;DR

DragNoise tackles fine-grained, point-based editing in diffusion models by treating diffusion semantics as editable signals. It identifies the bottleneck feature of the U-Net as a compact, high-level semantic space, optimizes that feature at an early denoising timestep, and then propagates the edited semantics to later steps so that the edit persists. Because it does not retrace the latent map, it reduces optimization time by over 50% compared to DragDiffusion. Results on DragBench and diverse images show superior control and semantic retention compared with GAN- and diffusion-based baselines.

Abstract

Point-based interactive editing serves as an essential tool to complement the controllability of existing generative models. A concurrent work, DragDiffusion, updates the diffusion latent map in response to user inputs, causing global latent map alterations. This results in imprecise preservation of the original content and unsuccessful editing due to gradient vanishing. In contrast, we present DragNoise, offering robust and accelerated editing without retracing the latent map. The core rationale of DragNoise lies in utilizing the predicted noise output of each U-Net as a semantic editor. This approach is grounded in two critical observations: first, the bottleneck features of the U-Net are semantically rich and therefore well suited to interactive editing; second, high-level semantics, established early in the denoising process, show minimal variation in subsequent stages. Leveraging these insights, DragNoise edits diffusion semantics in a single denoising step and efficiently propagates these changes, ensuring stability and efficiency in diffusion editing. Comparative experiments reveal that DragNoise achieves superior control and semantic retention, reducing the optimization time by over 50% compared to DragDiffusion. Our code is available at https://github.com/haofengl/DragNoise.
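The control flow described in the abstract (optimize the bottleneck feature at one early denoising step, then reuse it in later steps) can be illustrated with a toy sketch. Everything here is an assumption made for illustration: `toy_unet`, `optimize_bottleneck`, and `denoise` are hypothetical names, the "U-Net" is a pair of scalar multiplications, and the squared-distance objective merely stands in for the drag objective, which this summary does not specify. It is not the authors' implementation.

```python
from typing import List, Optional, Tuple

Vector = List[float]


def toy_unet(latent: Vector, override: Optional[Vector] = None) -> Tuple[Vector, Vector]:
    """Hypothetical stand-in for a U-Net: encoder -> bottleneck -> decoder."""
    bottleneck = [0.5 * x for x in latent]      # toy "encoder"
    if override is not None:
        bottleneck = override                   # propagated (edited) semantics
    noise = [0.2 * b for b in bottleneck]       # toy "decoder" = predicted noise
    return bottleneck, noise


def optimize_bottleneck(bottleneck: Vector, target: Vector,
                        steps: int = 50, lr: float = 0.1) -> Vector:
    """Gradient descent on a squared distance (placeholder for the drag objective)."""
    b = list(bottleneck)
    for _ in range(steps):
        b = [bi - lr * 2.0 * (bi - ti) for bi, ti in zip(b, target)]
    return b


def denoise(latent: Vector, timesteps: List[int],
            edit_at: Optional[int] = None,
            target: Optional[Vector] = None) -> Tuple[Vector, List[Vector]]:
    """Run the toy denoising loop; edit once at `edit_at`, then propagate."""
    edited: Optional[Vector] = None
    used: List[Vector] = []                     # bottleneck actually used per step
    for t in timesteps:                         # timesteps run from noisy to clean
        if edit_at is not None and target is not None and t == edit_at:
            current, _ = toy_unet(latent)
            edited = optimize_bottleneck(current, target)
        bottleneck, noise = toy_unet(latent, override=edited)
        used.append(bottleneck)
        latent = [x - n for x, n in zip(latent, noise)]
    return latent, used


final, used = denoise([2.0, 4.0], [4, 3, 2, 1, 0], edit_at=3, target=[0.0, 1.0])
```

In this toy run the step before the edit uses its own bottleneck feature, while the edit step and every later step share the single optimized feature, so the optimization cost is paid only once.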

Paper Structure (14 sections, 4 equations, 12 figures)


Figures (12)

  • Figure 1: Reconstructed images by DDIM inversion, where features of different levels (columns) are copied to the corresponding layers in all subsequent U-Nets, beginning from various denoising timesteps (rows). The original images, reconstructed without feature copying, are provided for comparison. Copying the bottleneck features to all subsequent steps reveals that the core semantics of the diffusion process are encoded within the bottleneck layer and are predominantly learned in the early phases of the denoising process.
  • Figure 2: Quantitative analysis of middle-block feature replacement. Features at all subsequent timesteps are replaced with those from timestep 35, using features from various layers. MSE and LPIPS (Zhang et al., 2018) are used to compare the reconstructed images against the original inputs.
  • Figure 3: Comparison between DragNoise and other relevant methods in feature modification, highlighting optimized features in green.
  • Figure 4: Comparison of point-based editing methods with various drags. Our DragNoise exhibits superior semantic control ability.
  • Figure 5: Comparison with DragDiffusion across diverse images. Our method yields more stable and plausible edits aligned with user inputs.
  • ...and 7 more figures
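
As a reminder of what the MSE comparison in Figure 2 measures, here is a minimal sketch of mean squared error between an original image and its reconstruction. It assumes both images have been flattened to equal-length sequences of pixel values; the function name `mse` and that layout are illustrative choices, not taken from the paper. LPIPS is omitted because it needs a pretrained network.

```python
from typing import Sequence


def mse(original: Sequence[float], reconstructed: Sequence[float]) -> float:
    """Mean squared error between two equally sized, flattened images."""
    if len(original) != len(reconstructed):
        raise ValueError("images must have the same number of pixels")
    if len(original) == 0:
        raise ValueError("images must not be empty")
    squared_errors = ((a - b) ** 2 for a, b in zip(original, reconstructed))
    return sum(squared_errors) / len(original)


identical = mse([0.0, 0.5, 1.0], [0.0, 0.5, 1.0])   # perfect reconstruction
shifted = mse([0.0, 0.0], [0.5, 0.5])               # every pixel off by 0.5
```

A lower value means the reconstruction after feature replacement stays closer to the original input.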