Differential Diffusion: Giving Each Pixel Its Strength

Eran Levin; Ohad Fried

TL;DR

This work introduces Differential Diffusion, an inference-time framework that gives per-pixel control over edit strength in diffusion-based image editing through a change map. It decomposes the map into nested masks and injects region-specific content at different timesteps, producing fine-grained, text-guided edits without any model fine-tuning. The approach supports soft-inpainting, introduces the Strength Fan visualization, and provides new metrics (CAM/DAM, LPIPS-based edit strength) to quantify adherence and quality. It is demonstrated on several diffusion models (SDXL, Kandinsky, DeepFloyd IF) and includes automatic change-map generation and a user study. Overall, it extends diffusion-based editing to nuanced, location-dependent transformations with minimal overhead.
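The nested-mask idea summarized above can be illustrated with a small NumPy sketch. This is not the authors' implementation: the function names (nested_masks, differential_edit, toy_add_noise, toy_denoise_step) are hypothetical, the linear threshold schedule is an assumption, and the convention that a strength of 1 means "fully edited" is chosen here for illustration and may be inverted relative to the paper's figures. The denoiser and noising functions are toy stand-ins for a real diffusion model and scheduler.

```python
import numpy as np

def nested_masks(strength, num_steps):
    """Threshold a strength map in [0, 1] into one boolean mask per step.

    True means the pixel is free to be edited at that step. Thresholds
    decrease with the step index, so each mask contains the previous one
    (assumed linear schedule, for illustration only).
    """
    thresholds = 1.0 - (np.arange(num_steps) + 1) / num_steps
    return [strength > t for t in thresholds]

def differential_edit(original, strength, denoise_step, add_noise, num_steps, rng):
    """Toy per-pixel-strength edit loop (hypothetical sketch)."""
    masks = nested_masks(strength, num_steps)
    z = rng.standard_normal(original.shape)  # start from pure noise
    for k in range(num_steps):
        noise_level = 1.0 - k / num_steps  # 1.0 at the first step, then decreasing
        noised_input = add_noise(original, noise_level, rng)
        m = masks[k].astype(original.dtype)
        # Injection: keep the running latent where the mask is on,
        # re-inject the noised original everywhere else.
        z = noised_input * (1.0 - m) + z * m
        z = denoise_step(z, k)
    return z

def toy_add_noise(x, level, rng):
    # Stand-in for a scheduler's forward noising (assumption, not a real schedule).
    return (1.0 - level) * x + level * rng.standard_normal(x.shape)

def toy_denoise_step(z, k):
    # Stand-in for a prompt-conditioned denoising step: pull values toward 1.0.
    return z + 0.2 * (1.0 - z)

rng = np.random.default_rng(0)
original = np.zeros((16, 8))                            # "input image"
strength = np.tile(np.linspace(0.0, 1.0, 8), (16, 1))   # left: keep, right: edit
masks = nested_masks(strength, 20)
out = differential_edit(original, strength, toy_denoise_step, toy_add_noise, 20, rng)
```

In this toy setup, pixels with higher strength are released earlier, receive more denoising steps, and so drift further from the input, while low-strength pixels keep being overwritten by the noised original until late in the loop.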

Abstract

Diffusion models have revolutionized image generation and editing, producing state-of-the-art results in conditioned and unconditioned image synthesis. While current techniques enable user control over the degree of change in an image edit, the controllability is limited to global changes over an entire edited region. This paper introduces a novel framework that enables customization of the amount of change per pixel or per image region. Our framework can be integrated into any existing diffusion model, enhancing it with this capability. Such granular control on the quantity of change opens up a diverse array of new editing capabilities, such as control of the extent to which individual objects are modified, or the ability to introduce gradual spatial changes. Furthermore, we showcase the framework's effectiveness in soft-inpainting -- the completion of portions of an image while subtly adjusting the surrounding areas to ensure seamless integration. Additionally, we introduce a new tool for exploring the effects of different change quantities. Our framework operates solely during inference, requiring no model training or fine-tuning. We demonstrate our method with the current open state-of-the-art models, and validate it via both quantitative and qualitative comparisons, and a user study. Our code is available at: https://github.com/exx8/differential-diffusion

Paper Structure (40 sections, 56 figures, 4 tables, 2 algorithms)

Introduction
Contributions
Related Work
Text-based Image Synthesis
Text-based Editing
Mask-based Editing
Diffusion Models with Native Support
Other Approaches
Method
Preliminaries
Observations
Algorithm
Optimization: Skipping
Technical Details
Extension For Different Diffusion Models
...and 25 more sections

Figures (56)

Figure 1: Breakdown of the injection step of the algorithm over time. Top: $z_t' \odot (1 - \mathrm{mask})$, the regions copied from a noised version of the input. Bottom: $z_{t+1} \odot \mathrm{mask}$, the residue regions copied from the previous U-Net output. Observe how the change map determines the inference process: the darker the region, the earlier it is copied from the residue.
Figure 2: Ablation of nested masks. Our result is more complex, blends better with the scene, and is less blurry. Note the difference in transitions (1st row: the sharp transition in the wall) and placements (2nd row: the building is inside the lake). The seed is fixed for each row. Prompts: “a fine art painting”, “a city skyline…”.
Figure 3: Illustration of the inference process. Top: $z_t'$, the original image noised to the current timestep. Bottom: the intermediate images that the diffusion model denoises. The masks near the arrows represent the regions that were copied from each picture. Follow the arrows to discern the influence of the origins on the output image, and observe the correlation with the decomposed masks and the change map. The prompt is "Gothic painting".
Figure 4: Our method with different diffusion models. We applied our framework to several diffusion models: SDXL [podell2023sdxl], DeepFloyd IF [DeepFloydIF], and Kandinsky [razzhigaev2023kandinsky], demonstrating its generality. Prompts: “cow”, “feathers”, “sheepskin”.
Figure 5: Soft-inpainting. We compare our approach to no softening, $\alpha$-compositing, Poisson-based [10.1145/882262.882269] and Laplace-based [10.1145/245.247] compositing, and standard soft-inpainting (as implemented in Stable Diffusion web UI [AUTOMATIC1111_Stable_Diffusion_Web_2022]). For the $\alpha$-compositing, Poisson-based, and Laplace-based methods, we blend the original image with a regular inpaint result using a Gaussian-blurred version of the inpaint mask. In all other methods, artifacts appear in the transition area, and the unchanged region looks pasted. For standard softening, even the inner parts of the figures are corrupted. Our method produces a more natural blend. Prompt: "Impressionist".
...and 51 more figures
