AdapEdit: Spatio-Temporal Guided Adaptive Editing Algorithm for Text-Based Continuity-Sensitive Image Editing

Zhiyuan Ma; Guoli Jia; Bowen Zhou

TL;DR

This paper proposes AdapEdit, a spatio-temporal guided adaptive editing algorithm. It realizes adaptive image editing by introducing a soft-attention strategy that dynamically varies the degree of guidance from the editing conditions to visual pixels, from both temporal and spatial perspectives.

Abstract

With the great success of text-conditioned diffusion models in creative text-to-image generation, various text-driven image editing approaches have attracted considerable research attention. However, previous works mainly focus on discreteness-sensitive instructions such as adding, removing, or replacing specific objects, background elements, or global styles (i.e., hard editing). They generally ignore continuity-sensitive instructions that are bound to a subject but change its semantics only finely, such as actions, poses, or adjectives (i.e., soft editing), which hampers generative AI from producing user-customized visual content. To mitigate this predicament, we propose AdapEdit, a spatio-temporal guided adaptive editing algorithm that realizes adaptive image editing by introducing a soft-attention strategy that dynamically varies the degree of guidance from the editing conditions to visual pixels, from both temporal and spatial perspectives. Our approach has a significant advantage in preserving model priors and does not require model training, fine-tuning, extra data, or optimization. We present results over a wide variety of raw images and editing instructions, demonstrating competitive performance and showing that AdapEdit significantly outperforms previous approaches.
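
The idea of varying the guiding degree over both time and space can be illustrated with a toy sketch. This is not the formulation from the paper: the function names, the sigmoid schedule, and the disagreement-based spatial weight are all hypothetical choices made here for illustration. The sketch assumes two small cross-attention maps, one from the original condition and one from the editing condition, and blends them with a weight that depends on the denoising step and on the pixel location.

```python
import math

def temporal_weight(step, num_steps, sharpness=6.0):
    """Hypothetical schedule: edit influence rises smoothly with the step index."""
    progress = step / max(num_steps - 1, 1)
    return 1.0 / (1.0 + math.exp(-sharpness * (progress - 0.5)))

def spatial_weight(attn_src, attn_edit):
    """Hypothetical per-pixel weight: larger where the two maps disagree more."""
    diffs = [[abs(e - s) for s, e in zip(row_s, row_e)]
             for row_s, row_e in zip(attn_src, attn_edit)]
    peak = max(max(row) for row in diffs)
    if peak == 0.0:
        # Identical maps: nothing to edit anywhere.
        return [[0.0 for _ in row] for row in diffs]
    return [[d / peak for d in row] for row in diffs]

def soft_blend(attn_src, attn_edit, step, num_steps):
    """Blend source and edit attention with a time- and pixel-dependent weight."""
    w_t = temporal_weight(step, num_steps)
    w_s = spatial_weight(attn_src, attn_edit)
    return [[(1 - w_t * w) * s + w_t * w * e
             for s, e, w in zip(row_s, row_e, row_w)]
            for row_s, row_e, row_w in zip(attn_src, attn_edit, w_s)]

# Toy 2x2 attention maps (made-up values).
src = [[0.2, 0.8], [0.5, 0.5]]
edit = [[0.6, 0.8], [0.5, 0.1]]
blended = soft_blend(src, edit, step=9, num_steps=10)
```

In this toy setup, pixels where the two maps agree keep their source attention, while pixels where they differ move toward the edit map by an amount that grows with the step index. A hard replacement would instead swap the whole map at once; the soft blend is what lets the edit strength vary continuously.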

Paper Structure (18 sections, 9 equations, 7 figures, 1 table, 1 algorithm)

Introduction
Related Work
Methodology
Preliminaries
Diffusion models.
Text-guided diffusion models.
AdapEdit Algorithm Framework
Flexible Word-Level Temporal Adjustment
Dynamic Pixel-Level Spatial Weighting
Experiments
Experimental Setup
Implementation Details
Baselines
Qualitative Evaluation
Quantitative Evaluation
...and 3 more sections

Figures (7)

Figure 1: Example of image editing to show discreteness-sensitive image manipulations (i.e., hard editing) and continuity-sensitive image manipulations (i.e., soft editing).
Figure 2: The performance of AdapEdit with soft editing instructions. The leftmost images are generated directly from the original condition; the remaining images are edited using the original and editing conditions.
Figure 3: The framework overview of the proposed AdapEdit algorithm.
Figure 4: Illustration of our proposed soft-attention strategy, in which (a) shows the cross-attention maps from $\bm{c}$ and $\bm{c}^*$ and (b) details the specific calculation process.
Figure 5: Qualitative comparisons with previous SOTA methods. The generated image (leftmost column) is the original image $\textbf{x}$, conditioned on $\bm{c}$ and generated by SD-v1.4; the other columns present the editing results conditioned on $\bm{c}^*$. Note that "fixed seed" denotes generating a new image conditioned on $\bm{c}^*$ by directly using SD-v1.4 with the same random seed.
...and 2 more figures
