MDP: A Generalized Framework for Text-Guided Image Editing by Manipulating the Diffusion Path

Qian Wang, Biao Zhang, Michael Birsak, Peter Wonka

TL;DR

This paper systematically analyzes the equations of modern generative diffusion networks to propose a framework, called MDP, that explains the design space of suitable manipulations: the intermediate latent, the conditional embedding, the cross-attention maps, the guidance, and the predicted noise.

Abstract

Image generation using diffusion can be controlled in multiple ways. In this paper, we systematically analyze the equations of modern generative diffusion networks to propose a framework, called MDP, that explains the design space of suitable manipulations. We identify five manipulations: the intermediate latent, the conditional embedding, the cross-attention maps, the guidance, and the predicted noise. We analyze the corresponding parameters of these manipulations and the manipulation schedule. We show that some previous editing methods fit nicely into our framework. In particular, we identify one specific configuration as a new type of control by manipulating the predicted noise, which can perform higher-quality edits than previous work for a variety of local and global edits.

Paper Structure

This paper contains 13 sections, 4 equations, 22 figures, 2 tables, 4 algorithms.

Figures (22)

  • Figure 1: The linear schedule is shown in (a), while the linear, cosine, and exponential schedules are shown in purple, orange, and green, respectively, in (b). For the linear schedule, we fix the linear factor at 1.0 while varying $t_{\text{max}}$ and $t_{\text{min}}$ by varying $T_M$. For the other three schedules, we fix $t_{\text{max}} = 50$ and vary the scale factors and $t_{\text{min}}$.
  • Figure 2: Results of MDP-$\mathbf{x}_t$ using a constant schedule.
  • Figure 3: Results of MDP-$\mathbf{x}_t$ using linear, cosine, and exponential schedules.
  • Figure 4: Results of MDP-$\mathbf{c}$ using a constant schedule.
  • Figure 5: Results of MDP-$\mathbf{c}$ using linear, cosine, and exponential schedules.
  • ...and 17 more figures
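
The captions above refer to constant, linear, cosine, and exponential manipulation schedules bounded by $t_{\text{min}}$ and $t_{\text{max}}$, with a scale factor. The summary does not give the schedule formulas, so the following Python sketch is only an illustration of what such schedules might look like. The function names (`schedule_weight`, `build_schedule`), the specific curve shapes, and the choice to ramp the weight upward from $t_{\text{min}}$ to $t_{\text{max}}$ are all assumptions, not the paper's definitions.

```python
import math


def schedule_weight(t, t_min, t_max, kind="constant", scale=1.0):
    """Hypothetical manipulation weight at diffusion step t.

    The weight is zero outside [t_min, t_max]. Inside the window it
    follows the chosen shape. Every formula here is an illustrative
    assumption, not taken from the paper.
    """
    if not (t_min <= t <= t_max):
        return 0.0
    if kind == "constant":
        return scale
    # Normalised position inside the active window, in [0, 1].
    u = (t - t_min) / (t_max - t_min) if t_max > t_min else 1.0
    if kind == "linear":
        return scale * u
    if kind == "cosine":
        # Half-cosine ramp: 0 at t_min, scale at t_max.
        return scale * 0.5 * (1.0 - math.cos(math.pi * u))
    if kind == "exponential":
        # Exponential ramp rescaled so it also ends at scale.
        return scale * (math.exp(u) - 1.0) / (math.e - 1.0)
    raise ValueError(f"unknown schedule kind: {kind!r}")


def build_schedule(num_steps, t_min, t_max, kind="constant", scale=1.0):
    """Return the weights for steps 1..num_steps as a list."""
    return [schedule_weight(t, t_min, t_max, kind, scale)
            for t in range(1, num_steps + 1)]
```

For instance, `build_schedule(50, 20, 50, "linear", 0.8)` would give 50 weights that are zero before step 20 and rise to 0.8 at step 50. Keeping the window test separate from the curve shape makes it easy to vary $t_{\text{min}}$, $t_{\text{max}}$, and the scale factor independently, which is the kind of sweep the captions describe.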