BELM: Bidirectional Explicit Linear Multi-step Sampler for Exact Inversion in Diffusion Models

Fangyikang Wang, Hubery Yin, Yuejiang Dong, Huminhao Zhu, Chao Zhang, Hanbin Zhao, Hui Qian, Chen Li

TL;DR

Experiments show that the O-BELM sampler achieves exact inversion while maintaining high-quality sampling; additional analysis substantiates the theoretical stability and global convergence of the proposed optimal sampler.

Abstract

The inversion of diffusion model sampling, which aims to find the corresponding initial noise of a sample, plays a critical role in various tasks. Recently, several heuristic exact inversion samplers have been proposed to address the inexact inversion issue in a training-free manner. However, the theoretical properties of these heuristic samplers remain unknown, and they often exhibit mediocre sampling quality. In this paper, we introduce a generic formulation of exact inversion samplers, the Bidirectional Explicit Linear Multi-step (BELM) samplers, which includes all previously proposed heuristic exact inversion samplers as special cases. The BELM formulation is derived from the variable-stepsize-variable-formula linear multi-step method by integrating a bidirectional explicit constraint. We highlight that this bidirectional explicit constraint is the key to mathematically exact inversion. We systematically investigate the Local Truncation Error (LTE) within the BELM framework and show that the existing heuristic designs of exact inversion samplers yield sub-optimal LTE. Consequently, we propose the Optimal BELM (O-BELM) sampler through the LTE minimization approach. We conduct additional analysis to substantiate the theoretical stability and global convergence property of the proposed optimal sampler. Comprehensive experiments demonstrate that our O-BELM sampler achieves the exact inversion property while maintaining high-quality sampling. Additional experiments in image editing and image interpolation highlight the potential of applying O-BELM in a variety of applications.

Paper Structure

This paper contains 70 sections, 19 theorems, 64 equations, 10 figures, 9 tables, 1 algorithm.

Key Result

Proposition 1

Under the assumption labelled asump:epsilon in the paper, the diffusion initial value problem (IVP, labelled ivp-diffusion) has a unique solution.

Figures (10)

  • Figure 1: Schematic description of DDIM (left) and BELM (right). DDIM uses $\mathbf{x}_i$ and $\boldsymbol{\varepsilon}_\theta(\mathbf{x}_i,i)$ to calculate $\mathbf{x}_{i-1}$ based on a linear relation between $\mathbf{x}_i$, $\mathbf{x}_{i-1}$ and $\boldsymbol{\varepsilon}_\theta(\mathbf{x}_i,i)$ (represented by the blue line). However, DDIM inversion uses $\mathbf{x}_{i-1}$ and $\boldsymbol{\varepsilon}_\theta(\mathbf{x}_{i-1},i-1)$ to calculate $\mathbf{x}_{i}$ based on a different linear relation (represented by the red line). This mismatch is what makes DDIM inversion inexact. In contrast, BELM seeks to establish a linear relation between $\mathbf{x}_{i-1}$, $\mathbf{x}_i$, $\mathbf{x}_{i+1}$ and $\boldsymbol{\varepsilon}_\theta(\mathbf{x}_{i}, i)$ (represented by the green line). BELM and its inversion are both derived from this single relation, which enables exact inversion. Specifically, BELM uses the linear combination of $\mathbf{x}_i$, $\mathbf{x}_{i+1}$ and $\boldsymbol{\varepsilon}_\theta(\mathbf{x}_{i},i)$ to calculate $\mathbf{x}_{i-1}$, and the BELM inversion uses the linear combination of $\mathbf{x}_{i-1}$, $\mathbf{x}_i$ and $\boldsymbol{\varepsilon}_\theta(\mathbf{x}_{i},i)$ to calculate $\mathbf{x}_{i+1}$. The bidirectional explicit constraint means that this linear relation does not include the derivatives at either endpoint, that is, $\boldsymbol{\varepsilon}_\theta(\mathbf{x}_{i-1},i-1)$ and $\boldsymbol{\varepsilon}_\theta(\mathbf{x}_{i+1},i+1)$.
  • Figure 2: Examples of editing results using O-BELM on both synthesized and real images. We showcase the diverse editing capabilities of O-BELM across a range of tasks, including human face modifications, content change, entity addition and global style transfer. The exact inversion property of O-BELM enables large-scale image alterations while preserving auxiliary details (background in first row, hairstyle in second row, traffic sign in third row, tree and crop in fourth row, composition in last row). Its stability and accuracy further ensure the high quality of the resulting images.
  • Figure 3: Comparison of editing results from different samplers under 50 steps. DDIM leads to inconsistencies (highlighted by the red rectangle), and the EDICT and BDIA samplers may introduce unrealistically low-quality sections (highlighted by the yellow rectangle). Our O-BELM sampler ensures consistency and demonstrates high-quality results.
  • Figure 4: Results of image reconstruction and MSE error using DDIM and exact inversion samplers under 50 steps. The red rectangles point out the inconsistent parts in the images reconstructed by DDIM.
  • Figure 5: (a) Uncurated CIFAR10 samples with BELM, steps = 100; (b) uncurated CelebA-HQ samples with BELM, steps = 100.
  • ...and 5 more figures
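
The mechanism described in the Figure 1 caption can be illustrated with a small numerical sketch. It is a toy under stated assumptions, not the paper's sampler: `eps` is a hypothetical stand-in for the noise-prediction network $\boldsymbol{\varepsilon}_\theta$, and the coefficients `A`, `B`, `P`, `Q`, `R` are arbitrary illustrative constants rather than the DDIM or O-BELM coefficients. The one-step rule mimics the DDIM-style mismatch (inversion evaluates the network at $\mathbf{x}_{i-1}$ instead of $\mathbf{x}_i$), while the two-step rule solves one shared linear relation for either $\mathbf{x}_{i-1}$ or $\mathbf{x}_{i+1}$ and only ever evaluates the network at the middle point $\mathbf{x}_i$.

```python
import numpy as np

def eps(x, i):
    """Toy stand-in for a noise-prediction network (hypothetical, not trained)."""
    return np.tanh(x) * (0.5 + 0.05 * i)

N = 10
# Arbitrary illustrative coefficients; NOT the DDIM or O-BELM coefficients.
A, B = 0.95, -0.10           # one-step: x[i-1] = A*x[i] + B*eps(x[i], i)
P, Q, R = 0.50, 0.45, -0.10  # two-step: x[i-1] = P*x[i+1] + Q*x[i] + R*eps(x[i], i)

def one_step_sample(x_N):
    x = x_N
    for i in range(N, 0, -1):
        x = A * x + B * eps(x, i)
    return x

def one_step_invert(x_0):
    x = x_0
    for i in range(1, N + 1):
        # Mismatch: eps(x[i], i) is unavailable, so eps(x[i-1], i-1) is used.
        x = (x - B * eps(x, i - 1)) / A
    return x

def two_step_sample(x_N):
    # Bootstrap x[N-1] with a single one-step update, then use the two-step rule.
    x_next, x_cur = x_N, A * x_N + B * eps(x_N, N)
    for i in range(N - 1, 0, -1):
        x_prev = P * x_next + Q * x_cur + R * eps(x_cur, i)
        x_next, x_cur = x_cur, x_prev
    return x_cur, x_next  # (x[0], x[1])

def two_step_invert(x_0, x_1):
    # Same linear relation, solved for x[i+1]; eps is evaluated at x[i] only.
    x_prev, x_cur = x_0, x_1
    for i in range(1, N):
        x_next = (x_prev - Q * x_cur - R * eps(x_cur, i)) / P
        x_prev, x_cur = x_cur, x_next
    return x_cur  # x[N]

x_N = np.array([1.0, -0.5, 2.0])
err_one = np.max(np.abs(one_step_invert(one_step_sample(x_N)) - x_N))
err_two = np.max(np.abs(two_step_invert(*two_step_sample(x_N)) - x_N))
print(f"one-step round-trip error: {err_one:.3e}")
print(f"two-step round-trip error: {err_two:.3e}")
```

In this toy, the two-step round trip recovers the starting point up to floating-point round-off, whereas the one-step round trip leaves a visible residual. Note that the two-step inversion needs the pair $(\mathbf{x}_0, \mathbf{x}_1)$ as input; how that pair is obtained in practice is outside the scope of this sketch.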

Theorems & Definitions (40)

  • Proposition 1
  • Proposition 2
  • Remark 1
  • Definition 1
  • Proposition 3
  • Corollary 1
  • Proposition 4
  • Definition 2
  • Definition 3
  • Proposition 5
  • ...and 30 more