Reverse Transition Kernel: A Flexible Framework to Accelerate Diffusion Inference

Xunpeng Huang, Difan Zou, Hanze Dong, Yi Zhang, Yi-An Ma, Tong Zhang

TL;DR

A general RTK framework is developed that enables a more balanced subproblem decomposition, resulting in $\tilde O(1)$ subproblems, each with strongly log-concave targets, which gives rise to the RTK-MALA and RTK-ULD algorithms for diffusion inference.

Abstract

To generate data from trained diffusion models, most inference algorithms, such as DDPM, DDIM, and other variants, rely on discretizing the reverse SDEs or their equivalent ODEs. In this paper, we view such approaches as decomposing the entire denoising diffusion process into several segments, each corresponding to a reverse transition kernel (RTK) sampling subproblem. Specifically, DDPM uses a Gaussian approximation for the RTK, resulting in low per-subproblem complexity but requiring a large number of segments (i.e., subproblems), which is conjectured to be inefficient. To address this, we develop a general RTK framework that enables a more balanced subproblem decomposition, resulting in $\tilde O(1)$ subproblems, each with strongly log-concave targets. We then propose leveraging two fast sampling algorithms, the Metropolis-Adjusted Langevin Algorithm (MALA) and Underdamped Langevin Dynamics (ULD), for solving these strongly log-concave subproblems. This gives rise to the RTK-MALA and RTK-ULD algorithms for diffusion inference. In theory, we further develop convergence guarantees for RTK-MALA and RTK-ULD in total variation (TV) distance: RTK-ULD can achieve $\epsilon$ target error within $\tilde{\mathcal O}(d^{1/2}\epsilon^{-1})$ under mild conditions, and RTK-MALA enjoys an $\mathcal{O}(d^{2}\log(d/\epsilon))$ convergence rate under slightly stricter conditions. These theoretical results surpass the state-of-the-art convergence rates for diffusion inference and are well supported by numerical experiments.
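To make the inner sampler concrete, the following is a minimal sketch of a generic MALA loop on a strongly log-concave target proportional to $\exp(-g(x))$. It is not the paper's RTK-MALA implementation: the toy Gaussian target stands in for an RTK subproblem, and the names `mala`, `g`, `grad_g`, `step`, and `mu` are hypothetical choices made for illustration.

```python
import numpy as np


def mala(g, grad_g, x0, step, n_steps, rng):
    """Generic MALA targeting a density proportional to exp(-g(x)).

    Illustrative sketch only; `g` and `grad_g` are assumed callables
    returning the potential and its gradient.
    """
    x = np.asarray(x0, dtype=float)
    samples = np.empty((n_steps, x.size))
    accepted = 0
    for k in range(n_steps):
        # Langevin proposal: one gradient step plus Gaussian noise.
        mean_fwd = x - step * grad_g(x)
        y = mean_fwd + np.sqrt(2.0 * step) * rng.standard_normal(x.size)
        # Log proposal densities q(y | x) and q(x | y), up to a shared constant.
        mean_bwd = y - step * grad_g(y)
        log_q_fwd = -np.sum((y - mean_fwd) ** 2) / (4.0 * step)
        log_q_bwd = -np.sum((x - mean_bwd) ** 2) / (4.0 * step)
        # Metropolis-Hastings correction removes the discretization bias.
        log_alpha = g(x) - g(y) + log_q_bwd - log_q_fwd
        if np.log(rng.uniform()) < log_alpha:
            x = y
            accepted += 1
        samples[k] = x
    return samples, accepted / n_steps


# Toy strongly log-concave target (an assumption, not from the paper):
# g(x) = ||x - mu||^2, i.e. a Gaussian with mean mu and covariance 0.5 * I.
mu = np.array([1.0, -2.0])
g = lambda x: np.sum((x - mu) ** 2)
grad_g = lambda x: 2.0 * (x - mu)

rng = np.random.default_rng(0)
samples, acc_rate = mala(g, grad_g, np.zeros(2), step=0.2, n_steps=5000, rng=rng)
print("acceptance rate:", acc_rate)
print("sample mean after burn-in:", samples[500:].mean(axis=0))
```

The accept/reject step is what distinguishes MALA from an unadjusted Langevin update. In the setting described above, the potential would instead come from the RTK subproblem, with the score estimated by the trained diffusion model.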

Paper Structure (29 sections, 47 theorems, 429 equations, 2 figures, 1 table, 3 algorithms)


Key Result

Lemma 3.1

Suppose a Markov process $\{\mathbf{x}_t\}$ follows the SDE (sde:ou). Then, for any $t^\prime > t$, we have

Figures (2)

  • Figure 1: (a) Marginal accuracy of the MoG samples produced by different algorithms versus NFE. (b-f) Histograms of the MoG samples along a certain direction for different algorithms. The plots labeled 'ULA', 'ULD', 'MALA', and 'MALA_ES' correspond to RTK-ULA, RTK-ULD, RTK-MALA, and score-only RTK-MALA, respectively. The histogram is oriented along the second dimension when the first dimension is constrained within (0.75, 1.25).
  • Figure 2: (a-e) Clusters sampled by DDPM, RTK-ULA, RTK-ULD, score-only RTK-MALA, and RTK-MALA, respectively. (f) Clusters sampled from the ground truth distribution. These 2D clusters are projections of the original 10D data onto the first two dimensions.

Theorems & Definitions (84)

  • Lemma 3.1
  • Lemma 3.2
  • Theorem 4.1: Informal version of Theorem \ref{thm:nn_estimate_complexity_gene}
  • Corollary 4.2: Informal version of Corollary \ref{cor:complexity_19}
  • Corollary 4.3
  • Theorem 4.4
  • proof: Proof of Lemma \ref{lem:rev_trans_ker_form}
  • Lemma B.1: Chain rule of TV
  • proof
  • Lemma B.2
  • ...and 74 more