Reverse Transition Kernel: A Flexible Framework to Accelerate Diffusion Inference

Xunpeng Huang; Difan Zou; Hanze Dong; Yi Zhang; Yi-An Ma; Tong Zhang

TL;DR

A general RTK framework is developed that enables a more balanced subproblem decomposition, resulting in $\tilde O(1)$ subproblems, each with strongly log-concave targets, which gives rise to the RTK-MALA and RTK-ULD algorithms for diffusion inference.

Abstract

To generate data from trained diffusion models, most inference algorithms, such as DDPM, DDIM, and other variants, rely on discretizing the reverse SDEs or their equivalent ODEs. In this paper, we view such approaches as decomposing the entire denoising diffusion process into several segments, each corresponding to a reverse transition kernel (RTK) sampling subproblem. Specifically, DDPM uses a Gaussian approximation for the RTK, resulting in low per-subproblem complexity but requiring a large number of segments (i.e., subproblems), which is conjectured to be inefficient. To address this, we develop a general RTK framework that enables a more balanced subproblem decomposition, resulting in $\tilde O(1)$ subproblems, each with strongly log-concave targets. We then propose leveraging two fast sampling algorithms, the Metropolis-Adjusted Langevin Algorithm (MALA) and Underdamped Langevin Dynamics (ULD), for solving these strongly log-concave subproblems. This gives rise to the RTK-MALA and RTK-ULD algorithms for diffusion inference. In theory, we further develop convergence guarantees for RTK-MALA and RTK-ULD in total variation (TV) distance: RTK-ULD can achieve $\epsilon$ target error within $\tilde{\mathcal O}(d^{1/2}\epsilon^{-1})$ under mild conditions, and RTK-MALA enjoys an $\mathcal{O}(d^{2}\log(d/\epsilon))$ convergence rate under slightly stricter conditions. These theoretical results surpass the state-of-the-art convergence rates for diffusion inference and are well supported by numerical experiments.
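
To make the inner-sampler idea concrete, the following is a minimal sketch of generic MALA on a toy strongly log-concave target (an isotropic Gaussian). It is not the paper's RTK-MALA algorithm: the target, the step size, the mean `mu`, and the helper name `mala` are all assumptions chosen purely for illustration.

```python
import numpy as np


def mala(grad_log_p, log_p, x0, step, n_steps, rng):
    """Generic MALA for a target density p known up to a constant (illustrative helper)."""
    x = np.asarray(x0, dtype=float)
    samples = np.empty((n_steps, x.size))
    accepted = 0
    for i in range(n_steps):
        # Langevin proposal: one gradient step plus Gaussian noise.
        mean_fwd = x + step * grad_log_p(x)
        y = mean_fwd + np.sqrt(2.0 * step) * rng.standard_normal(x.size)
        # Log proposal densities q(y|x) and q(x|y), up to a shared constant.
        mean_bwd = y + step * grad_log_p(y)
        log_q_fwd = -np.sum((y - mean_fwd) ** 2) / (4.0 * step)
        log_q_bwd = -np.sum((x - mean_bwd) ** 2) / (4.0 * step)
        # Metropolis-Hastings accept/reject step removes the discretization bias.
        log_alpha = log_p(y) - log_p(x) + log_q_bwd - log_q_fwd
        if np.log(rng.uniform()) < log_alpha:
            x = y
            accepted += 1
        samples[i] = x
    return samples, accepted / n_steps


# Toy strongly log-concave target (assumption): N(mu, I) in two dimensions.
rng = np.random.default_rng(0)
mu = np.array([1.0, -2.0])
log_p = lambda x: -0.5 * np.sum((x - mu) ** 2)
grad_log_p = lambda x: -(x - mu)

samples, acc_rate = mala(grad_log_p, log_p, np.zeros(2), step=0.5, n_steps=5000, rng=rng)
```

In the setting the abstract describes, the target of each inner run would instead be one reverse transition kernel, with the learned score supplying the gradient; the sketch only shows the propose-then-correct structure that MALA contributes.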

Paper Structure (29 sections, 47 theorems, 429 equations, 2 figures, 1 table, 3 algorithms)

Introduction
Preliminaries
General Framework of Reverse Transition Kernel
Implementation of RTK inner loops
Conclusion and Limitation
Numerical Experiments
Inference process with reverse transition kernel framework
For Term 1.
For Term 2.
Implement RTK inference with MALA
Standard implementation of Alg. \ref{alg:inner_mala}.
Projected implementation of Alg. \ref{alg:inner_mala}.
Ideally projected implementation of Alg. \ref{alg:inner_mala}.
Control the error from the projected transition kernel
Particles stay inside ${\mathcal{B}}({\bm{0}}, R)$.
...and 14 more sections

Key Result

Lemma 3.1

Suppose a Markov process $\{\mathbf{x}_t\}$ follows SDE \ref{sde:ou}; then for any $t^\prime > t$, we have

Figures (2)

Figure 1: (a) Marginal accuracy of the sampled MoG for different algorithms versus the number of function evaluations (NFE). (b-f) Histograms of the sampled MoG along a fixed direction for different algorithms. The plots labeled 'ULA', 'ULD', 'MALA', and 'MALA_ES' correspond to RTK-ULA, RTK-ULD, RTK-MALA, and score-only RTK-MALA, respectively. Each histogram is taken along the second dimension, with the first dimension constrained to (0.75, 1.25).
Figure 2: (a-e) Clusters sampled by DDPM, RTK-ULA, RTK-ULD, score-only RTK-MALA, and RTK-MALA, respectively. (f) Clusters sampled from the ground-truth distribution. These 2D clusters are the projection of the original 10D data onto the first two dimensions.

Theorems & Definitions (84)

Lemma 3.1
Lemma 3.2
Theorem 4.1: Informal version of Theorem \ref{thm:nn_estimate_complexity_gene}
Corollary 4.2: Informal version of Corollary \ref{cor:complexity_19}
Corollary 4.3
Theorem 4.4
proof: Proof of Lemma \ref{lem:rev_trans_ker_form}
Lemma B.1: Chain rule of TV
proof
Lemma B.2
...and 74 more
