Almost Linear Convergence under Minimal Score Assumptions: Quantized Transition Diffusion

Xunpeng Huang, Yingyu Lin, Nikki Lijing Kuang, Hanze Dong, Difan Zou, Yian Ma, Tong Zhang

TL;DR

This work addresses inefficiencies in diffusion-based generative modeling by combining the discrete and continuous diffusion paradigms in Quantized Transition Diffusion (QTD). By histogram-quantizing the target distribution and embedding it into a binary Hamming-graph space, QTD enables long-range forward transitions with sparse connectivity, and it employs truncated uniformization for unbiased, efficient reverse-time sampling. The authors prove a TV-convergence guarantee with a score-evaluation complexity of $O(d \ln^2(d/\epsilon))$ under minimal score assumptions, advancing the theoretical foundations of diffusion modeling. Overall, QTD offers a principled, scalable framework that unifies discrete and continuous diffusion schemes and yields potential gains in sampling efficiency for high-dimensional data.
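The quantize-then-encode step described above can be illustrated with a minimal one-dimensional sketch. It is not the authors' construction: the function names (`quantize`, `to_bits`, `from_bits`, `hamming_neighbors`), the uniform bin layout on `[low, high)`, and the most-significant-bit-first encoding are all assumptions made for illustration.

```python
def quantize(x, low, high, bits):
    """Map x to a histogram bin index in {0, ..., 2**bits - 1} (hypothetical helper).

    Values outside [low, high) are clamped to the first or last bin.
    """
    n_bins = 1 << bits
    idx = int((x - low) / (high - low) * n_bins)
    return min(max(idx, 0), n_bins - 1)


def to_bits(idx, bits):
    """Binary encoding of a bin index, most significant bit first."""
    return [(idx >> (bits - 1 - k)) & 1 for k in range(bits)]


def from_bits(code):
    """Inverse of to_bits: recover the bin index from its bit list."""
    idx = 0
    for b in code:
        idx = (idx << 1) | b
    return idx


def hamming_neighbors(idx, bits):
    """Bin indices at Hamming distance 1, i.e. reachable by flipping one bit."""
    return [idx ^ (1 << k) for k in range(bits)]


if __name__ == "__main__":
    bits = 3                               # 8 bins on [0, 1)
    idx = quantize(0.3, 0.0, 1.0, bits)    # bin 2
    print(idx, to_bits(idx, bits))         # 2 [0, 1, 0]
    print(hamming_neighbors(idx, bits))    # [3, 0, 6]
```

With 8 bins, bin 2 has only three Hamming neighbors (bins 3, 0, and 6), yet flipping the most significant bit moves it four bins away in one step. This is the sense in which a single-bit transition can be a long-range move in the original data space while each state keeps a small out-degree.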

Abstract

Continuous diffusion models have demonstrated remarkable performance in data generation across various domains, yet their efficiency remains constrained by two critical limitations: (1) the local adjacency structure of the forward Markov process, which restricts long-range transitions in the data space, and (2) inherent biases introduced during the simulation of time-inhomogeneous reverse denoising processes. To address these challenges, we propose Quantized Transition Diffusion (QTD), a novel approach that integrates data quantization with discrete diffusion dynamics. Our method first transforms the continuous data distribution $p_*$ into a discrete one $q_*$ via histogram approximation and binary encoding, enabling efficient representation in a structured discrete latent space. We then design a continuous-time Markov chain (CTMC) with Hamming distance-based transitions as the forward process, which inherently supports long-range movements in the original data space. For reverse-time sampling, we introduce a truncated uniformization technique to simulate the reverse CTMC, which can provably provide unbiased generation from $q_*$ under minimal score assumptions. Through a novel KL dynamic analysis of the reverse CTMC, we prove that QTD can generate samples with $O(d\ln^2(d/\epsilon))$ score evaluations in expectation to approximate the $d$-dimensional target distribution $p_*$ within an $\epsilon$ error tolerance. Our method not only establishes state-of-the-art inference efficiency but also advances the theoretical foundations of diffusion-based generative modeling by unifying discrete and continuous diffusion paradigms.
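To make the idea of simulating a CTMC without time-discretization bias concrete, here is a minimal sketch of standard uniformization for a time-homogeneous chain on the hypercube. It is not the paper's truncated uniformization, which targets the time-inhomogeneous reverse process. The dynamics (each bit flips independently at rate 1) and the names `simulate_bit_flip_ctmc` and `flip_probability` are assumptions chosen for illustration.

```python
import math
import random


def simulate_bit_flip_ctmc(x0, t, rng):
    """Simulate, up to time t, a CTMC in which each bit flips at rate 1.

    Uniformization: jump events arrive as a Poisson process with rate
    lam = n, which equals the exit rate of every state here, and each
    event flips one uniformly chosen bit. No time grid is used, so the
    state at time t has exactly the law of the CTMC.
    Returns the final state and the number of jump events.
    """
    n = len(x0)
    x = list(x0)
    lam = float(n)
    events = 0
    clock = rng.expovariate(lam)       # time of the first Poisson event
    while clock <= t:
        x[rng.randrange(n)] ^= 1       # flip one uniformly chosen bit
        events += 1
        clock += rng.expovariate(lam)  # waiting time to the next event
    return x, events


def flip_probability(t):
    """Closed-form probability that one bit differs from its start at time t."""
    return 0.5 * (1.0 - math.exp(-2.0 * t))


if __name__ == "__main__":
    rng = random.Random(0)
    n, t, trials = 4, 0.5, 20000
    flipped = 0
    for _ in range(trials):
        x, _ = simulate_bit_flip_ctmc([0] * n, t, rng)
        flipped += sum(x)
    print(flipped / (n * trials), flip_probability(t))
```

The empirical fraction of flipped bits should agree with the closed form up to Monte Carlo noise. The expected number of jump events is $n t$, so in this toy chain the simulation cost grows linearly with the number of bits.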

Paper Structure

This paper contains 24 sections, 17 theorems, 193 equations, 3 figures, 3 tables, 2 algorithms.

Key Result

Lemma 3.1

Suppose the target distribution $p_*\propto \exp(-f_*)$ is $\sigma$-sub-Gaussian and $f_*$ is $H$-smooth. Then we can construct $\overline{p}_*$, defined on a finite cube $\mathrm{Cube}\left(L\right)$ with length $L$, that satisfies $\mathrm{TV}\left(p_*, \overline{p}_*\right)\le 3\epsilon$.

Figures (3)

  • Figure 1: Visualization of different adjacency structures. The bold blue edges highlight a diameter path, that is, a shortest path between the two most distant vertices in each graph. Drawing samples in a discrete space ${\mathcal{Y}}$ by simulating a CTMC can be viewed as traversing a graph whose diameter governs the number of iterations required for convergence, while the out-degree of each node influences the per-iteration complexity. In the neighborhood adjacency $G_{\text{Tridiagonal}}$, each node has an out-degree of $O(1)$ but the diameter is $O(|{\mathcal{Y}}|)$. For the dense adjacency, the graph $G_{\text{Dense}}$ attains a diameter of $O(1)$ at the cost of an $O(|{\mathcal{Y}}|)$ out-degree. Notably, the binary adjacency $G_{\text{Hypercube}}$ offers a balanced design, featuring both a diameter and an out-degree of $O(\log|{\mathcal{Y}}|)$.
  • Figure 2: Visualization of the histogram approximation. The first step regularizes the original distribution to a bounded set, with the TV gap controlled by Lemma \ref{lem:histog_dis_comp}. The second step quantizes the probability density into a histogram-like distribution, with the TV gap controlled by Lemma \ref{lem:histog_dis_quan}.
  • Figure 3: Heatmaps of the probability transition at different time steps $t$ for four diffusion processes: a continuous normalized Gaussian kernel on $[0,1]$ (top row), and discrete CTMCs over $|{\mathcal{Y}}| = 8$ states based on tridiagonal, dense, and hypercube transition rate matrices (bottom three rows).
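The diameter and out-degree trade-off stated in the Figure 1 caption can be checked numerically for $|{\mathcal{Y}}| = 8$, the state-space size used in Figure 3. The sketch below is an illustration, not code from the paper. It assumes that the tridiagonal adjacency is a path graph and the dense adjacency is a complete graph, and all function names are hypothetical.

```python
from collections import deque


def tridiagonal(m):
    """Path graph: state i is adjacent to i - 1 and i + 1 (assumed layout)."""
    return {i: [j for j in (i - 1, i + 1) if 0 <= j < m] for i in range(m)}


def dense(m):
    """Complete graph: every state is adjacent to every other state."""
    return {i: [j for j in range(m) if j != i] for i in range(m)}


def hypercube(bits):
    """Hypercube: states are adjacent when their binary codes differ in one bit."""
    return {i: [i ^ (1 << k) for k in range(bits)] for i in range(1 << bits)}


def eccentricity(graph, src):
    """Largest shortest-path distance from src, computed by breadth-first search."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        for v in graph[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return max(dist.values())


def diameter(graph):
    """Maximum eccentricity over all vertices (graphs here are connected)."""
    return max(eccentricity(graph, s) for s in graph)


def max_out_degree(graph):
    """Largest number of neighbors of any vertex."""
    return max(len(nbrs) for nbrs in graph.values())


if __name__ == "__main__":
    graphs = {"tridiagonal": tridiagonal(8), "dense": dense(8), "hypercube": hypercube(3)}
    for name, g in graphs.items():
        print(name, "diameter:", diameter(g), "max out-degree:", max_out_degree(g))
```

For 8 states this gives diameter 7 and out-degree 2 for the path graph, diameter 1 and out-degree 7 for the complete graph, and diameter 3 and out-degree 3 for the hypercube, matching the $O(|{\mathcal{Y}}|)$, $O(1)$, and $O(\log|{\mathcal{Y}}|)$ scalings in the caption.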

Theorems & Definitions (33)

  • Lemma 3.1
  • Lemma 3.2
  • Lemma 3.3
  • Theorem 4.1
  • Lemma B.1: Theorem 4.10 of boucheron2003concentration
  • Lemma B.2: Chain rule of TV
  • Lemma B.3: Backward Kolmogorov equation
  • Lemma B.4: Lemma 11 in vempala2019rapid
  • Lemma C.1: Forward transition kernel
  • Remark C.2
  • ...and 23 more