$\textit{Jump Your Steps}$: Optimizing Sampling Schedule of Discrete Diffusion Models

Yong-Hyun Park, Chieh-Hsin Lai, Satoshi Hayakawa, Yuhta Takida, Yuki Mitsufuji

TL;DR

Jump Your Steps (JYS) addresses slow sampling in discrete diffusion models by optimizing the discrete sampling schedule to minimize Compounding Decoding Error (CDE) without extra computational cost. The authors derive a tractable Kullback-Leibler upper bound (KLUB) on CDE via Girsanov’s theorem for continuous-time Markov chains (CTMCs) and propose a hierarchical, KLUB-guided procedure to search for optimal timesteps, applicable to $\tau$-leaping and $k$-Gillespie samplers. Across synthetic data, CIFAR-10, music, and text, JYS consistently improves sample quality at various numbers of function evaluations (NFEs) and noise schedules, illustrating its versatility as a general framework for fast, high-quality discrete diffusion sampling. The work also provides two practical techniques to estimate KLUB efficiently and analyzes how JYS adapts to data-dependent token dependencies. Overall, JYS offers a budget-friendly, architecture-agnostic approach to accelerate discrete diffusion generation with quantified error control.

Abstract

Diffusion models have seen notable success in continuous domains, leading to the development of discrete diffusion models (DDMs) for discrete variables. Despite recent advances, DDMs face the challenge of slow sampling speeds. While parallel sampling methods like $\tau$-leaping accelerate this process, they introduce $\textit{Compounding Decoding Error}$ (CDE), where discrepancies arise between the true distribution and the approximation from parallel token generation, leading to degraded sample quality. In this work, we present $\textit{Jump Your Steps}$ (JYS), a novel approach that optimizes the allocation of discrete sampling timesteps by minimizing CDE without extra computational cost. More precisely, we derive a practical upper bound on CDE and propose an efficient algorithm for searching for the optimal sampling schedule. Extensive experiments across image, music, and text generation show that JYS significantly improves sampling quality, establishing it as a versatile framework for enhancing DDM performance for fast sampling.
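
To make the idea of a discrepancy from parallel token generation concrete, the following toy sketch compares a joint distribution over two tokens with the distribution obtained by sampling each token independently from its marginal. It is only an illustration under simplifying assumptions (two binary tokens, a single parallel step); the function name `kl_joint_vs_parallel` and the example tables are hypothetical and are not taken from the paper, and the quantity computed is a stand-in rather than the paper's formal definition of CDE.

```python
import math

def kl_joint_vs_parallel(joint):
    """KL(joint || product of marginals) for two tokens, in nats.

    `joint[i][j]` is the probability that token A = i and token B = j.
    Sampling both tokens in parallel from their marginals ignores the
    dependency between them; this KL measures what is lost.
    """
    rows, cols = len(joint), len(joint[0])
    p_a = [sum(joint[i]) for i in range(rows)]                        # marginal of token A
    p_b = [sum(joint[i][j] for i in range(rows)) for j in range(cols)]  # marginal of token B
    kl = 0.0
    for i in range(rows):
        for j in range(cols):
            p = joint[i][j]
            if p > 0:  # terms with zero probability contribute nothing
                kl += p * math.log(p / (p_a[i] * p_b[j]))
    return kl

# Hypothetical example tables (assumptions for illustration only).
correlated = [[0.5, 0.0], [0.0, 0.5]]       # tokens always agree
independent = [[0.25, 0.25], [0.25, 0.25]]  # tokens carry no information about each other

kl_correlated = kl_joint_vs_parallel(correlated)
kl_independent = kl_joint_vs_parallel(independent)
```

In this toy setting the independent table gives zero divergence, while the perfectly correlated table gives $\log 2$ nats: parallel decoding is harmless when tokens are independent and costly when they are strongly dependent, which is the kind of data-dependent behavior a sampling schedule can exploit.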

Paper Structure

This paper contains 47 sections, 5 theorems, 42 equations, 15 figures, 2 algorithms.

Key Result

Theorem 3.1

We have the following bound on the KL divergence between $\mathbb{P}_0$ and $\mathbb{Q}_0^{T \to t_1 \to \cdots \to 0}$ in terms of cumulative CDEs:

Figures (15)

  • Figure 1: (Top) Comparison of sampling trajectories: ground truth vs. parallel sampling using a uniform schedule and the Jump Your Steps (JYS) schedule. (Bottom) The uniform schedule exhibits compounding decoding errors during parallel sampling, while JYS reduces them by using fewer steps in deterministic phases and reallocating the skipped steps to other timesteps.
  • Figure 2: An illustration of the relationship between the KL divergence of the distribution, the compounding error $\mathcal{E}_{\mathrm{CDE}}$ (Section 3.1), and KLUB (Section 3.2). The sampling schedule $\{ T \to t_1 \to \dots \to t_{N-1} \to 0 \}$ is optimized to minimize KLUB using the efficient algorithms detailed in Sections 3.3 and 3.4.
  • Figure 3: We optimize the sampling schedule by refining it from coarse intervals to finer intervals, using a hierarchical breakdown strategy.
  • Figure 4: The values of KLUB with respect to $t$. Blue lines show estimates from individual trajectories $(X_t)_{t\in[T,0]}$, while the red line is their average.
  • Figure 5: Performance comparisons on Countdown. The JYS schedule enhances sampling quality across different types of samplers.
  • ...and 10 more figures
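
The coarse-to-fine refinement described in the caption of Figure 3 can be sketched as follows. This is a minimal illustration, not the authors' implementation: `toy_step_cost` is a hypothetical stand-in for a KLUB estimate of a single jump from time `s` to time `t`, the per-interval objective (sum of the two sub-jump costs) is an assumption, and golden-section search is used here simply as a derivative-free one-dimensional minimizer.

```python
import math

def golden_section_min(f, lo, hi, tol=1e-6):
    """Minimize a unimodal function f on [lo, hi] by golden-section search."""
    inv_phi = (math.sqrt(5) - 1) / 2
    a, b = lo, hi
    while b - a > tol:
        c = b - inv_phi * (b - a)  # left probe point
        d = a + inv_phi * (b - a)  # right probe point
        if f(c) < f(d):
            b = d  # minimum lies in [a, d]
        else:
            a = c  # minimum lies in [c, b]
    return (a + b) / 2

def toy_step_cost(s, t):
    """Hypothetical per-jump cost for s > t; NOT the paper's KLUB."""
    return (s**2 - t**2) ** 2

def refine(schedule, cost):
    """Insert one optimized breakpoint inside every interval of a decreasing schedule."""
    refined = [schedule[0]]
    for s, t in zip(schedule, schedule[1:]):
        mid = golden_section_min(lambda m: cost(s, m) + cost(m, t), t, s)
        refined += [mid, t]
    return refined

def hierarchical_schedule(T, levels, cost):
    """Start from the single jump T -> 0 and split every interval `levels` times."""
    schedule = [T, 0.0]
    for _ in range(levels):
        schedule = refine(schedule, cost)
    return schedule

schedule = hierarchical_schedule(1.0, 2, toy_step_cost)
```

Each level doubles the number of sampling steps, so `levels` refinements yield a schedule with $2^{\text{levels}}$ jumps. With the toy cost above, the breakpoints cluster toward $t = T$ rather than being spaced uniformly, which illustrates how a cost-guided schedule can depart from a uniform one; with a real KLUB estimate the shape would depend on the model and data.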

Theorems & Definitions (6)

  • Theorem 3.1
  • Theorem 3.2
  • Proposition A.1
  • Proposition A.2
  • Theorem A.3
  • proof