Accelerated Diffusion Models via Speculative Sampling

Valentin De Bortoli, Alexandre Galashov, Arthur Gretton, Arnaud Doucet

TL;DR

The paper extends speculative sampling from discrete token generation to continuous diffusion processes, introducing a training-free draft strategy and a reflection maximal coupling-based adjusted rejection step to preserve exact diffusion samples. It provides a theoretical analysis of complexity and acceptance, and demonstrates substantial speedups (often halving function evaluations) on CIFAR-10, LSUN, and a robotic PushT task without sacrificing sample quality. The approach supports independent or frozen-draft models, includes Langevin-diffusion adaptations, and offers practical avenues for combining with parallel sampling and distillation techniques. Overall, the work advances efficient diffusion-model inference with rigorous optimality properties and broad potential applications.

Abstract

Speculative sampling is a popular technique for accelerating inference in Large Language Models by generating candidate tokens using a fast draft model and accepting or rejecting them based on the target model's distribution. While speculative sampling was previously limited to discrete sequences, we extend it to diffusion models, which generate samples via continuous, vector-valued Markov chains. In this context, the target model is a high-quality but computationally expensive diffusion model. We propose various drafting strategies, including a simple and effective approach that does not require training a draft model and is applicable out of the box to any diffusion model. Our experiments demonstrate significant generation speedup on various diffusion models, halving the number of function evaluations, while generating exact samples from the target model.

Paper Structure (57 sections, 16 theorems, 150 equations, 6 figures, 9 tables, 11 algorithms)

This paper contains 57 sections, 16 theorems, 150 equations, 6 figures, 9 tables, 11 algorithms.

Key Result

proposition 1

(Validity) Let $\tilde{X}\sim p$. Then Algorithm alg:MaxCoupling outputs $X \sim q$. This procedure is optimal in the sense that it maximizes the probability that $X=\tilde{X}$ under the constraints $\tilde{X}\sim p,~X\sim q$. Additionally, we have $\mathbb{P}(X\neq\tilde{X}) = ||p-q||_{\textup{TV}}$, where $||p-q||_{\textup{TV}}:=\tfrac{1}{2}\sum_{x\in \mathcal{X}}|p(x)-q(x)|$.
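
The statement above can be checked numerically on a small discrete example. The sketch below assumes the maximal coupling step is the standard accept/reject rule from speculative sampling (accept the draft with probability $\min(1, q(\tilde{x})/p(\tilde{x}))$, otherwise resample from the normalised residual $\max(0, q-p)$); the page does not reproduce the algorithm itself, and the function names are hypothetical, chosen only for illustration.

```python
import numpy as np


def total_variation(p, q):
    """Total variation distance 0.5 * sum_x |p(x) - q(x)| between two pmfs."""
    return 0.5 * np.abs(p - q).sum()


def maximal_coupling_step(x_tilde, p, q, rng):
    """Given a draft x_tilde ~ p, return X ~ q that equals x_tilde as often as possible.

    Assumed form of the step (standard speculative-sampling rejection rule).
    """
    # Accept the draft with probability min(1, q(x_tilde) / p(x_tilde)).
    if rng.uniform() * p[x_tilde] <= q[x_tilde]:
        return x_tilde
    # A rejection implies p(x_tilde) > q(x_tilde), so the residual has positive mass.
    residual = np.maximum(q - p, 0.0)
    return rng.choice(len(q), p=residual / residual.sum())


def exact_output_law(p, q):
    """Closed-form law of X and probability that X differs from the draft."""
    accept_mass = np.minimum(p, q)         # P(X = X_tilde = x) for each x
    reject_prob = 1.0 - accept_mass.sum()  # P(X != X_tilde)
    residual = np.maximum(q - p, 0.0)
    if reject_prob > 0.0:
        law = accept_mass + reject_prob * residual / residual.sum()
    else:
        law = accept_mass                  # p == q: the draft is always accepted
    return law, reject_prob


if __name__ == "__main__":
    p = np.array([0.5, 0.3, 0.2])  # illustrative draft distribution
    q = np.array([0.2, 0.3, 0.5])  # illustrative target distribution
    law, mismatch = exact_output_law(p, q)
    print("output law:", law)
    print("P(X != X_tilde):", mismatch, "TV:", total_variation(p, q))
```

Under these assumptions the output law coincides with $q$ and the mismatch probability coincides with the total variation distance, which is what the proposition asserts.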

Figures (6)

  • Figure 1: Speculative Sampling for diffusion models. Draft states are efficiently generated and verified in parallel. Upon the first rejection, a new state is sampled using an adjusted distribution combining draft & target models, and the remainder of the draft sequence is discarded.
  • Figure 2: Two maximal couplings between $p=\mathcal{N}(0.5, 0.25)$ and $q=\mathcal{N}(1.5, 0.25)$: the one given by Algorithm alg:MaxCoupling (top) and the reflection maximal coupling from Algorithm alg:ReflectionMaxCoupling (bottom). By definition, both couplings have $p$ and $q$ as their marginals. As they are maximal couplings, their probability mass on the diagonal is identical and is the maximum among all valid couplings.
  • Figure 3: In each figure, the $y$ axis corresponds to the number of evaluations of the target model. Without speculative sampling, we evaluate the target model with $200$ steps and show the improvements obtained using our approach. Each dotted line corresponds to INDEPENDENT drafting, each solid line to FROZEN drafting. The color gradient from purple to yellow corresponds to the dimension of the target distribution, taken in $[2,4,8,16,32]$.
  • Figure 4: Evolution of the rejection probability ($y$-axis) with the dimension $d$ ($x$-axis) for $\sigma_1 = 0.2$ and $\sigma_2 = 0.1$.
  • Figure 5: Effect of the temperature on the distribution of $Y$. Draft model has mean $1.0$ and standard deviation $0.5$. Target model has mean $3.0$ and standard deviation $0.5$.
  • ...and 1 more figure
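
The reflection maximal coupling mentioned in the Figure 2 caption can be sketched for two Gaussians that share the same isotropic covariance. The code below follows the textbook construction (accept the draft with probability $\min(1, q(\tilde{x})/p(\tilde{x}))$, otherwise reflect the noise across the hyperplane orthogonal to the mean difference). It is not taken from the paper's Algorithm alg:ReflectionMaxCoupling, whose exact form is not shown on this page. The function names are hypothetical, and reading the second argument of $\mathcal{N}(0.5, 0.25)$ as a variance (so $\sigma = 0.5$) is an assumption.

```python
import math

import numpy as np


def reflection_coupling_step(x_tilde, m_p, m_q, sigma, rng):
    """Given x_tilde ~ N(m_p, sigma^2 I), return (x, accepted) with x ~ N(m_q, sigma^2 I).

    Textbook reflection maximal coupling, assumed here for illustration.
    """
    delta = (m_p - m_q) / sigma
    norm = np.linalg.norm(delta)
    if norm == 0.0:
        return x_tilde, True               # identical laws: always keep the draft
    z = (x_tilde - m_p) / sigma            # standardised noise of the draft
    # log q(x_tilde) - log p(x_tilde) for equal-covariance Gaussians.
    log_ratio = 0.5 * (z @ z) - 0.5 * ((z + delta) @ (z + delta))
    # -Exp(1) has the law of log(U) with U uniform on (0, 1); avoids log(0).
    if -rng.exponential() <= log_ratio:
        return x_tilde, True
    e = delta / norm
    z_reflected = z - 2.0 * (e @ z) * e    # reflect the noise along the mean difference
    return m_q + sigma * z_reflected, False


def gaussian_tv(m_p, m_q, sigma):
    """TV distance between N(m_p, sigma^2 I) and N(m_q, sigma^2 I): 2 * Phi(r / 2) - 1."""
    r = np.linalg.norm(m_p - m_q) / sigma
    return math.erf(r / (2.0 * math.sqrt(2.0)))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    m_p, m_q, sigma = np.array([0.5]), np.array([1.5]), 0.5  # Figure 2 setting, variance 0.25 assumed
    rejections = 0
    n = 10_000
    for _ in range(n):
        x_tilde = m_p + sigma * rng.standard_normal(1)
        _, accepted = reflection_coupling_step(x_tilde, m_p, m_q, sigma, rng)
        rejections += not accepted
    print("empirical rejection rate:", rejections / n)
    print("total variation distance:", gaussian_tv(m_p, m_q, sigma))
```

In this sketch the empirical rejection rate should approach the total variation distance between the two Gaussians, in line with the caption's remark that every maximal coupling places the same mass on the diagonal.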

Theorems & Definitions (23)

  • proposition 1
  • proposition 2
  • proposition 3
  • lemma 1
  • theorem 1
  • proposition 4
  • proof
  • lemma 2
  • proof
  • lemma 3
  • ...and 13 more