TADA: Improved Diffusion Sampling with Training-free Augmented Dynamics

Tianrong Chen, Huangjie Zheng, David Berthelot, Jiatao Gu, Josh Susskind, Shuangfei Zhai

TL;DR

Diffusion models remain limited by slow sampling, which motivates training-free acceleration strategies. The authors introduce Training-free Augmented DynAmics (TADA), a momentum-diffusion-based approach that uses higher-dimensional initial noise to enable faster sampling with pretrained models via an ODE solver, while offering a tunable detail control at no extra cost. They prove a training-equivalence between momentum diffusion and conventional diffusion, enabling direct reuse of pretrained models, and demonstrate strong, consistent gains on EDM/EDM2 and Stable Diffusion 3 across ImageNet benchmarks with up to 186% speedups. Empirical results show improved FID/FD-DINOv2 scores across NFEs, with qualitative improvements at lower CFG and under-parameterized regimes; limitations include incomplete disentanglement of augmentation and dynamics and diminishing gains for high-capacity models, guiding future work toward advanced solvers and stochasticity control.

Abstract

Diffusion models have demonstrated exceptional capabilities in generating high-fidelity images but typically suffer from inefficient sampling. Many solver designs and noise scheduling strategies have been proposed to dramatically improve sampling speeds. In this paper, we introduce a new sampling method that is up to $186\%$ faster than the current state-of-the-art solver at comparable FID on ImageNet512. This new sampling method is training-free and uses an ordinary differential equation (ODE) solver. The key to our method is the use of higher-dimensional initial noise, which allows existing pretrained diffusion models to produce more detailed samples with fewer function evaluations. In addition, by design our solver allows the level of detail to be controlled through a simple hyper-parameter at no extra computational cost. We show how our approach leverages momentum dynamics by establishing a fundamental equivalence between momentum diffusion models and conventional diffusion models with respect to their training paradigms. Moreover, we observe that the use of higher-dimensional noise naturally exhibits characteristics similar to those of stochastic differential equations (SDEs). Finally, we demonstrate strong performance on a set of representative pretrained diffusion models, including EDM, EDM2, and Stable-Diffusion 3, which cover models in both pixel and latent spaces, as well as class- and text-conditional settings. The code is available at https://github.com/apple/ml-tada.

Paper Structure

This paper contains 36 sections, 2 theorems, 90 equations, 10 figures, 2 tables, 1 algorithm.

Key Result

Proposition 3.1

The training objective of general Momentum Diffusion Models (MDM) (i.e., Eq. \ref{eq:momentum-obj}) can be equivalently reparameterized as:

Figures (10)

  • Figure 1: Distinctions between conventional diffusion models (left) and momentum diffusion models (right) during sampling. Leveraging Prop. \ref{prop:mdm-training}, we demonstrate that pretrained diffusion models with $x_0$-prediction $x_{\theta}(\cdot,\cdot)$ can be directly applied to propagate the momentum system with multiple variables $\{\mathbf{x}_{t_i}^{(n)}\}_{n=0}^{N-1}$. Moreover, the choice of numerical solver remains flexible. See Sec. \ref{sec:N-var-sampling} for details.
  • Figure 2: Differences between TADA and the probabilistic ODE in terms of trajectories. We apply the same pretrained model to the momentum system, with the same discretization in terms of SNR and the same initial prior samples. TADA exhibits SDE-like properties, such as generating different samples from the same initial condition, while the system remains a deterministic ODE that can be solved more efficiently. See Prop. \ref{prop:y-dyn} for more detail.
  • Figure 3: Generated samples under varying prior scales, all initialized with the same initial condition of the dynamics in Prop. \ref{prop:y-dyn}: $y_0 := (\mathbf{r}_0^\mathsf{T} \otimes \mathbf{I}_d)\,\mathbf{x}_0 \equiv \epsilon$, using an identical time discretization over the SNR and the same pretrained model with 15 NFEs. The diversity of the generated results increases proportionally with the standard deviation of the final variable $x_0^{(N-1)}$, which is scaled by a factor $k$: $\boldsymbol{\Sigma}_0 = \mathrm{diag}(1, 1, \ldots, k)$.
  • Figure 4: Qualitative comparison with UniPC, varying the NFEs, using the same initial condition and the same pretrained EDM2 model. More qualitative comparisons can be found in Appendix \ref{app:additional-comparision}.
  • Figure 5: Left: Comparison with baselines on ImageNet-64 using the EDM pretrained model. Right: Performance under varying numbers of variables $N$ while keeping the SNR-based time discretization the same as in the $N = 2$ setting; $N = 2$ is the default configuration reported throughout this paper.
  • ...and 5 more figures
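
The projection in the Figure 3 caption, $y_0 := (\mathbf{r}_0^\mathsf{T} \otimes \mathbf{I}_d)\,\mathbf{x}_0$, can be illustrated with a small sketch. It is not the authors' implementation. It only shows the linear-algebra fact that the Kronecker form reduces to a weighted sum of the $N$ stacked variables, and that many different higher-dimensional draws $\mathbf{x}_0$ share the same projected $y_0 = \epsilon$. The weights `r`, the helper names, and the reading of $k$ as a standard-deviation scale on the last variable are assumptions made for illustration; the paper's actual $\mathbf{r}_0$ is not given in this summary.

```python
import random

def kron_row_identity(r, d):
    """Return r^T (x) I_d as a d x (N*d) nested list."""
    return [[r[n] if j == i else 0.0
             for n in range(len(r)) for j in range(d)]
            for i in range(d)]

def project(r, xs):
    """Compute y = (r^T (x) I_d) x for stacked x = [x^(0); ...; x^(N-1)]."""
    d = len(xs[0])
    flat = [v for x in xs for v in x]
    return [sum(a * b for a, b in zip(row, flat))
            for row in kron_row_identity(r, d)]

def weighted_sum(r, xs):
    """Equivalent closed form: y = sum_n r_n * x^(n)."""
    return [sum(r_n * x[i] for r_n, x in zip(r, xs))
            for i in range(len(xs[0]))]

def sample_prior(n_vars, d, k, rng):
    """Draw N variables in R^d; the last one has std k (assumed reading
    of Sigma_0 = diag(1, ..., 1, k)), the others have std 1."""
    stds = [1.0] * (n_vars - 1) + [k]
    return [[rng.gauss(0.0, s) for _ in range(d)] for s in stds]

def lift_to_same_y(r, eps, x_last):
    """N = 2 only: pick x^(0) so that the projection equals eps."""
    r0, r1 = r
    x_first = [(e - r1 * x) / r0 for e, x in zip(eps, x_last)]
    return [x_first, x_last]

if __name__ == "__main__":
    rng = random.Random(0)
    r = [0.6, 0.8]  # hypothetical weights, not taken from the paper
    eps = [rng.gauss(0.0, 1.0) for _ in range(3)]
    for k in (1.0, 2.0, 4.0):
        x_last = sample_prior(2, 3, k, rng)[-1]
        xs = lift_to_same_y(r, eps, x_last)
        y = project(r, xs)
        # Every k yields a different x_0 but the same projected y_0.
        assert all(abs(a - b) < 1e-9 for a, b in zip(y, eps))
        print(k, xs[1])
```

In this sketch the extra variable `x_last` is free once $y_0$ is fixed, which is one way to read the caption's observation that sample diversity grows with $k$ even though the initial condition $y_0$ is held constant.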

Theorems & Definitions (9)

  • Proposition 3.1
  • proof
  • Remark 3.2
  • Proposition 3.3
  • proof
  • Remark 3.4
  • proof
  • proof
  • proof