
ZEUS: Accelerating Diffusion Models with Only Second-Order Predictor

Yixiao Wang, Ting Jiang, Zishan Shao, Hancheng Ye, Jingwei Sun, Mingyuan Ma, Jianyi Zhang, Yiran Chen, Hai Li

Abstract

Denoising generative models deliver high-fidelity generation but remain bottlenecked by inference latency due to the many iterative denoiser calls required during sampling. Training-free acceleration methods reduce latency by either sparsifying the model architecture or shortening the sampling trajectory. Current training-free methods are more complex than necessary: higher-order predictors amplify error under aggressive speedups, and architectural modifications hinder deployment. Beyond 2x acceleration, step skipping creates structural scarcity -- at most one fresh evaluation per local window -- leaving the computed output and its backward difference as the only causally grounded information. Based on this observation, we propose ZEUS, an acceleration method that replaces skipped denoiser evaluations with a second-order predictor and stabilizes aggressive consecutive skipping with an interleaved scheme that avoids back-to-back extrapolations. ZEUS adds essentially zero overhead, requires no feature caches or architectural modifications, and is compatible with different backbones, prediction objectives, and solver choices. Across image and video generation, ZEUS consistently improves the speed-fidelity trade-off over recent training-free baselines, achieving up to 3.2x end-to-end speedup while maintaining perceptual quality. Our code is available at: https://github.com/Ting-Justin-Jiang/ZEUS.
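To make the mechanism concrete, below is a minimal, self-contained sketch of the idea, assuming a toy Euler solver and a stand-in denoiser; `solver_step`, `zeus_like_sampler`, and the 1:1 schedule are our illustrative assumptions, not the released ZEUS implementation (see the repository above for that).

```python
import torch

def solver_step(x, psi, dt=0.02):
    # Toy explicit-Euler update standing in for a real ODE solver step
    # (the paper's experiments use solvers such as DPM-Solver++).
    return x - dt * psi

def zeus_like_sampler(denoiser, x, timesteps, skip):
    psi_prev = psi_curr = None
    for i, t in enumerate(timesteps):
        if skip[i] and psi_prev is not None:
            # Second-order backward extrapolation from the observed pair
            # {psi_t, Delta psi_t}: psi_hat = psi_t + (psi_t - psi_{t+1}).
            psi_next = 2.0 * psi_curr - psi_prev
        else:
            # Fresh denoiser evaluation: the causally grounded information.
            psi_next = denoiser(x, t)
        x = solver_step(x, psi_next)
        psi_prev, psi_curr = psi_curr, psi_next
    return x

# Interleaved 1:1 schedule (evaluate, extrapolate, evaluate, ...) so that
# no two extrapolations run back-to-back; the guard on psi_prev forces
# fresh evaluations until two outputs exist.
denoiser = lambda x, t: x * t                      # stand-in denoiser
ts = torch.linspace(1.0, 0.02, steps=50)
skip = [i % 2 == 1 for i in range(len(ts))]
sample = zeus_like_sampler(denoiser, torch.randn(4), ts, skip)
```

The warmup guard and the extrapolation rule are the whole method here, which is consistent with the abstract's claim of essentially zero overhead; the actual schedule, solver, and warmup policy in ZEUS may differ.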

Paper Structure

This paper contains 58 sections, 18 theorems, 104 equations, 5 figures, and 9 tables.

Key Result

Theorem 1.2

Let $\ell(a,b) = \|a-b\|_2^2$ and let the hypothesis class be all measurable maps $\psi$ with finite second moment. Define the population risk

$$\mathcal{L}(\psi) = \mathbb{E}_{(\mathbf{x}_s,\,s,\,\psi_0)}\big[\ell\big(\psi(\mathbf{x}_s,s),\,\psi_0\big)\big].$$

Then any minimizer $\psi^*$ of $\mathcal{L}$ satisfies, for all $(\mathbf{x}_s,s)$,

$$\psi^*(\mathbf{x}_s,s) = \mathbb{E}\big[\psi_0 \mid \mathbf{x}_s,\, s\big];$$

equivalently, the minimization can be carried out pointwise over the conditional risk at each $(\mathbf{x}_s,s)$. In other words, in the $L^2$ sense, training recovers the conditional expectation of the regression target $\psi_0$ given the noisy input $(\mathbf{x}_s,s)$.
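The argument behind this is the standard $L^2$ projection identity. Writing $m(\mathbf{x}_s,s) := \mathbb{E}[\psi_0 \mid \mathbf{x}_s, s]$ (our shorthand), the risk decomposes as

$$\mathcal{L}(\psi) = \mathbb{E}\big\|\psi(\mathbf{x}_s,s) - m(\mathbf{x}_s,s)\big\|_2^2 + \mathbb{E}\big\|m(\mathbf{x}_s,s) - \psi_0\big\|_2^2,$$

because the cross term vanishes after conditioning on $(\mathbf{x}_s,s)$. The second term does not depend on $\psi$, so the risk is minimized exactly when $\psi = m$ almost everywhere, which is the claimed characterization.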

Figures (5)

  • Figure 1: Overview of ZEUS. ZEUS is a training-free acceleration framework for ODE-based generative models with four key properties: (i) modality-agnostic—applicable to image and video generation; (ii) parameterization-agnostic—compatible with $\epsilon$-, $v$-, and flow-prediction objectives; (iii) state-of-the-art speed–fidelity tradeoff—outperforming more complex methods with minimal overhead; (iv) minimal integration effort—fewer than 20 lines of code. See Section 4 for details.
  • Figure 2: Scarcity of full evaluations. Under an aggressive acceleration ratio, we have limited denoiser evaluations, creating a scarcity of real information. In this paper, we find that the (executed) denoising trajectory yields the observed, path-wise information set $\{\psi_t,\ \Delta^{(1)}\psi_t\}$, where $\Delta^{(1)}\psi_t=\psi_t-\hat{\psi}_{t+1}$.
  • Figure 3: ZEUS ablations on SDXL. We generate 1,000 samples from random MS-COCO 2017 prompts using DPM-Solver++ (50 steps). (a,b) Predictor order under a uniform $1{:}1$ schedule. (c,d) Stability from reusing the observed information set under a uniform $1{:}3$ schedule. This shows that ZEUS remains stable as the reduced-step run length increases.
  • Figure 4: Three approximation schemes. We compare reuse-only, predictor-only, and ZEUS (reuse of the observed information pair). Dark gray: reference trajectory $\psi^{\star}$. Light gray: solver-computed outputs $\psi_t$. Crimson: approximated segments. Left--Reuse only: numerically stable but limited in expressivity; fine details erode (bottom). Middle--Predictor only: chaining second-order extrapolations overshoots without re-anchoring, producing artifacts (bottom). Right--Reuse observed information (ZEUS): alternating reuse of $\{\psi_t,\ \psi_t+\Delta^{(1)}\psi_t\}$ prevents overshoot and preserves detail, yielding the best perceptual quality; the corresponding update rules are sketched just after this list.
  • Figure A.1: Illustration of the Runge phenomenon. Left: polynomial interpolation with only 2 nodes. Right: interpolation with 10 nodes, which exhibits severe oscillations and divergence near the boundary.
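In update-rule form, the three schemes of Figure 4 can be summarized as follows (our paraphrase of the caption, in the notation of Figure 2, with $\hat{\psi}$ denoting an approximated output):

$$\text{reuse only: } \hat{\psi}_{t-1} = \psi_t, \qquad \text{predictor only: } \hat{\psi}_{t-1} = 2\hat{\psi}_t - \hat{\psi}_{t+1}, \qquad \text{ZEUS: } \hat{\psi}_{t-1} = \psi_t + \Delta^{(1)}\psi_t = 2\psi_t - \hat{\psi}_{t+1}.$$

Reuse-only never moves the estimate, so fine detail erodes; predictor-only builds each extrapolation on earlier extrapolations, compounding error; ZEUS re-anchors every extrapolation on the most recent solver-computed output $\psi_t$, which is what keeps consecutive skips stable.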

Theorems & Definitions (38)

  • Definition 1.1
  • Theorem 1.2: Optimal predictor under $L^2$ training
  • proof
  • Theorem 1.3: Equivalence of parameterizations
  • proof
  • Theorem 1.4: Signal–noise decomposition and invariance
  • proof
  • Theorem 1.5: Second-order backward extrapolation is BLUE and second-order accurate
  • proof
  • Remark 1.6: On conditioning in Theorem A.4: $\phi(s)$ is a population-level trend
  • ...and 28 more