TAP: A Token-Adaptive Predictor Framework for Training-Free Diffusion Acceleration

Haowei Zhu, Tingxuan Huang, Xing Wang, Tianyu Zhao, Jiexi Wang, Weifeng Chen, Xurui Peng, Fangmin Chen, Junhai Yong, Bin Wang

TL;DR

Token-Adaptive Predictor (TAP), a training-free, probe-driven framework that adaptively selects a predictor for each token at every sampling step, substantially improves the accuracy-efficiency frontier compared to fixed global predictors and caching-only baselines.

Abstract

Diffusion models achieve strong generative performance but remain slow at inference due to the need for repeated full-model denoising passes. We present Token-Adaptive Predictor (TAP), a training-free, probe-driven framework that adaptively selects a predictor for each token at every sampling step. TAP uses a single full evaluation of the model's first layer as a low-cost probe to compute proxy losses for a compact family of candidate predictors (instantiated primarily with Taylor expansions of varying order and horizon), then assigns each token the predictor with the smallest proxy error. This per-token "probe-then-select" strategy exploits heterogeneous temporal dynamics, requires no additional training, and is compatible with various predictor designs. TAP incurs negligible overhead while enabling large speedups with little or no perceptual quality loss. Extensive experiments across multiple diffusion architectures and generation tasks show that TAP substantially improves the accuracy-efficiency frontier compared to fixed global predictors and caching-only baselines.
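The per-token "probe-then-select" step described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names (`select_per_token`, `gather_selected`), the array layout, and the use of a squared L2 distance as the proxy loss are all assumptions made for illustration. The sketch takes the candidate forecasts as given and only shows how each token would be assigned the candidate with the smallest proxy error.

```python
import numpy as np

def select_per_token(probe_target, candidate_probes):
    """Pick, for each token, the candidate with the smallest proxy error.

    probe_target:     (T, D) first-layer quantity actually computed at this step.
    candidate_probes: (K, T, D) the same quantity as forecast by K candidates.
    Returns (choice, proxy_err) with shapes (T,) and (K, T).
    """
    # Assumption: the proxy loss is a per-token squared L2 distance.
    diff = candidate_probes - probe_target[None, :, :]
    proxy_err = np.sum(diff * diff, axis=-1)
    # Independent argmin per token, so different tokens may pick different candidates.
    choice = np.argmin(proxy_err, axis=0)
    return choice, proxy_err

def gather_selected(candidate_outputs, choice):
    """Assemble the output by taking, per token, the chosen candidate's forecast.

    candidate_outputs: (K, T, D) forecasts of the full-model output.
    choice:            (T,) index of the selected candidate per token.
    """
    token_idx = np.arange(candidate_outputs.shape[1])
    return candidate_outputs[choice, token_idx]

# Toy example: 2 candidates, 2 tokens, 2 feature dimensions.
probe_target = np.array([[1.0, 0.0], [0.0, 2.0]])
candidate_probes = np.array([
    [[1.0, 0.1], [0.0, 0.0]],   # candidate 0: close on token 0, far on token 1
    [[0.0, 0.0], [0.0, 1.9]],   # candidate 1: far on token 0, close on token 1
])
candidate_outputs = np.array([
    [[10.0, 10.0], [11.0, 11.0]],
    [[20.0, 20.0], [21.0, 21.0]],
])

choice, proxy_err = select_per_token(probe_target, candidate_probes)
selected = gather_selected(candidate_outputs, choice)
```

In this toy case token 0 takes candidate 0 and token 1 takes candidate 1, which is the behavior a single global predictor cannot reproduce.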

Paper Structure (51 sections, 10 equations, 13 figures, 8 tables, 1 algorithm)

Figures (13)

  • Figure 1: Overview of the TAP framework. (a) Compute and cache: at the first step of each $N$-step window TAP performs a full evaluation and caches the input-output residual and the first-layer modulated input for later probing. (b) Taylor predictor family: we construct a compact set of predictors by varying Taylor expansion order and prediction horizon to cover diverse token dynamics. (c) Probe-then-select: the cached modulated input is used as a lightweight probe to score candidate predictors per token; the selected predictor's output replaces the full model computation for that token and step.
  • Figure 2: Ablation on Taylor predictor family. We investigate the influence of distance range and order configuration in TAP with $N=7$ on FLUX.1-dev. In (a), we fix the prediction distance as $k$, and in (b), we set the order as 2 and $\delta=1$.
  • Figure 3: Visualization results. On FLUX.1-dev, TAP delivers higher speedup without quality loss.
  • Figure 4: Visualization of video generation. "A happy fuzzy panda playing guitar nearby a campfire, snow mountain in the background".
  • Figure 5: Pareto plots on FLUX.1-dev evaluated on DrawBench.
  • ...and 8 more figures
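
The Taylor predictor family mentioned in the Figure 1 caption, built by varying expansion order and prediction horizon, could be sketched as follows. This is only an illustration under stated assumptions: derivatives are approximated with backward finite differences over cached features, and the names `finite_differences`, `taylor_forecast`, and `spacing` are hypothetical and do not come from the paper. The paper's actual construction may differ.

```python
from math import factorial

import numpy as np

def finite_differences(history, max_order):
    """Backward differences [f, Δf, Δ²f, ...] at the most recent cached sample.

    history: list of equally spaced cached arrays, most recent last.
             It must hold at least max_order + 1 entries.
    """
    if len(history) < max_order + 1:
        raise ValueError("need at least max_order + 1 cached samples")
    diffs = [history[-1]]
    level = list(history)
    for _ in range(max_order):
        # Each pass differences neighbouring entries, shrinking the list by one.
        level = [later - earlier for earlier, later in zip(level[:-1], level[1:])]
        diffs.append(level[-1])
    return diffs

def taylor_forecast(history, order, horizon, spacing=1):
    """Forecast the feature `horizon` steps past the latest cached sample.

    Assumption: the i-th derivative is approximated by Δ^i f / spacing^i, so the
    truncated Taylor series becomes sum_i Δ^i f * (horizon / spacing)^i / i!.
    """
    diffs = finite_differences(history, order)
    ratio = horizon / spacing
    return sum(diffs[i] * ratio**i / factorial(i) for i in range(order + 1))

# Toy check on a feature that changes linearly across cached steps.
history = [np.array([1.0, 0.0]), np.array([3.0, 0.0]), np.array([5.0, 0.0])]
reuse = taylor_forecast(history, order=0, horizon=2)    # order 0 reuses the cache
linear = taylor_forecast(history, order=1, horizon=2)   # order 1 extrapolates the trend

# A small candidate family: every (order, horizon) pair is one predictor.
family = {(o, h): taylor_forecast(history, o, h) for o in (0, 1, 2) for h in (1, 2)}
```

Order 0 reduces to plain cache reuse, while higher orders extrapolate the trend, so a family spanning several orders and horizons gives slowly and quickly changing tokens different candidates to choose from.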