SJD++: Improved Speculative Jacobi Decoding for Training-free Acceleration of Discrete Auto-regressive Text-to-Image Generation

Yao Teng, Zhihuan Jiang, Han Shi, Xian Liu, Xuefei Ning, Guohao Dai, Yu Wang, Zhenguo Li, Xihui Liu

TL;DR

This work tackles the slow inference of autoregressive text-to-image models by introducing Speculative Jacobi Decoding++ (SJD++), a training-free probabilistic parallel decoding framework that performs multi-token predictions per forward pass and uses speculative drafting-and-verification with token reuse. It couples Jacobi-style iterative updates with probabilistic acceptance and selective token reuse, augmented by spatial-prior initialization to exploit image locality. Across multiple models and benchmarks, SJD++ achieves 2×–3× latency reductions and 2×–7× step compression without observable degradation in visual quality or semantic alignment, with notable gains from the token reuse mechanism. The approach offers a practical, model-agnostic acceleration technique for large autoregressive T2I systems and suggests directions for training-integrated or video-extension future work.

Abstract

Large autoregressive models can generate high-quality, high-resolution images but suffer from slow generation speed, because these models require hundreds to thousands of sequential forward passes for next-token prediction during inference. To accelerate autoregressive text-to-image generation, we propose Speculative Jacobi Decoding++ (SJD++), a training-free probabilistic parallel decoding algorithm. Unlike traditional next-token prediction, SJD++ performs multi-token prediction in each forward pass, drastically reducing generation steps. Specifically, it integrates the iterative multi-token prediction mechanism from Jacobi decoding with the probabilistic drafting-and-verification mechanism from speculative sampling. More importantly, for further acceleration, SJD++ reuses high-confidence draft tokens after each verification phase instead of resampling them all. We conduct extensive experiments on several representative autoregressive text-to-image generation models and demonstrate that SJD++ achieves $2\times$ to $3\times$ inference latency reduction and $2\times$ to $7\times$ step compression, while preserving visual quality with no observable degradation.
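To make the "probabilistic drafting-and-verification" idea concrete, here is a minimal sketch of the acceptance rule used in standard speculative sampling: a draft token is kept with probability min(1, p/q), and on rejection a replacement is drawn from the normalized residual max(0, p - q). This is the generic rule, not necessarily the exact criterion or the token-reuse logic of SJD++. The function name `verify_draft_token` and the list-based probability format are assumptions made for illustration.

```python
import random


def verify_draft_token(token, p_target, q_draft, rng):
    """Standard speculative-sampling check for a single draft token.

    Hypothetical helper, not taken from the paper.
    p_target: probabilities from the current forward pass (list over vocab).
    q_draft:  probabilities the draft token was originally sampled from.
    Returns (accepted, token_out).
    """
    p, q = p_target[token], q_draft[token]
    # Accept the draft token with probability min(1, p / q).
    if q > 0 and rng.random() < min(1.0, p / q):
        return True, token
    # Rejected: resample from the residual distribution max(0, p - q).
    residual = [max(0.0, pt - qd) for pt, qd in zip(p_target, q_draft)]
    # If the residual is all zero, fall back to sampling from p itself.
    weights = residual if sum(residual) > 0.0 else p_target
    new_token = rng.choices(range(len(p_target)), weights=weights, k=1)[0]
    return False, new_token


rng = random.Random(0)
# p/q = 2 for token 0, so it is always accepted.
print(verify_draft_token(0, [0.5, 0.5, 0.0], [0.25, 0.75, 0.0], rng))
# p = 0 for token 1, so it is always rejected and replaced.
print(verify_draft_token(1, [1.0, 0.0, 0.0], [0.0, 1.0, 0.0], rng))
```

In a multi-token setting this check would be applied position by position to the draft sequence. How the tokens after the first rejection are handled (resampled or reused) is where the abstract says SJD++ differs, and that part is not modeled here.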

Paper Structure

This paper contains 19 sections, 16 equations, 14 figures, 10 tables.

Figures (14)

  • Figure 1: We propose Speculative Jacobi Decoding++, a training-free multi-token prediction algorithm, to accelerate autoregressive text-to-image generation by reducing the number of model forward passes (denoted as steps) during inference. We perform our algorithm on Lumina-mGPT, and the reduced steps are marked in red. The original steps are marked in black.
  • Figure 2: Comparison of image diversity under different sampling strategies for Lumina-mGPT [liu2024lumina-mgpt]. Each row shows images generated with the same random seeds using greedy decoding (no randomness), top-$10$, top-$100$, and top-$2000$ sampling. Greedy decoding produces repetitive and less diverse outputs, while larger $K$ values yield richer textures, colors, and scene variations, highlighting the importance of controlled sampling randomness for high-quality image generation.
  • Figure 3: The pipeline of vanilla Jacobi decoding on an autoregressive model. Prediction with sampling is performed in parallel at each Jacobi iteration. Different shades of blue distinguish the tokens that have not yet been accepted.
  • Figure 4: Overview of one iteration of Speculative Jacobi Decoding++ (SJD++). First, a sequence of draft tokens and their corresponding probabilities are provided as input. Second, the autoregressive model performs a single forward pass to obtain updated conditional probabilities. Third, speculative verification accepts a subset of tokens based on a probabilistic criterion and resamples the remaining ones. Finally, accepted tokens are appended to the prefix sequence, while unaccepted tokens, combined with newly initialized tokens, form the next draft sequence for the following iteration.
  • Figure 5: SJD++ outperforms vanilla Jacobi decoding across different levels of sampling randomness.
  • ...and 9 more figures
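
The Jacobi iteration described in the Figure 3 caption can be sketched in its deterministic (greedy) special case, where the parallel result must match ordinary next-token decoding exactly. The paper's setting uses sampling rather than greedy prediction, so this is only an illustration of the iteration structure. The toy model `toy_next_token` and all function names are hypothetical.

```python
def toy_next_token(prefix):
    """Hypothetical deterministic 'model': maps a prefix to the next token id."""
    return (prefix[-1] + len(prefix) // 3) % 8


def sequential_decode(prompt, n_new):
    """Ordinary next-token decoding: one forward pass per token."""
    seq = list(prompt)
    for _ in range(n_new):
        seq.append(toy_next_token(seq))
    return seq[len(prompt):], n_new


def jacobi_decode(prompt, n_new, init_token=0):
    """Greedy Jacobi decoding: refine all draft tokens in parallel each pass."""
    prompt = list(prompt)
    draft = [init_token] * n_new
    fixed = 0  # number of leading draft tokens known to be final
    steps = 0
    while fixed < n_new:
        steps += 1
        # One "forward pass": each open position is predicted from the
        # prompt plus the previous iteration's draft tokens before it.
        new = draft[:fixed] + [
            toy_next_token(prompt + draft[:i]) for i in range(fixed, n_new)
        ]
        # new[fixed] is exact because all of its inputs are final. Every
        # draft token the model reproduces extends the exact prefix by one.
        j = fixed
        while j < n_new and new[j] == draft[j]:
            j += 1
        fixed = min(j + 1, n_new)
        draft = new
    return draft, steps


tokens_seq, steps_seq = sequential_decode([1, 2], 12)
tokens_jac, steps_jac = jacobi_decode([1, 2], 12)
print(tokens_seq == tokens_jac, steps_seq, steps_jac)
```

Each pass finalizes at least one token, so the loop never needs more passes than sequential decoding. Any saving comes from draft tokens that the model happens to reproduce, which is why a good initialization of the draft sequence matters.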