Autoregressive Image Generation Needs Only a Few Lines of Cached Tokens

Ziran Qin; Youru Lv; Mingbao Lin; Zeren Zhang; Chanfan Gan; Tieyuan Chen; Weiyao Lin

TL;DR

LineAR tackles the memory bottleneck in autoregressive (AR) image generation with a training-free, progressive KV cache compression scheme that treats the cache as a 2D structure of raster lines. It preserves the essential initial anchor tokens and the most recent lines, while progressively evicting less informative tokens from the middle region under inter-line attention guidance, keeping the cache within a fixed budget. Across six AR visual generation models and multiple tasks, LineAR achieves substantial memory reductions and throughput speedups while maintaining or improving generation quality. The method relies on local visual dependencies and strong inter-line attention consistency, which make line-by-line cache compression safe without retraining. These results show practical gains in deployment scalability for AR-based multimodal generation systems.
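The eviction policy summarized above can be illustrated with a small sketch. It is a hypothetical Python model, not the authors' implementation: the class name `LineKVCacheSketch`, its parameters, and the scoring rule (the mean attention that the newest line's queries assign to each cached token) are assumptions standing in for the paper's inter-line guidance. Tokens are tracked by position only, and the attention weights are supplied by the caller.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class LineKVCacheSketch:
    """Hypothetical line-level KV cache policy (illustrative, not the LineAR code).

    Only token positions are tracked; real key/value tensors are omitted.
    """
    num_anchors: int     # leading tokens never evicted (e.g. condition/prefix tokens)
    line_width: int      # tokens per raster line of the image token grid
    recent_lines: int    # most recent lines kept unconditionally
    budget: int          # maximum number of cached tokens
    positions: List[int] = field(default_factory=list)

    def append_line(self, start_pos: int) -> None:
        """Cache one newly generated raster line of tokens."""
        self.positions.extend(range(start_pos, start_pos + self.line_width))

    def evict(self, attn_from_new_line: List[List[float]]) -> None:
        """Evict the mid-region tokens that receive the least attention.

        attn_from_new_line[q][k] is the (assumed given) attention weight from
        query q of the latest line to the k-th currently cached token.
        """
        n = len(self.positions)
        if n <= self.budget:
            return
        # Protected regions: the leading anchors and the trailing recent lines.
        recent = self.recent_lines * self.line_width
        mid = range(self.num_anchors, max(self.num_anchors, n - recent))
        # Importance: mean attention each mid-region key receives from the new line.
        num_q = len(attn_from_new_line)
        score = {k: sum(row[k] for row in attn_from_new_line) / num_q for k in mid}
        # Drop the lowest-scoring mid tokens until the budget is met (or mid is exhausted).
        n_drop = min(n - self.budget, len(score))
        drop = set(sorted(score, key=score.get)[:n_drop])
        self.positions = [p for i, p in enumerate(self.positions) if i not in drop]


# Usage: 2 anchor tokens, 4-token lines, keep 1 recent line, budget of 8 tokens.
cache = LineKVCacheSketch(num_anchors=2, line_width=4, recent_lines=1, budget=8,
                          positions=[0, 1])
for line in range(3):
    cache.append_line(start_pos=2 + line * cache.line_width)
    n = len(cache.positions)
    # Toy uniform attention; a real model would supply softmax attention weights.
    attn = [[1.0 / n] * n for _ in range(cache.line_width)]
    cache.evict(attn)

print(cache.positions)  # [0, 1, 8, 9, 10, 11, 12, 13]
```

With uniform toy attention every mid-region score ties, and Python's stable sort then evicts the oldest mid-region tokens first, so the policy degenerates to FIFO eviction. Non-uniform attention is what would let such a policy keep informative middle tokens instead.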

Abstract

Autoregressive (AR) visual generation has emerged as a powerful paradigm for image and multimodal synthesis, owing to its scalability and generality. However, existing AR image generators suffer from severe memory bottlenecks because all previously generated visual tokens must be cached during decoding, which leads to both high storage requirements and low throughput. In this paper, we introduce LineAR, a novel, training-free, progressive key-value (KV) cache compression pipeline for autoregressive image generation. By exploiting the intrinsic characteristics of visual attention, LineAR manages the cache at the line level using a 2D view: guided by inter-line attention, it preserves the regions that subsequent tokens visually depend on while progressively evicting less-informative tokens whose removal does not harm the generation of subsequent lines. LineAR thus enables efficient AR image generation with only a few lines of cache, achieving both memory savings and throughput speedups while maintaining or even improving generation quality. Extensive experiments across six autoregressive image generation models, covering class-conditional and text-to-image generation, validate its effectiveness and generality. While retaining only 1/6 of the KV cache, LineAR improves ImageNet FID from 2.77 to 2.68 on LlamaGen-XL and COCO FID from 23.85 to 22.86 on Janus-Pro-1B. It also improves DPG on Lumina-mGPT-768 with just 1/8 of the KV cache. In addition, LineAR achieves significant memory and throughput gains, including up to 67.61% memory reduction and a 7.57x speedup on LlamaGen-XL, and 39.66% memory reduction and a 5.62x speedup on Janus-Pro-7B.
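To see why keeping only a few lines matters, the KV cache size can be estimated directly. The sketch below assumes a hypothetical 36-layer model with hidden size 1280, fp16 storage, batch size 32, and a 24 x 24 image token grid; none of these values are taken from the paper, and the helper `kv_cache_bytes` is illustrative. Retaining 4 of the 24 lines corresponds to the 1/6 budget mentioned above.

```python
def kv_cache_bytes(num_tokens: int, num_layers: int, hidden_dim: int,
                   batch: int = 1, bytes_per_elem: int = 2) -> int:
    """Bytes needed to cache keys and values for `num_tokens` tokens.

    Assumes standard multi-head attention where K and V in each layer have
    shape [batch, num_tokens, hidden_dim] (no grouped-query sharing).
    """
    return 2 * num_layers * batch * num_tokens * hidden_dim * bytes_per_elem


# Hypothetical configuration (illustrative only, not from the paper).
LAYERS, HIDDEN, BATCH = 36, 1280, 32
GRID = 24        # 24 x 24 image token grid
KEPT_LINES = 4   # raster lines retained in the compressed cache

full = kv_cache_bytes(GRID * GRID, LAYERS, HIDDEN, batch=BATCH)
reduced = kv_cache_bytes(KEPT_LINES * GRID, LAYERS, HIDDEN, batch=BATCH)

GiB = 2 ** 30
print(f"full cache:    {full / GiB:.2f} GiB")     # full cache:    3.16 GiB
print(f"4-line cache:  {reduced / GiB:.2f} GiB")  # 4-line cache:  0.53 GiB
print(f"kept fraction: {reduced / full:.4f}")     # kept fraction: 0.1667
```

This counts KV memory only. The end-to-end reductions reported in the abstract also include memory that is not compressed, such as model weights, so they are not expected to equal this 5/6 KV saving.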
