
CIAR: Interval-based Collaborative Decoding for Image Generation Acceleration

Keming Ye, Zhou Zhao, Fan Wu, Shengyu Zhang

Abstract

Auto-regressive (AR) models have recently made notable progress in image generation, achieving performance comparable to diffusion-based approaches. However, their computational intensity and sequential nature impede on-device deployment, causing disruptive latency. We address this via a cloud-device collaboration framework \textbf{CIAR}, which uses on-device self-verification to handle two key properties of visual synthesis: \textit{the vast token vocabulary} required for high-fidelity images and \textit{inherent spatial redundancy}, which makes homogeneous regions extremely predictable while object boundaries remain highly uncertain. Uniform verification wastes resources on such redundant tokens. Our solution centers on an on-device token uncertainty quantifier, which adopts continuous probability intervals instead of conventional discrete solution sets, accelerating processing and making the approach feasible for large visual vocabularies. Additionally, we incorporate an interval-enhanced decoding module to further speed up decoding while maintaining visual fidelity and semantic consistency via a distribution alignment training strategy. Extensive experiments demonstrate that CIAR achieves a 2.18x speed-up and reduces cloud requests by 70\%, while preserving image quality compared to existing methods.
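The on-device acceptance test described above can be illustrated with a minimal sketch. The paper does not specify the exact decision rule, so the function name and the interval-dominance criterion below are assumptions: a token is accepted locally only when the lower probability bound of the top candidate exceeds the upper bound of every other candidate, i.e. when no distribution consistent with the intervals could prefer a different token.

```python
import numpy as np

def accept_locally(p_lower: np.ndarray, p_upper: np.ndarray) -> tuple[int, bool]:
    """Interval-dominance acceptance test (illustrative, not the paper's exact rule).

    p_lower, p_upper: per-token lower/upper probability bounds from the
    device-side uncertainty quantifier. Returns the top candidate index
    and whether it can be accepted without a cloud request.
    """
    top = int(np.argmax(p_lower))
    # Accept only if the top candidate's lower bound beats every
    # competitor's upper bound: the choice is certain over the whole
    # feasible set of distributions.
    others_upper = np.delete(p_upper, top)
    return top, bool(p_lower[top] > others_upper.max())
```

Under this rule, wide intervals (high uncertainty, e.g. at object boundaries) fail the test and fall back to cloud verification, while narrow, dominant intervals in homogeneous regions are resolved on-device.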


Paper Structure

This paper contains 48 sections, 2 theorems, 30 equations, 7 figures, 10 tables, 1 algorithm.

Key Result

Proposition 1

Let $\mathcal{P}:=\{p\in\mathbb{R}^{n}:p_{i}^{\ell}\le p_{i}\le p_{i}^{u},\;\sum_{i}p_{i}=1\}$ be the feasible polytope of true discrete distributions consistent with the interval bounds. Then for any $p,q\in\mathcal{P}$, $\|p-q\|_{1}\le\Omega$. Consequently $\Omega$ upper-bounds the $L_{1}$-diameter of $\mathcal{P}$.
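A short derivation sketch, assuming (consistently with Claims 3 and 4 below) that $\delta_{i}:=p_{i}^{u}-p_{i}^{\ell}$ and $\Omega:=\sum_{i}\delta_{i}$:

```latex
For any $p,q\in\mathcal{P}$ and each coordinate $i$, both $p_i$ and $q_i$
lie in $[p_i^{\ell}, p_i^{u}]$, so $|p_i - q_i| \le p_i^{u} - p_i^{\ell} = \delta_i$.
Summing over coordinates,
\[
  \|p-q\|_{1} \;=\; \sum_{i} |p_i - q_i| \;\le\; \sum_{i} \delta_i \;=\; \Omega,
\]
hence the $L_{1}$-diameter of $\mathcal{P}$ is at most $\Omega$.
```

Intuitively, narrow per-token intervals pin down the feasible distributions tightly, which is what licenses accepting such tokens on-device without cloud verification.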

Figures (7)

  • Figure 1: (a) Acceptance analysis of Lantern. The pie chart shows the ratio of max-prob vs. other tokens, and the bar chart compares Lantern without verification to the baseline. (b) Comparison of decoding frameworks. From left to right: baseline, Lantern, and our CIAR with Inter-Head and cloud-device collaboration, which reduces latency while preserving output quality.
  • Figure 2: Overview of CIAR. (a) The cloud-side AR model generates image token prefixes from the input prompt. These prefixes are then sent to (b) a lightweight device model whose Inter-Head accepts confident tokens locally and sends uncertain ones, along with their interval features, to the cloud for verification and distribution alignment. (c) Interval-based alignment strategy. (d) Computation of uncertainty intervals in the Inter-Head.
  • Figure 3: Comparison of different prefix rates.
  • Figure 4: Visual analysis of different methods. "CIAR w/o Inter-Enhance" denotes our method without interval-enhanced decoding using interval features.
  • Figure 5: Latency comparison between Discrete and Continuous methods.
  • ...and 2 more figures

Theorems & Definitions (12)

  • Claim 1: Non-negativity and Zeros
  • proof
  • Claim 2: Scaling
  • proof
  • Claim 3: Upper Bound in Terms of $\Omega$
  • proof
  • Claim 4: Upper Bound in Terms of $\sum\delta_{i}^{2}$
  • proof
  • Proposition 1: Feasible Set Diameter
  • proof
  • ...and 2 more