ToProVAR: Efficient Visual Autoregressive Modeling via Tri-Dimensional Entropy-Aware Semantic Analysis and Sparsity Optimization

Jiayu Chen, Ruoyu Lin, Zihao Zheng, Jingxin Li, Maoliang Li, Guojie Luo, Xiang Chen

TL;DR

ToProVAR is an optimization framework for VAR models that, unlike prior approaches such as FastVAR and SkipVAR, uses attention entropy to identify sparsity along the token, layer, and scale dimensions. It achieves up to 3.4x acceleration on Infinity-2B and Infinity-8B with minimal quality loss, preserving semantic fidelity and fine details.

Abstract

Visual Autoregressive (VAR) models enhance generation quality but face a critical efficiency bottleneck in later stages. In this paper, we present a novel optimization framework for VAR models that fundamentally differs from prior approaches such as FastVAR and SkipVAR. Instead of relying on heuristic skipping strategies, our method leverages attention entropy to characterize the semantic projections across different dimensions of the model architecture. This enables precise identification of parameter dynamics under varying token granularity levels, semantic scopes, and generation scales. Building on this analysis, we further uncover sparsity patterns along three critical dimensions (token, layer, and scale) and propose a set of fine-grained optimization strategies tailored to these patterns. Extensive evaluation demonstrates that our approach achieves aggressive acceleration of the generation process while largely preserving semantic fidelity and fine details, outperforming prior methods in both efficiency and quality. Experiments on Infinity-2B and Infinity-8B models demonstrate that ToProVAR achieves up to 3.4x acceleration with minimal quality loss, effectively mitigating the issues found in prior work. Our code will be made publicly available.

Paper Structure (33 sections, 33 equations, 13 figures, 10 tables, 2 algorithms)

Figures (13)

  • Figure 1: A comparison between our method and state-of-the-art compression methods. The SOTA methods often suffer from issues such as semantic loss, structure distortion, and detail collapse.
  • Figure 2: Different Optimization Dimensions -- FastVAR vs. SkipVAR vs. ToProVAR
  • Figure 3: Tri-dimensional attention entropy analysis in VAR models: (a) Token-level semantic salience: pruning low-saliency tokens preserves quality, while pruning high-saliency ones causes severe degradation. (b) Layer-level semantic representation: global layers capture structure and are pruning-sensitive, whereas detail layers refine local semantics and can be pruned. (c) Scale-level semantic depth: complex objects require deeper scales for fine details, while simple ones stabilize earlier, enabling adaptive depth pruning.
  • Figure 4: Tri-dimensional entropy-aware VAR sparsity optimization: (a) Scale-level: compute the low-entropy ratio $\rho_s$ across scales and select the pruning start depth via threshold $\tau$. (b) Layer-level: for each scale, perform SVD on the entropy map to separate global layers from detail layers. (c) Token-level: within prunable layers/scales, increase the pruning rate with scale and use entropy-based gating $p_{\text{prune}}$ to remove low-salience tokens, preserving salient regions.
  • Figure 5: Qualitative comparison of various methods on complex prompts. Our method effectively prevents semantic loss, structure distortion, and detail collapse while maintaining high visual fidelity.
  • ...and 8 more figures
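
The scale-level step described in the Figure 4(a) caption can be illustrated with a minimal sketch. It is not the authors' implementation, and the page does not give the exact definitions, so several things here are assumptions: attention entropy is taken to be the Shannon entropy of a single attention distribution, a token counts as "low-entropy" when its entropy falls below a hypothetical cutoff `entropy_cutoff`, and pruning is assumed to start at the first scale whose ratio $\rho_s$ reaches $\tau$. All function names are illustrative.

```python
import math


def attention_entropy(attn_row):
    """Shannon entropy (in nats) of one attention distribution.

    Assumption: the row holds non-negative attention weights; it is
    renormalized here so the weights sum to 1.
    """
    total = sum(attn_row)
    probs = [a / total for a in attn_row if a > 0]
    return -sum(p * math.log(p) for p in probs)


def low_entropy_ratio(entropies, entropy_cutoff):
    """Fraction of tokens whose entropy is below entropy_cutoff.

    Plays the role of rho_s for one scale; the cutoff is a hypothetical
    parameter, not a value taken from the paper.
    """
    return sum(e < entropy_cutoff for e in entropies) / len(entropies)


def pruning_start_scale(ratios_per_scale, tau):
    """Index of the first scale whose low-entropy ratio reaches tau.

    Assumption: pruning begins once rho_s >= tau. Returns None if no
    scale qualifies.
    """
    for scale_index, rho in enumerate(ratios_per_scale):
        if rho >= tau:
            return scale_index
    return None


if __name__ == "__main__":
    # Toy per-scale token entropies (made-up numbers for illustration only).
    entropies_per_scale = [
        [1.2, 1.1, 0.9, 1.3],
        [0.4, 1.0, 0.3, 1.1],
        [0.2, 0.1, 0.3, 0.9],
    ]
    ratios = [low_entropy_ratio(e, entropy_cutoff=0.5) for e in entropies_per_scale]
    print(ratios, pruning_start_scale(ratios, tau=0.5))
```

A uniform attention row gives the maximum entropy and a one-hot row gives zero, so under these assumptions the ratio rises as attention becomes more concentrated at a scale.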