AlignVAR: Towards Globally Consistent Visual Autoregression for Image Super-Resolution

Cencen Liu; Dongyang Zhang; Wen Yin; Jielei Wang; Tianyu Li; Ji Guo; Wenbo Jiang; Guoqing Wang; Guoming Lu

TL;DR

AlignVAR is a globally consistent visual autoregressive framework tailored for image super-resolution (ISR). It has two key components: Spatial Consistency Autoregression (SCA), which applies an adaptive mask to reweight attention toward structurally correlated regions, mitigating excessive locality and enhancing long-range dependencies; and Hierarchical Consistency Constraint (HCC), which augments residual learning with full reconstruction supervision at each scale.

Abstract

Visual autoregressive (VAR) models have recently emerged as a promising alternative for image generation, offering stable training, non-iterative inference, and high-fidelity synthesis through next-scale prediction. This encourages the exploration of VAR for image super-resolution (ISR), yet its application remains underexplored and faces two critical challenges: locality-biased attention, which fragments spatial structures, and residual-only supervision, which accumulates errors across scales. Together, these severely compromise the global consistency of reconstructed images. To address these issues, we propose AlignVAR, a globally consistent visual autoregressive framework tailored for ISR, featuring two key components: (1) Spatial Consistency Autoregression (SCA), which applies an adaptive mask to reweight attention toward structurally correlated regions, thereby mitigating excessive locality and enhancing long-range dependencies; and (2) Hierarchical Consistency Constraint (HCC), which augments residual learning with full reconstruction supervision at each scale, exposing accumulated deviations early and stabilizing the coarse-to-fine refinement process. Extensive experiments demonstrate that AlignVAR consistently enhances structural coherence and perceptual fidelity over existing generative methods, while delivering over 10x faster inference with nearly 50% fewer parameters than leading diffusion-based approaches, establishing a new paradigm for efficient ISR.
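
As a toy illustration of why supervising the full reconstruction at each scale can expose accumulated deviations that residual-only supervision misses, consider the sketch below. It is not the paper's implementation: the mean-squared-error terms, the nearest-neighbour upsampling, the square single-channel maps, and all names (`upsample_nearest`, `accumulate`, `hcc_style_loss`, `weight`) are assumptions made for illustration only.

```python
import numpy as np

def upsample_nearest(x, size):
    """Nearest-neighbour resize of a square 2-D map to (size, size)."""
    idx = np.arange(size) * x.shape[0] // size
    return x[np.ix_(idx, idx)]

def accumulate(residuals, size):
    """Running sums of per-scale residual maps, each upsampled to (size, size).

    Entry k stands in for the "full" representation after scales 0..k.
    """
    total = np.zeros((size, size))
    fulls = []
    for r in residuals:
        total = total + upsample_nearest(r, size)
        fulls.append(total)
    return fulls

def hcc_style_loss(pred_residuals, target_residuals, weight=1.0):
    """Residual supervision plus full-reconstruction supervision per scale.

    MSE is a stand-in; `weight` is a hypothetical balancing factor.
    Returns (total, residual_term, full_term).
    """
    size = target_residuals[-1].shape[0]
    pred_full = accumulate(pred_residuals, size)
    target_full = accumulate(target_residuals, size)
    residual_term = sum(
        np.mean((p - t) ** 2) for p, t in zip(pred_residuals, target_residuals)
    )
    full_term = sum(
        np.mean((p - t) ** 2) for p, t in zip(pred_full, target_full)
    )
    return residual_term + weight * full_term, residual_term, full_term

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Three scales (1x1, 2x2, 4x4); every prediction is off by the same small bias.
    targets = [rng.normal(size=(s, s)) for s in (1, 2, 4)]
    preds = [t + 0.1 for t in targets]
    total, residual_term, full_term = hcc_style_loss(preds, targets)
    print(residual_term, full_term, total)
```

In this toy setting each scale's residual error looks equally small (0.01 per scale, 0.03 in total), while the full-reconstruction term grows as the biases add up across scales (0.01 + 0.04 + 0.09 = 0.14), which is the kind of drift that per-scale full supervision is meant to surface early.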

Paper Structure (32 sections, 13 equations, 10 figures, 5 tables)

Introduction
Related Works
Image super-resolution.
Visual autoregressive models.
Preliminaries
Methods
Motivation and Empirical Observation
Hierarchical inconsistency caused by error accumulation.
Overview
Spatial Consistency Autoregression (SCA)
Structure-aware reweighting field.
Hierarchical Consistency Constraint (HCC)
Full-scale representation.
HCC loss.
Training Objective
...and 17 more sections

Figures (10)

Figure 1: Comparison between VARSR and AlignVAR. AlignVAR enhances VAR by introducing an adaptive consistency mask for intra-scale modeling and full reconstruction supervision for inter-scale alignment.
Figure 2: Comparison of attention distribution. Visualization of attention maps for VARSR and AlignVAR shows that VARSR exhibits highly localized attention concentrated in nearby regions, whereas AlignVAR captures broader contextual dependencies through the proposed Spatial Consistency Autoregression (SCA), thereby enhancing spatial coherence within each scale.
Figure 3: Spatial inconsistency results in texture discontinuities and structural distortions.
Figure 4: Hierarchical inconsistency results in color shifts and structural misalignment.
Figure 5: Overall architecture of the proposed AlignVAR. AlignVAR comprises two complementary components: a Spatial Consistency Autoregression (SCA) that performs scale-wise prediction and reweights intra-scale features using adaptive masks, and a Hierarchical Consistency Constraint (HCC) that jointly supervises residual and full representations to recalibrate inter-scale dependencies.
...and 5 more figures
