
Navigating Image Restoration with VAR's Distribution Alignment Prior

Siyang Wang, Feng Zhao

TL;DR

This work formulates the multi-scale latent representations within VAR as a restoration prior, builds the carefully designed VarFormer framework on top of it, and shows that VarFormer outperforms existing multi-task image restoration methods across various restoration tasks.

Abstract

Generative models trained on large, high-quality datasets effectively capture the structural and statistical properties of clean images, which makes them powerful priors for transforming degraded features into clean ones in image restoration. VAR, a recent image generation paradigm, surpasses diffusion models in generation quality through a next-scale prediction approach. Its autoregressive process progressively captures both global structures and fine-grained details, consistent with the multi-scale restoration principle widely accepted in the restoration community. Furthermore, we observe that when VAR is used for image reconstruction, its scale predictions automatically modulate the input, aligning the representations at subsequent scales with the distribution of clean images. To harness this adaptive distribution alignment capability for image restoration, we formulate the multi-scale latent representations within VAR as the restoration prior and build our carefully designed VarFormer framework on it. The strategic application of these priors enables VarFormer to achieve remarkable generalization on unseen tasks while also reducing training computational cost. Extensive experiments show that VarFormer outperforms existing multi-task image restoration methods across various restoration tasks.
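The coarse-to-fine idea behind next-scale prediction can be illustrated with a simplified multi-scale residual decomposition, in the spirit of the multi-scale tokenizer that VAR predicts over. This is a minimal sketch under stated assumptions, not the authors' implementation: it omits codebook quantization and the learned layers VAR applies after upsampling, it uses block averaging for downsampling and nearest-neighbour upsampling, and the scale schedule (1, 2, 4, 8) and all function names are hypothetical. Each scale encodes only what the coarser scales have not yet explained, so the first map holds per-channel global statistics and later maps add progressively finer detail.

```python
import numpy as np

def downsample(x: np.ndarray, s: int) -> np.ndarray:
    """Block-average a (C, H, W) map down to (C, s, s); assumes H and W are divisible by s."""
    c, h, w = x.shape
    return x.reshape(c, s, h // s, s, w // s).mean(axis=(2, 4))

def upsample(x: np.ndarray, h: int, w: int) -> np.ndarray:
    """Nearest-neighbour upsample a (C, s, s) map back to (C, h, w)."""
    _, s_h, s_w = x.shape
    return x.repeat(h // s_h, axis=1).repeat(w // s_w, axis=2)

def multiscale_decompose(f: np.ndarray, scales=(1, 2, 4, 8)):
    """Split f into per-scale residual maps, coarsest first (hypothetical, no quantization)."""
    _, h, w = f.shape
    residual = f.copy()
    maps = []
    for s in scales:
        r_k = downsample(residual, s)               # what is still unexplained, seen at scale s
        maps.append(r_k)
        residual = residual - upsample(r_k, h, w)   # remove it before moving to a finer scale
    return maps, residual

def reconstruct(maps, h: int, w: int, upto=None) -> np.ndarray:
    """Sum the upsampled maps of the first `upto` scales (all scales by default)."""
    selected = maps if upto is None else maps[:upto]
    return sum(upsample(m, h, w) for m in selected)

rng = np.random.default_rng(0)
f = rng.standard_normal((3, 8, 8))
maps, residual = multiscale_decompose(f)
coarse = reconstruct(maps, 8, 8, upto=2)  # 4x4-block approximation from the two coarsest scales
full = reconstruct(maps, 8, 8)            # matches f, since the last scale is full resolution
```

Truncating the sum after a few scales yields a blocky, low-frequency approximation, which mirrors the observation that lower scales carry global structure while higher scales carry fine detail.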

Paper Structure (14 sections, 11 equations, 7 figures, 5 tables)

Figures (7)

  • Figure 1: Motivation of VarFormer. (1) As the autoregressive scale increases, VAR's multi-scale representations shift from capturing global patterns at lower scales to highlighting fine-grained details at higher scales. (2) VAR's scale predictions adaptively modulate the input to align it with the distribution of clean images. Applying VAR's alignment prior to the scale features associated with each degradation type allows the corresponding degradations to be removed.
  • Figure 2: Illustration of our investigation into the multi-scale distribution alignment priors within VAR. (a) Reconstruction training: to keep the teacher-forcing-based autoregressive predictions coherent, we use cross-attention to inject the multi-scale embeddings ${F}_{e}$ obtained from the VQVAE encoder into the scale-autoregressive Transformer of VAR [tian2024visual] for image reconstruction pre-training. During this process, VAR is frozen and only the cross-attention layers are trained, using clean images. (b) Reconstruction testing: we feed degraded images into the trained VAR to obtain the VQVAE encoder outputs, denoted ${F}_{e}^{deg}$, and the multi-scale predictions of the VAR Transformer, denoted ${F}_{v}^{deg}$. (c-d) Observation: when we partially replace ${F}_{e}^{deg}$ with ${F}_{v}^{deg}$ and map the modified multi-scale features back to pixel space through the decoder, different degradations disappear depending on which scales are replaced, reflecting a transition from capturing global information at lower scales to focusing on fine-grained details at higher scales.
  • Figure 3: The t-SNE diagrams demonstrate that VAR's next-scale prediction can reduce the gap between degraded and clean images, effectively aligning their distributions.
  • Figure 4: The framework of our VarFormer, which is trained in two stages. Stage 1: to preserve VAR's inherent knowledge and further enhance its adaptive distribution alignment capability, we freeze VAR and add an Adapter that deliberately reduces the distance between the multi-scale latent representations of clean and degraded images, yielding the multi-scale distribution alignment embedding ${S}_{v}$. Stage 2: to adaptively extract the VAR scale priors that are useful for each input's degradation type, the Degradation-Aware Enhancement (DAE) module distinguishes between degradation types and integrates the relevant priors, providing an effective scale-aware alignment prior for the restoration process. In addition, the Adaptive Feature Transformation (AFT) module integrates the VAR scale priors into the image restoration network to guide the removal of degradations.
  • Figure 5: Visual comparison with state-of-the-art methods on image deraining task. Please zoom in for details.
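The scale-replacement probe described in Figure 2 (c-d) can be sketched as follows. This is a hypothetical illustration under stated assumptions, not the authors' code: per-scale features are plain NumPy arrays of shape (C, s, s), the multi-scale features are fused by summing their nearest-neighbour upsampled maps (an assumed fusion rule in the style of VAR's residual tokenizer), and `decode` is an identity placeholder standing in for the VQVAE decoder. The names `replace_scales`, `fuse`, and `decode` are our own.

```python
import numpy as np

def upsample(x: np.ndarray, h: int, w: int) -> np.ndarray:
    """Nearest-neighbour upsample a (C, s, s) map to (C, h, w)."""
    _, s_h, s_w = x.shape
    return x.repeat(h // s_h, axis=1).repeat(w // s_w, axis=2)

def replace_scales(f_e_deg, f_v_deg, replace_idx):
    """Take scale k from the VAR predictions F_v^deg if k is in replace_idx, else from F_e^deg."""
    if len(f_e_deg) != len(f_v_deg):
        raise ValueError("both feature pyramids must have the same number of scales")
    return [v if k in replace_idx else e for k, (e, v) in enumerate(zip(f_e_deg, f_v_deg))]

def fuse(maps, h: int, w: int) -> np.ndarray:
    """Assumed fusion rule: sum of all per-scale maps after upsampling to (h, w)."""
    return sum(upsample(m, h, w) for m in maps)

def decode(feat: np.ndarray) -> np.ndarray:
    """Hypothetical placeholder for the VQVAE decoder (identity here)."""
    return feat

scales = (1, 2, 4, 8)
# Toy pyramids: encoder features of a degraded image vs. VAR's scale predictions.
f_e_deg = [np.full((2, s, s), 1.0) for s in scales]
f_v_deg = [np.full((2, s, s), 10.0) for s in scales]

# Swap in the VAR predictions at the two coarsest scales only, then decode.
mixed = replace_scales(f_e_deg, f_v_deg, replace_idx={0, 1})
out = decode(fuse(mixed, 8, 8))
```

Sweeping `replace_idx` over different subsets of scales gives the kind of probe in which replacing coarse scales versus fine scales affects different degradations; with real features, `decode` would be the trained VQVAE decoder.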