
RestoreVAR: Visual Autoregressive Generation for All-in-One Image Restoration

Sudarshan Rajagopalan, Kartik Narayan, Vishal M. Patel

TL;DR

The paper addresses the slow inference of all-in-one image restoration (AiOR) methods based on latent diffusion models (LDMs) by introducing RestoreVAR, an approach built on Visual Autoregressive Modeling (VAR) and tailored for all-in-one restoration. By exploiting scale-space autoregression, RestoreVAR achieves competitive restoration quality while delivering over 10x faster inference than LDM-based methods. Two components support this: cross-attention conditioning on the degraded image's latents, and a lightweight latent refiner paired with fine-tuning of the VAE decoder. A key insight is that degradations concentrate in the coarse VAR scales while scene details reside in the fine scales, which enables efficient, semantically coherent restoration. Extensive experiments across haze, snow, rain, low-light, and blur show state-of-the-art performance among generative AiOR methods and strong generalization to real-world degradations, and ablations highlight the importance of continuous latent conditioning and the refiner components.
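Cross-attention conditioning lets each token being generated look up the relevant parts of the degraded image. The sketch below shows generic single-head scaled dot-product cross-attention in pure Python, under the assumption that VAR token features act as queries and features of the degraded image's continuous latent act as keys and values. Learned projection matrices, multiple heads, and RestoreVAR's specific conditioning design are omitted, and all names (`cross_attention`, `tokens`, `degraded_latent`) are illustrative.

```python
import math

def softmax(scores):
    # Subtract the max before exponentiating for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def cross_attention(queries, keys, values):
    """Single-head scaled dot-product cross-attention (no learned projections).

    queries: N vectors of size d   (hypothetical: VAR token features)
    keys:    M vectors of size d   (hypothetical: degraded-latent features)
    values:  M vectors of size d_v (hypothetical: degraded-latent features)
    Returns N vectors of size d_v, each a softmax-weighted mix of `values`.
    """
    d = len(queries[0])
    d_v = len(values[0])
    scale = 1.0 / math.sqrt(d)
    outputs = []
    for q in queries:
        # Similarity of this query to every conditioning position.
        scores = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in keys]
        weights = softmax(scores)
        # Weighted sum of the value vectors.
        outputs.append([sum(w * v[j] for w, v in zip(weights, values)) for j in range(d_v)])
    return outputs

# Toy example: 3 generated-token queries attend over 2 degraded-latent positions.
tokens = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
degraded_latent = [[1.0, 0.0], [0.0, 1.0]]
out = cross_attention(tokens, degraded_latent, degraded_latent)
```

The third token is equally similar to both conditioning positions, so it receives an even mix of them. In a trained model the query, key, and value projections are learned, so what each token attends to is not fixed by raw feature similarity alone.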

Abstract

The use of latent diffusion models (LDMs) such as Stable Diffusion has significantly improved the perceptual quality of All-in-One image Restoration (AiOR) methods, while also enhancing their generalization capabilities. However, these LDM-based frameworks suffer from slow inference due to their iterative denoising process, rendering them impractical for time-sensitive applications. Visual autoregressive modeling (VAR), a recently introduced approach for image generation, performs scale-space autoregression and achieves comparable performance to that of state-of-the-art diffusion transformers with drastically reduced computational costs. Moreover, our analysis reveals that coarse scales in VAR primarily capture degradations while finer scales encode scene detail, simplifying the restoration process. Motivated by this, we propose RestoreVAR, a novel VAR-based generative approach for AiOR that significantly outperforms LDM-based models in restoration performance while achieving over 10× faster inference. To optimally exploit the advantages of VAR for AiOR, we propose architectural modifications and improvements, including intricately designed cross-attention mechanisms and a latent-space refinement module, tailored for the AiOR task. Extensive experiments show that RestoreVAR achieves state-of-the-art performance among generative AiOR methods, while also exhibiting strong generalization capabilities.
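VAR represents an image latent as a sequence of token maps of increasing resolution and predicts each whole map in parallel, so the number of sequential steps equals the number of scales rather than the number of denoising steps. The sketch below is a minimal, hypothetical illustration of that coarse-to-fine residual decomposition in pure Python, not RestoreVAR's code. It assumes power-of-two scales, block averaging for downsampling, nearest-neighbour upsampling, and rounding to a grid as a stand-in for the VQVAE codebook lookup; real VAR uses a learned codebook and different interpolation. All function names are illustrative.

```python
def downsample(x, s):
    # Block-average an n x n map to s x s (assumes s divides n).
    n = len(x)
    b = n // s
    return [[sum(x[i * b + di][j * b + dj] for di in range(b) for dj in range(b)) / (b * b)
             for j in range(s)] for i in range(s)]

def upsample(x, n):
    # Nearest-neighbour upsampling of an s x s map back to n x n.
    b = n // len(x)
    return [[x[i // b][j // b] for j in range(n)] for i in range(n)]

def quantize(x, step):
    # Hypothetical stand-in for a VQ codebook lookup: snap values to a grid.
    return [[round(v / step) * step for v in row] for row in x]

def add(a, b):
    # Element-wise sum of two equally sized maps.
    return [[u + v for u, v in zip(ra, rb)] for ra, rb in zip(a, b)]

def encode_multiscale(f, scales, step):
    """Split map f into one quantized residual map per scale, coarse to fine."""
    n = len(f)
    recon = [[0.0] * n for _ in range(n)]
    token_maps = []
    for s in scales:
        # The part of f that the coarser scales have not explained yet.
        residual = [[f[i][j] - recon[i][j] for j in range(n)] for i in range(n)]
        r = quantize(downsample(residual, s), step)
        token_maps.append(r)
        recon = add(recon, upsample(r, n))
    return token_maps, recon

def decode_multiscale(token_maps, n):
    # One step per scale: upsample each map and accumulate.
    recon = [[0.0] * n for _ in range(n)]
    for r in token_maps:
        recon = add(recon, upsample(r, n))
    return recon

n = 8
scales = [1, 2, 4, 8]  # hypothetical schedule; the last scale is full resolution
f = [[float((3 * i + 5 * j) % 7) for j in range(n)] for i in range(n)]
token_maps, recon = encode_multiscale(f, scales, step=0.5)
decoded = decode_multiscale(token_maps, n)
max_err = max(abs(f[i][j] - decoded[i][j]) for i in range(n) for j in range(n))
```

Decoding repeats the encoder's accumulation, so it needs only four sequential steps here, one per scale. Because the last scale is full resolution, the remaining reconstruction error is bounded by half the quantization step (0.25 in this example).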

Paper Structure

This paper contains 16 sections, 7 equations, 8 figures, 4 tables.

Figures (8)

  • Figure 1: RestoreVAR, our proposed VAR-based scale-space generative AiOR model (a), significantly outperforms LDM-based methods, as shown in (b). RestoreVAR also drastically reduces computational complexity, as shown in (c).
  • Figure 2: VAR captures degradations in early (coarse) scales and scene-level details in later (fine) scales. Degraded and GT are VQVAE reconstructions of the degraded and ground-truth images. GT+coarse replaces the early GT scales with degraded ones, while GT+fine replaces the late GT scales with degraded ones.
  • Figure 3: Illustration of RestoreVAR for training and inference. (a) shows the training procedure for each component of RestoreVAR, and (b) shows the overall pipeline during inference.
  • Figure 4: Illustration of images decoded from discrete and continuous latents, along with the refiner’s predicted residuals.
  • Figure 5: Qualitative comparisons of RestoreVAR with LDM-based generative AiOR approaches. RestoreVAR achieves consistent restoration with better preservation of fine details.
  • ...and 3 more figures
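The Figure 2 experiment can be mimicked on a toy map: encode a ground-truth map and a degraded copy into per-scale maps, then splice the coarse maps of one with the fine maps of the other before decoding. This is a hypothetical sketch, not the authors' procedure. It uses a simplified multi-scale residual decomposition with quantization omitted, and the degradation is a uniform offset (`haze`) chosen because it is purely low-frequency. The helper names (`gt_plus_coarse`, `gt_plus_fine`, and the rest) are illustrative.

```python
def downsample(x, s):
    # Block-average an n x n map to s x s (assumes s divides n).
    n = len(x)
    b = n // s
    return [[sum(x[i * b + di][j * b + dj] for di in range(b) for dj in range(b)) / (b * b)
             for j in range(s)] for i in range(s)]

def upsample(x, n):
    # Nearest-neighbour upsampling of an s x s map back to n x n.
    b = n // len(x)
    return [[x[i // b][j // b] for j in range(n)] for i in range(n)]

def add(a, b):
    # Element-wise sum of two equally sized maps.
    return [[u + v for u, v in zip(ra, rb)] for ra, rb in zip(a, b)]

def encode_multiscale(f, scales):
    """Coarse-to-fine residual maps (quantization omitted for clarity)."""
    n = len(f)
    recon = [[0.0] * n for _ in range(n)]
    maps = []
    for s in scales:
        residual = [[f[i][j] - recon[i][j] for j in range(n)] for i in range(n)]
        r = downsample(residual, s)
        maps.append(r)
        recon = add(recon, upsample(r, n))
    return maps

def decode_multiscale(maps, n):
    # Sum the upsampled per-scale maps.
    recon = [[0.0] * n for _ in range(n)]
    for r in maps:
        recon = add(recon, upsample(r, n))
    return recon

def gt_plus_coarse(gt_maps, deg_maps, k):
    # Replace the first k (coarse) GT scale maps with the degraded ones.
    return deg_maps[:k] + gt_maps[k:]

def gt_plus_fine(gt_maps, deg_maps, k):
    # Keep the first k GT scale maps; take the remaining (fine) ones from the degraded map.
    return gt_maps[:k] + deg_maps[k:]

n = 8
scales = [1, 2, 4, 8]
gt = [[float(i + j) for j in range(n)] for i in range(n)]
haze = 3.0  # hypothetical uniform offset standing in for a low-frequency degradation
degraded = [[v + haze for v in row] for row in gt]

gt_maps = encode_multiscale(gt, scales)
deg_maps = encode_multiscale(degraded, scales)
gt_coarse = decode_multiscale(gt_plus_coarse(gt_maps, deg_maps, k=1), n)
gt_fine = decode_multiscale(gt_plus_fine(gt_maps, deg_maps, k=1), n)
```

In this toy, the offset is absorbed entirely by the 1×1 scale, so GT+coarse reproduces the degraded map and GT+fine reproduces the ground truth exactly. Real degradations such as rain or snow also contain higher-frequency structure, so the coarse/fine split reported in the paper is an empirical tendency rather than this exact separation.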