RestoreVAR: Visual Autoregressive Generation for All-in-One Image Restoration
Sudarshan Rajagopalan, Kartik Narayan, Vishal M. Patel
TL;DR
All-in-one image restoration (AiOR) methods built on latent diffusion models (LDMs) suffer from slow inference. This paper introduces RestoreVAR, a visual autoregressive modeling (VAR) approach tailored for AiOR. By exploiting scale-space autoregression, RestoreVAR achieves competitive restoration quality while running over 10x faster than LDM-based methods, aided by cross-attention conditioning on degraded latents and a lightweight latent refiner coupled with VAE decoder fine-tuning. A key insight is that degradations concentrate in coarse VAR scales while scene details reside in fine scales, enabling efficient, semantically coherent restoration. Extensive experiments across haze, snow, rain, low-light, and blur demonstrate state-of-the-art performance among generative AiOR methods and strong generalization to real-world degradations. Ablations highlight the importance of continuous latent conditioning and the refiner components.
Abstract
The use of latent diffusion models (LDMs) such as Stable Diffusion has significantly improved the perceptual quality of All-in-One image Restoration (AiOR) methods, while also enhancing their generalization capabilities. However, these LDM-based frameworks suffer from slow inference due to their iterative denoising process, rendering them impractical for time-sensitive applications. Visual autoregressive modeling (VAR), a recently introduced approach for image generation, performs scale-space autoregression and achieves performance comparable to that of state-of-the-art diffusion transformers at drastically reduced computational cost. Moreover, our analysis reveals that coarse scales in VAR primarily capture degradations while finer scales encode scene detail, simplifying the restoration process. Motivated by this, we propose RestoreVAR, a novel VAR-based generative approach for AiOR that significantly outperforms LDM-based models in restoration performance while achieving over $10\times$ faster inference. To fully exploit the advantages of VAR for AiOR, we propose task-specific architectural modifications and improvements, including intricately designed cross-attention mechanisms and a latent-space refinement module. Extensive experiments show that RestoreVAR achieves state-of-the-art performance among generative AiOR methods, while also exhibiting strong generalization capabilities.
