VLM-Augmented Degradation Modeling for Image Restoration Under Adverse Weather Conditions

Qianyi Shao, Yuanfan Zhang, Renxiang Xiao, Liang Hu

TL;DR

The paper tackles reliable image restoration under diverse adverse weather by introducing MVLR, a compact encoder-decoder framework augmented with a Visual-Language Model (VLM) that generates degradation priors and an Implicit Memory Bank (IMB) that stores degradation prototypes. The VLM prior guides a transformer-based encoder through cross-attention, and the IMB retrieves the most relevant prototypes by cosine similarity to refine the encoded features before they are fused and decoded into the restored image. The approach outperforms single-branch and mixture-of-experts baselines on four severe-weather benchmarks in PSNR and SSIM, while remaining efficient enough for real-time deployment. These results suggest MVLR's practical value for robust outdoor perception in autonomous systems and robotics.
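
As a rough illustration of the retrieval step described above, the sketch below scores a query prior against a bank of prototypes by cosine similarity and blends the best matches. It is not the paper's implementation: the function name `retrieve_prototypes`, the `top_k` and `temperature` parameters, and the softmax weighting of the selected prototypes are all assumptions made for illustration.

```python
import numpy as np

def retrieve_prototypes(query, memory, top_k=2, temperature=0.1):
    """Hypothetical cosine-similarity lookup in a prototype memory.

    query:  (d,) prior embedding used as the lookup key
    memory: (n, d) bank of degradation prototypes
    Returns (indices, weights, fused) where fused has shape (d,).
    """
    eps = 1e-8  # guards against division by zero for all-zero vectors
    q = query / (np.linalg.norm(query) + eps)
    m = memory / (np.linalg.norm(memory, axis=1, keepdims=True) + eps)

    # Cosine similarity between the query and every prototype.
    sims = m @ q

    # Keep the top_k most similar prototypes.
    idx = np.argsort(-sims)[:top_k]

    # Assumed weighting: softmax over the selected similarities.
    logits = sims[idx] / temperature
    logits -= logits.max()  # numerical stability
    weights = np.exp(logits) / np.exp(logits).sum()

    # Weighted blend of the retrieved prototypes.
    fused = weights @ memory[idx]
    return idx, weights, fused
```

A lower `temperature` concentrates the weight on the single closest prototype, while a higher one averages more evenly across the retrieved set.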

Abstract

Reliable visual perception under adverse weather conditions, such as rain, haze, snow, or a mixture of them, is desirable yet challenging for autonomous driving and outdoor robots. In this paper, we propose a unified Memory-Enhanced Visual-Language Recovery (MVLR) model that restores images from different degradation levels under various weather conditions. MVLR couples a lightweight encoder-decoder backbone with a Visual-Language Model (VLM) and an Implicit Memory Bank (IMB). The VLM performs chain-of-thought inference to encode weather degradation priors and the IMB stores continuous latent representations of degradation patterns. The VLM-generated priors query the IMB to retrieve fine-grained degradation prototypes. These prototypes are then adaptively fused with multi-scale visual features via dynamic cross-attention mechanisms, enhancing restoration accuracy while maintaining computational efficiency. Extensive experiments on four severe-weather benchmarks show that MVLR surpasses single-branch and Mixture-of-Experts baselines in terms of Peak Signal-to-Noise Ratio (PSNR) and Structural Similarity Index Measure (SSIM). These results indicate that MVLR offers a practical balance between model compactness and expressiveness for real-time deployment in diverse outdoor conditions.
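
PSNR, one of the two metrics reported above, has a standard definition: $\mathrm{PSNR} = 10 \log_{10}(\mathrm{MAX}^2 / \mathrm{MSE})$, where $\mathrm{MSE}$ is the mean squared error between the restored and reference images. The following is a minimal sketch, assuming images are floating-point arrays scaled to $[0, 1]$; the function name `psnr` and the `max_value` parameter are illustrative, and the evaluation protocol the authors used (color space, cropping) is not specified in this summary.

```python
import numpy as np

def psnr(restored, reference, max_value=1.0):
    """Peak Signal-to-Noise Ratio in dB between two same-shaped images."""
    restored = np.asarray(restored, dtype=np.float64)
    reference = np.asarray(reference, dtype=np.float64)

    # Mean squared error over all pixels and channels.
    mse = np.mean((restored - reference) ** 2)
    if mse == 0:
        return float("inf")  # identical images

    return 10.0 * np.log10(max_value ** 2 / mse)
```

Higher values indicate a restored image closer to the reference; a uniform error of 0.1 on a $[0, 1]$ scale corresponds to 20 dB.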

Paper Structure

This paper contains 19 sections, 12 equations, 4 figures, 3 tables.

Figures (4)

  • Figure 1: Quantitative comparison (PSNR and SSIM). We present a comparison between our model (red) and baseline methods on three representative degradation scenarios. The superscripts next to the evaluation metrics indicate the corresponding weather degradation type.
  • Figure 2: System overview of the MVLR pipeline. Given a degraded image $I^{degraded}$, the model aims to restore a clean image $I^{clean}$. A VLM with prompt words $T^{prompt}$ generates a description embedding $T^{embed}$ that captures weather type, degradation severity, and scene information. This embedding is mapped and fused with image features in the encoder. An implicit memory module then enhances the joint embedding using multi-dimensional degradation prototypes. Finally, the enhanced embeddings are passed through a transformer decoder and a convolution tail to recover the clean image.
  • Figure 3: The VLM generates a structured text prior through chain-of-thought reasoning. The preset prompt guides the VLM to analyze the degradation type of the scene and then to infer the consistency requirements for restoration given the current degradation.
  • Figure 4: Qualitative comparison on three representative degradation scenarios (raindrop occlusion, snow streaks, and dense fog). The first column shows the degraded images; the subsequent columns show the images restored by state-of-the-art baselines (All-in-One, AirNet, Chen et al., and TransWeather) and by our method, followed by the ground truth, with selected details magnified.
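
The fusion of the description embedding with image features outlined in the Figure 2 caption can be pictured as single-head cross-attention, in which image tokens act as queries and prior tokens supply keys and values. The sketch below is only a generic instance of that idea under stated assumptions: the function `cross_attention_fuse`, the projection matrices `w_q`, `w_k`, `w_v`, and the residual connection are hypothetical and are not taken from the paper.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention_fuse(image_tokens, prior_tokens, w_q, w_k, w_v):
    """Hypothetical single-head cross-attention fusion.

    image_tokens: (n_img, d_img) visual features (queries)
    prior_tokens: (n_txt, d_txt) prior embedding tokens (keys and values)
    w_q: (d_img, d_k), w_k: (d_txt, d_k), w_v: (d_txt, d_img)
    Returns (fused, attn) with shapes (n_img, d_img) and (n_img, n_txt).
    """
    q = image_tokens @ w_q
    k = prior_tokens @ w_k
    v = prior_tokens @ w_v

    # Scaled dot-product attention of each image token over the prior tokens.
    scores = q @ k.T / np.sqrt(q.shape[-1])
    attn = softmax(scores, axis=-1)

    # Assumed residual connection: prior information is added to the
    # original image features rather than replacing them.
    fused = image_tokens + attn @ v
    return fused, attn
```

Each row of `attn` sums to one, so every image token receives a convex combination of the projected prior tokens.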