Semantics and Content Matter: Towards Multi-Prior Hierarchical Mamba for Image Deraining

Zhaocheng Yu, Kui Jiang, Junjun Jiang, Xianming Liu, Guanglu Sun, Yi Xiao

TL;DR

The paper tackles rain-induced degradation in vision systems by fusing macro-semantic textual priors from CLIP with micro-structural visual priors from DINOv2 in a unified Multi-Prior Hierarchical Mamba (MPHM) framework. A Priors Fusion Injection (PFI) scheme progressively injects these priors at different decoder levels of a Fourier-enhanced Hierarchical Mamba backbone, providing semantic guidance while mitigating conflicts between the two heterogeneous priors. The Hierarchical Mamba Module (HMM) pairs a spatial-domain branch with a Fourier-based frequency branch, so that global context modeling and the recovery of local, high-frequency texture are handled together. On synthetic and real-world datasets, MPHM reports state-of-the-art PSNR/SSIM, including a 0.57 dB PSNR gain on Rain200H, along with strong generalization to real-world rainy scenes and benefits for downstream tasks.

Abstract

Rain significantly degrades the performance of computer vision systems, particularly in applications like autonomous driving and video surveillance. While existing deraining methods have made considerable progress, they often struggle to preserve the fidelity of semantic and spatial details. To address these limitations, we propose the Multi-Prior Hierarchical Mamba (MPHM) network for image deraining. This novel architecture synergistically integrates macro-semantic textual priors (CLIP) for task-level semantic guidance and micro-structural visual priors (DINOv2) for scene-aware structural information. To alleviate potential conflicts between heterogeneous priors, we devise a progressive Priors Fusion Injection (PFI) that strategically injects complementary cues at different decoder levels. Meanwhile, we equip the backbone network with an elaborate Hierarchical Mamba Module (HMM) to facilitate robust feature representation, featuring a Fourier-enhanced dual-path design that concurrently addresses global context modeling and local detail recovery. Comprehensive experiments demonstrate MPHM's state-of-the-art performance, achieving a 0.57 dB PSNR gain on the Rain200H dataset while delivering superior generalization on real-world rainy scenarios.
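
To put the reported 0.57 dB PSNR gain in context, the following is a minimal sketch of the standard PSNR definition, together with the mean-squared-error ratio that a given dB gain implies. It is not the authors' evaluation code: the text above does not say how PSNR is computed (for example, on RGB or on a luminance channel), and the function names `mse`, `psnr`, and `mse_ratio_for_gain`, as well as the flat-list image representation and the peak value of 255, are assumptions made for illustration.

```python
import math


def mse(reference, restored):
    """Mean squared error between two equal-length sequences of pixel values."""
    if len(reference) != len(restored) or not reference:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((r - s) ** 2 for r, s in zip(reference, restored)) / len(reference)


def psnr(reference, restored, max_value=255.0):
    """Peak signal-to-noise ratio in dB: 10 * log10(max_value^2 / MSE).

    max_value=255 assumes 8-bit pixel values (an assumption, not from the source).
    """
    err = mse(reference, restored)
    if err == 0:
        return math.inf  # identical images: PSNR is unbounded
    return 10.0 * math.log10(max_value ** 2 / err)


def mse_ratio_for_gain(gain_db):
    """Ratio new_MSE / old_MSE implied by a PSNR gain of gain_db (same peak value)."""
    return 10.0 ** (-gain_db / 10.0)


if __name__ == "__main__":
    clean = [10, 200, 35, 90]        # toy "ground truth" pixels (hypothetical)
    derained = [12, 198, 36, 88]     # toy restored pixels (hypothetical)
    print(f"PSNR: {psnr(clean, derained):.2f} dB")
    print(f"MSE ratio for +0.57 dB: {mse_ratio_for_gain(0.57):.3f}")
```

Under this definition, and with the peak value held fixed, a 0.57 dB gain corresponds to an MSE ratio of about 0.877, that is, roughly 12% lower mean squared error than the method being compared against.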

Paper Structure

This paper contains 29 sections, 3 equations, 10 figures, 7 tables.

Figures (10)

  • Figure 1: For analytical purposes, this visualization displays the PCA (Principal Component Analysis) projection of high-dimensional features from the visual encoders of various pre-trained models into 3-dimensional space.
  • Figure 2: Overview of the proposed Multi-Prior Hierarchical Mamba deraining framework. (a) DINOv2 Adapter, (b) CLIP Adapter and (c) Priors Fusion Injection (PFI).
  • Figure 3: Pipeline of the HMM.
  • Figure 4: Visual results on Rain200H. Our method recovers clearer details and textures. Please zoom in for a better view.
  • Figure 5: Residual heatmaps between derained results and ground truth. Brighter colors denote larger pixel-wise deviations. Our method shows minimal differences.
  • ...and 5 more figures