Semantics and Content Matter: Towards Multi-Prior Hierarchical Mamba for Image Deraining
Zhaocheng Yu, Kui Jiang, Junjun Jiang, Xianming Liu, Guanglu Sun, Yi Xiao
TL;DR
The paper tackles rain-induced degradation in vision systems with the Multi-Prior Hierarchical Mamba (MPHM) framework, which fuses macro-semantic textual priors from CLIP with micro-structural visual priors from DINOv2. A Priors Fusion Injection (PFI) scheme progressively injects these priors into a Fourier-enhanced Hierarchical Mamba backbone, providing semantic guidance and supporting high-frequency texture recovery. The Hierarchical Mamba Module (HMM) combines a spatial-domain branch with a Fourier-based frequency branch for multi-scale global-local representation, while the progressive injection is designed to mitigate conflicts between the heterogeneous priors. Across synthetic and real-world datasets, MPHM achieves state-of-the-art PSNR/SSIM results, including a 0.57 dB PSNR gain on Rain200H, and shows strong generalization and benefits for downstream tasks, supporting its practical value for adverse-weather vision systems.
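To make the dual-path idea concrete, the following is a toy sketch of a block that sums a spatial-domain path and a Fourier-domain path. It is not the paper's Hierarchical Mamba Module: the Mamba state-space scanning is omitted, a fixed box filter stands in for learned spatial layers, and a fixed per-frequency gain stands in for learned spectral weights. The names `spatial_branch`, `frequency_branch`, `dual_path_block`, `gain`, and `alpha` are all hypothetical.

```python
import numpy as np

def spatial_branch(x, kernel_size=3):
    """Local path: box filter over a small neighbourhood (assumed stand-in
    for learned spatial layers)."""
    pad = kernel_size // 2
    padded = np.pad(x, pad, mode="edge")
    out = np.zeros(x.shape, dtype=np.float64)
    for dy in range(kernel_size):
        for dx in range(kernel_size):
            out += padded[dy:dy + x.shape[0], dx:dx + x.shape[1]]
    return out / kernel_size ** 2

def frequency_branch(x, gain):
    """Global path: rescale the 2-D spectrum. Each frequency bin depends on
    every pixel, so one multiplication has image-wide effect."""
    spectrum = np.fft.rfft2(x)
    return np.fft.irfft2(spectrum * gain, s=x.shape)

def dual_path_block(x, gain, alpha=0.5):
    """Residual sum of the two paths; alpha is a hypothetical mixing weight."""
    return x + alpha * spatial_branch(x) + (1.0 - alpha) * frequency_branch(x, gain)

# Usage: a single-channel 8x8 feature map and an all-ones spectral gain.
rng = np.random.default_rng(0)
features = rng.random((8, 8))
gain = np.ones((8, 8 // 2 + 1))  # shape of the rfft2 output for an 8x8 input
output = dual_path_block(features, gain)
```

With an all-ones gain the frequency path returns its input unchanged. In a learned setting the gain would be trained, which is one way a frequency branch could emphasize the high-frequency content that the text associates with texture recovery.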
Abstract
Rain significantly degrades the performance of computer vision systems, particularly in applications like autonomous driving and video surveillance. While existing deraining methods have made considerable progress, they often struggle to preserve the fidelity of semantic and spatial details. To address these limitations, we propose the Multi-Prior Hierarchical Mamba (MPHM) network for image deraining. This novel architecture synergistically integrates macro-semantic textual priors (CLIP) for task-level semantic guidance and micro-structural visual priors (DINOv2) for scene-aware structural information. To alleviate potential conflicts between heterogeneous priors, we devise a progressive Priors Fusion Injection (PFI) scheme that strategically injects complementary cues at different decoder levels. Meanwhile, we equip the backbone network with an elaborate Hierarchical Mamba Module (HMM) to facilitate robust feature representation, featuring a Fourier-enhanced dual-path design that concurrently addresses global context modeling and local detail recovery. Comprehensive experiments demonstrate MPHM's state-of-the-art performance, achieving a 0.57 dB PSNR gain on the Rain200H dataset while delivering superior generalization on real-world rainy scenarios.
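For readers less familiar with the metric, the sketch below computes PSNR from its standard definition, 10 * log10(MAX^2 / MSE), and converts a dB gain into the equivalent mean-squared-error ratio. The function names and the assumption of images scaled to [0, 1] are illustrative, not taken from the paper, whose evaluation protocol (color space, averaging across images) is not specified here.

```python
import numpy as np

def psnr(reference, estimate, max_value=1.0):
    """Peak signal-to-noise ratio in dB for two same-shaped images."""
    reference = np.asarray(reference, dtype=np.float64)
    estimate = np.asarray(estimate, dtype=np.float64)
    mse = np.mean((reference - estimate) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return float(10.0 * np.log10(max_value ** 2 / mse))

def mse_ratio_for_gain(gain_db):
    """MSE ratio (new / old) implied by a PSNR gain on a single image."""
    return 10.0 ** (-gain_db / 10.0)

# Usage: a uniform error of 0.1 on a [0, 1] image gives an MSE of 0.01.
clean = np.zeros((4, 4))
derained = np.full((4, 4), 0.1)
score = psnr(clean, derained)
ratio = mse_ratio_for_gain(0.57)
```

On a single image, a 0.57 dB gain corresponds to an MSE ratio of about 0.877, or roughly a 12% reduction in mean squared error. Because benchmark PSNR is typically averaged per image, this conversion is only an approximate reading of a dataset-level gain.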
