DeRainMamba: A Frequency-Aware State Space Model with Detail Enhancement for Image Deraining
Zhiliang Zhu, Tao Zeng, Tao Yang, Guoliang Luo, Jiyong Zeng
TL;DR
DeRainMamba targets a central tension in image deraining: removing rain streaks requires global context, while preserving scene content requires fine local detail. It introduces a Frequency-Aware State-Space Module (FASSM) that fuses a Vision State Space Module (VSSM) with a Residual Fourier Module (RFM) to jointly model spatial context and frequency-domain cues, combined as $F_{out} = \text{VSSM}(\text{LN}(F_{in})) + \text{RFM}(F_{in}) + s \cdot F_{in}$, where $s$ scales the residual term. It also introduces Multi-Directional Perception Convolution (MDPConv), which captures gradient-based details across five branch kernels that are merged into a single equivalent kernel: $F_{out} = \text{MDPConv}(F_{in}) = F_{in} * K_{eq}$, with $K_{eq} = \sum_{i=1}^5 K_i$. Training minimizes $\mathcal{L}_{total}= \lambda_{1} \mathcal{L}_{1} + \lambda_{2} \mathcal{L}_{Freq}$ with $\lambda_{1}=1$ and $\lambda_{2}=0.1$. Experiments on four public benchmarks show PSNR and SSIM gains with a lightweight model (~27.8M parameters), supporting the combination of frequency-aware global modeling with local gradient-based detail enhancement for deraining.
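The identity $K_{eq} = \sum_{i=1}^5 K_i$ works because convolution is linear: summing the outputs of parallel branches gives the same result as one convolution with the summed kernel. The sketch below checks this on a toy image in plain Python. The five 3x3 kernels (one identity kernel plus horizontal, vertical, and two diagonal differences) are hypothetical placeholders chosen for illustration, not the kernels used in the paper, and the helper names `conv2d`, `merge_kernels`, and `sum_branches` are likewise assumptions.

```python
# Illustrative check that parallel convolution branches can be merged into
# one kernel. All kernels here are hypothetical placeholders.

def conv2d(image, kernel):
    """Valid-mode 2D cross-correlation on list-of-lists inputs."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j]
                for i in range(kh) for j in range(kw))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]

def merge_kernels(kernels):
    """Element-wise sum of same-sized kernels: K_eq = sum_i K_i."""
    kh, kw = len(kernels[0]), len(kernels[0][0])
    return [[sum(k[i][j] for k in kernels) for j in range(kw)]
            for i in range(kh)]

def sum_branches(image, kernels):
    """Run every branch separately, then add the branch outputs."""
    outputs = [conv2d(image, k) for k in kernels]
    rows, cols = len(outputs[0]), len(outputs[0][0])
    return [[sum(o[r][c] for o in outputs) for c in range(cols)]
            for r in range(rows)]

# Assumed branch kernels: identity plus four directional differences.
KERNELS = [
    [[0, 0, 0], [0, 1, 0], [0, 0, 0]],    # identity (plain branch)
    [[0, 0, 0], [-1, 0, 1], [0, 0, 0]],   # horizontal difference
    [[0, -1, 0], [0, 0, 0], [0, 1, 0]],   # vertical difference
    [[-1, 0, 0], [0, 0, 0], [0, 0, 1]],   # main-diagonal difference
    [[0, 0, -1], [0, 0, 0], [1, 0, 0]],   # anti-diagonal difference
]

IMAGE = [
    [1, 2, 3, 4],
    [5, 6, 7, 8],
    [9, 10, 11, 12],
    [13, 14, 15, 16],
]

K_EQ = merge_kernels(KERNELS)
single_pass = conv2d(IMAGE, K_EQ)          # one convolution with K_eq
multi_branch = sum_branches(IMAGE, KERNELS)  # five convolutions, summed
print(single_pass == multi_branch)
```

Under this reading, the multi-branch form would only be needed while the branch kernels are treated separately; at inference a single convolution with the merged kernel costs the same as one ordinary convolution.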
Abstract
Image deraining is crucial for improving visual quality and supporting reliable downstream vision tasks. Although Mamba-based models provide efficient sequence modeling, their limited ability to capture fine-grained details and their lack of frequency-domain awareness restrict further improvements. To address these issues, we propose DeRainMamba, which integrates a Frequency-Aware State-Space Module (FASSM) and Multi-Directional Perception Convolution (MDPConv). FASSM leverages the Fourier transform to distinguish rain streaks from high-frequency image details, balancing rain removal and detail preservation. MDPConv further restores local structures by capturing anisotropic gradient features and efficiently fusing multiple convolution branches. Extensive experiments on four public benchmarks demonstrate that DeRainMamba consistently outperforms state-of-the-art methods in PSNR and SSIM while requiring fewer parameters and lower computational cost. These results validate the effectiveness of combining frequency-domain modeling and spatial detail enhancement within a state-space framework for single-image deraining.
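The training objective stated in the TL;DR, $\mathcal{L}_{total}= \lambda_{1} \mathcal{L}_{1} + \lambda_{2} \mathcal{L}_{Freq}$ with $\lambda_{1}=1$ and $\lambda_{2}=0.1$, can be sketched as follows. The text here does not define $\mathcal{L}_{Freq}$, so this sketch assumes a common choice: the mean absolute difference between the 2D Fourier spectra of the prediction and the target. That definition, the naive DFT, and the function names are all assumptions made for illustration; a practical implementation would use an FFT routine from a numerical library.

```python
import cmath

def dft2(img):
    """Naive 2D discrete Fourier transform; only suitable for tiny inputs."""
    h, w = len(img), len(img[0])
    return [
        [
            sum(img[y][x] * cmath.exp(-2j * cmath.pi * (u * y / h + v * x / w))
                for y in range(h) for x in range(w))
            for v in range(w)
        ]
        for u in range(h)
    ]

def l1_loss(pred, target):
    """Mean absolute difference; abs() also handles complex entries."""
    count = len(pred) * len(pred[0])
    total = sum(abs(p - t)
                for pred_row, target_row in zip(pred, target)
                for p, t in zip(pred_row, target_row))
    return total / count

def freq_loss(pred, target):
    # ASSUMPTION: L_Freq is the L1 distance between Fourier spectra.
    return l1_loss(dft2(pred), dft2(target))

def total_loss(pred, target, lam1=1.0, lam2=0.1):
    """L_total = lam1 * L_1 + lam2 * L_Freq, using the stated weights."""
    return lam1 * l1_loss(pred, target) + lam2 * freq_loss(pred, target)

# Toy 2x2 example: the prediction is the target shifted by a constant 0.5.
target = [[0.0, 1.0], [1.0, 0.0]]
pred = [[0.5, 1.5], [1.5, 0.5]]
loss = total_loss(pred, target)
```

With $\lambda_{2}=0.1$ the frequency term acts as a small correction to the pixel-wise $\mathcal{L}_{1}$ term rather than dominating it.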
