RainMamba: Enhanced Locality Learning with State Space Models for Video Deraining
Hongtao Wu, Yijun Yang, Huihui Xu, Weiming Wang, Jinni Zhou, Lei Zhu
TL;DR
RainMamba introduces an enhanced state space modeling approach to video deraining, combining a Hilbert-based local scanning strategy with coarse-to-fine Mamba blocks to capture both global and local spatio-temporal dependencies. A difference-guided dynamic contrastive locality learning module strengthens the network's ability to learn patch-level self-similarity, enabling robust removal of rain streaks and raindrops. The method achieves state-of-the-art performance on four synthetic video deraining datasets and on real-world rainy videos, while remaining computationally efficient thanks to the linear complexity of SSMs. Empirical results, ablations, and efficiency analyses indicate RainMamba's practical value as a pre-processing step for outdoor video systems. The work positions SSM-based vision models as a competitive baseline for low-level video restoration tasks.
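The efficiency argument rests on the SSM recurrence visiting each token exactly once. The following is a minimal sketch, not RainMamba's implementation, assuming a single input channel, a diagonal state matrix, and time-invariant parameters (Mamba's selective scan instead makes the step size and projections input-dependent). The function name `diagonal_ssm_scan` and its arguments are hypothetical. The state matrix is discretized with a zero-order hold, A_bar = exp(delta * A), and the input matrix with the common simplification B_bar ≈ delta * B.

```python
import math
from typing import List


def diagonal_ssm_scan(
    x: List[float], a: List[float], b: List[float], c: List[float], delta: float
) -> List[float]:
    """Run a diagonal, time-invariant SSM over a 1-D sequence (hypothetical helper).

    a, b, c hold the diagonal of A, the input vector B, and the output vector C.
    Cost is O(L * N) for L tokens and state size N.
    """
    if not (len(a) == len(b) == len(c)):
        raise ValueError("a, b and c must have the same state size")
    # Zero-order-hold discretization of A; Euler-style simplification for B.
    a_bar = [math.exp(delta * a_i) for a_i in a]
    b_bar = [delta * b_i for b_i in b]
    h = [0.0] * len(a)  # hidden state, starts at zero
    y = []
    for x_t in x:  # single pass over the sequence: linear in its length
        # h_t = A_bar * h_{t-1} + B_bar * x_t  (elementwise, since A is diagonal)
        h = [a_bar[i] * h[i] + b_bar[i] * x_t for i in range(len(a))]
        # y_t = C . h_t
        y.append(sum(c_i * h_i for c_i, h_i in zip(c, h)))
    return y
```

Each step updates the N-dimensional state from the previous one only, so the total cost grows linearly with the sequence length L, whereas self-attention builds an L × L score matrix. For an impulse input `[1.0, 0.0, 0.0]` with `a=[-1.0]`, `b=[1.0]`, `c=[1.0]` and `delta=1.0`, the output decays as 1, e^-1, e^-2.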
Abstract
Outdoor vision systems are frequently affected by rain streaks and raindrops, which significantly degrade the performance of visual tasks and multimedia applications. Videos naturally contain redundant temporal cues that make rain removal more stable. Traditional video deraining methods rely heavily on optical flow estimation and kernel-based operations, which have limited receptive fields. Transformer architectures, in contrast, can model long-range dependencies, but at the cost of a significant increase in computational complexity. Recently, state space models (SSMs), whose core operator has linear complexity, have enabled efficient long-term temporal modeling, which is crucial for removing rain streaks and raindrops from videos. However, processing a video as a one-dimensional sequence disrupts local correlations across the spatio-temporal dimensions, because pixels that are adjacent in the video end up far apart in the sequence. To address this, we present an improved SSM-based video deraining network (RainMamba) with a novel Hilbert scanning mechanism that better captures sequence-level local information. We also introduce a difference-guided dynamic contrastive locality learning strategy to enhance the network's ability to learn patch-level self-similarity. Extensive experiments on four synthetic video deraining datasets and on real-world rainy videos demonstrate the effectiveness and efficiency of our network in removing rain streaks and raindrops. Our code and results are available at https://github.com/TonyHongtaoWu/RainMamba.
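The locality problem, and why a Hilbert curve helps, can be illustrated on a single frame. The abstract does not say exactly how RainMamba applies the Hilbert curve to video tokens (for example, per frame or across time), so this sketch assumes one 2D frame whose side length is a power of two. It uses the classic iterative Hilbert index-to-coordinate conversion and compares it with row-major (raster) flattening by measuring how spread out each run of consecutive tokens is in the image. All helper names are hypothetical and not taken from the RainMamba code.

```python
from itertools import combinations
from typing import List, Tuple

Coord = Tuple[int, int]  # (x, y) = (column, row)


def _rot(s: int, x: int, y: int, rx: int, ry: int) -> Coord:
    """Rotate/flip a sub-quadrant of side s so consecutive sub-curves join up."""
    if ry == 0:
        if rx == 1:
            x, y = s - 1 - x, s - 1 - y
        x, y = y, x
    return x, y


def hilbert_d2xy(n: int, d: int) -> Coord:
    """Map position d along a Hilbert curve to a cell of an n x n grid (n a power of two)."""
    x = y = 0
    t, s = d, 1
    while s < n:
        rx = 1 & (t // 2)
        ry = 1 & (t ^ rx)
        x, y = _rot(s, x, y, rx, ry)
        x += s * rx
        y += s * ry
        t //= 4
        s *= 2
    return x, y


def hilbert_order(n: int) -> List[Coord]:
    """Pixel visiting order of a Hilbert scan."""
    return [hilbert_d2xy(n, d) for d in range(n * n)]


def raster_order(n: int) -> List[Coord]:
    """Pixel visiting order of a row-major (raster) scan."""
    return [(x, y) for y in range(n) for x in range(n)]


def mean_chunk_diameter(order: List[Coord], k: int) -> float:
    """Average largest Manhattan distance between pixels in each run of k consecutive tokens."""
    if k < 2:
        raise ValueError("k must be at least 2")
    diameters = []
    for start in range(0, len(order), k):
        chunk = order[start:start + k]
        diameters.append(
            max(abs(p[0] - q[0]) + abs(p[1] - q[1]) for p, q in combinations(chunk, 2))
        )
    return sum(diameters) / len(diameters)
```

On an 8 × 8 grid, every run of 16 consecutive Hilbert tokens fills a 4 × 4 block (`mean_chunk_diameter(hilbert_order(8), 16)` gives 6.0), while the same run in raster order spans two full rows (8.0); with runs of 4 the values are 2.0 and 3.0. The Hilbert curve does not keep every pair of spatial neighbors close in the sequence, but it does guarantee that consecutive tokens are spatial neighbors and that aligned runs cover compact blocks, which is the kind of sequence-level locality a 1D scan can exploit.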
