A Hybrid Transformer-Mamba Network for Single Image Deraining

Shangquan Sun, Wenqi Ren, Juxiang Zhou, Jianhou Gan, Rui Wang, Xiaochun Cao

TL;DR

This work tackles single image deraining by introducing TransMamba, a dual-branch network that pairs a spectral-domain Transformer, built from spectral banding self-attention (SBSA) and a spectral enhanced feed-forward module (SEFF), with a Mamba branch built from cascaded bidirectional state-space modules (CBSMs). The spectral-domain attention uses frequency bands to separate rain streaks from background, while SEFF enhances frequency-specific information; reconstruction is further guided by a spectral coherence loss that enforces signal-level linear relationships. Experiments on synthetic and real-world datasets show state-of-the-art deraining performance with competitive efficiency, as well as improved downstream object-detection results. The approach fuses global spectral modeling with local sequence coherence, with practical implications for robust vision in adverse weather.

Abstract

Existing deraining Transformers employ self-attention mechanisms with fixed-range windows or along channel dimensions, limiting the exploitation of non-local receptive fields. In response to this issue, we introduce a novel dual-branch hybrid Transformer-Mamba network, denoted as TransMamba, aimed at effectively capturing long-range rain-related dependencies. Based on the prior that rain degradation and background have distinct spectral-domain features, we design spectral-banded Transformer blocks for the first branch. Self-attention is executed within the combination of the spectral-domain channel dimension to improve the modeling of long-range dependencies. To enhance frequency-specific information, we present a spectral enhanced feed-forward module that aggregates features in the spectral domain. In the second branch, Mamba layers are equipped with cascaded bidirectional state space model modules to additionally model both local and global information. At each stage of both the encoder and decoder, we concatenate the dual-branch features channel-wise and fuse them through channel reduction, enabling more effective integration of the multi-scale information from the Transformer and Mamba branches. To better reconstruct innate signal-level relations within clean images, we also develop a spectral coherence loss. Extensive experiments on diverse datasets and real-world images demonstrate the superiority of our method over state-of-the-art approaches.
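
The fusion step described in the abstract (channel-wise concatenation of the two branch outputs followed by channel reduction) can be illustrated with a minimal NumPy sketch. The function name `fuse_branches`, the tensor shapes, and the use of a 1x1-convolution-style weight matrix for the reduction are assumptions made for illustration; the abstract does not specify how the reduction is implemented.

```python
import numpy as np


def fuse_branches(feat_transformer, feat_mamba, weight):
    """Concatenate two (C, H, W) feature maps along channels, then reduce 2C -> C.

    `weight` has shape (C, 2C) and acts like a 1x1 convolution kernel.
    This is an assumed form of "channel reduction", not the authors' code.
    """
    stacked = np.concatenate([feat_transformer, feat_mamba], axis=0)  # (2C, H, W)
    # Mix channels independently at every spatial position.
    return np.einsum("oc,chw->ohw", weight, stacked)  # (C, H, W)


# Toy usage with hypothetical sizes.
rng = np.random.default_rng(0)
C, H, W = 4, 8, 8
f_t = rng.standard_normal((C, H, W))  # stand-in for the Transformer-branch features
f_m = rng.standard_normal((C, H, W))  # stand-in for the Mamba-branch features
# A hand-picked weight that simply averages the two branches; a trained
# network would learn this matrix instead.
w = np.concatenate([0.5 * np.eye(C), 0.5 * np.eye(C)], axis=1)
fused = fuse_branches(f_t, f_m, w)
```

With the averaging weight above, `fused` has the same shape as either input; a learned weight would let each output channel draw on any channel of either branch.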

Paper Structure

This paper contains 28 sections, 12 equations, 14 figures, 4 tables.

Figures (14)

  • Figure 1: Visual comparisons among the state-of-the-art Transformer deraining methods and ours on real-world rainy Internet-Data [wang2019spatial]. All model weights for real-world deraining are trained on SPA-Data [wang2019spatial]. The samples from (b) to (h) are DualGCN [Fu2021RainSR], SPDNet [yi2021Structure], Restormer [zamir2022restormer], IDT [xiao2022image], DRSformer [chen2023learning], UDR-S$^2$Former [chen2023sparse], and NeRD [NeRD-Rain]. Our method produces the most visually pleasing result on the real-world rainy image.
  • Figure 2: Demonstration that spectral bands of different frequencies separately encode background and rain streaks. Replacing the low-frequency band of the rainy signal with that of the clean signal readily removes the rain streaks. Inspired by this phenomenon, we propose allocating attention differently across bands, taking advantage of the distinct information encoded in different frequency bands.
  • Figure 3: The architecture of our hybrid Transformer-Mamba network (TransMamba) follows a dual-branch structure containing four levels. Each level consists of $\rm N_i$ spectral-domain Transformer Blocks (SDTBs) and $\rm L_i$ spatial-domain Mamba layers. Each SDTB is composed of spectral banding self-attention (SBSA) and spectral enhanced feed-forward (SEFF). Within SBSA, we present spectral banding reorganization (SBR) to categorize high- and low-frequency features. Each Mamba layer contains multiple cascaded bidirectional SSM modules (CBSMs).
  • Figure 4: Visual comparisons of deraining on Rain200L (1st row) and Rain200H (2nd row) [yang2017deep]. The samples from (b) to (f) are Restormer [zamir2022restormer], IDT [xiao2022image], DRSformer [chen2023learning], UDR-S$^2$Former [chen2023sparse], and NeRD-Rain [NeRD-Rain], respectively. Please zoom in for a better view.
  • Figure 5: Visual comparisons of deraining on DID-Data [zhang2018density] (1st row) and DDN-Data [fu2017removing] (2nd row). The samples from (b) to (f) are Restormer [zamir2022restormer], IDT [xiao2022image], DRSformer [chen2023learning], UDR-S$^2$Former [chen2023sparse], and NeRD-Rain [NeRD-Rain], respectively. Please zoom in for a better view.
  • ...and 9 more figures
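
The band-replacement experiment described in the Figure 2 caption can be sketched in NumPy as follows. This is an illustrative reading only: the caption does not state which transform or band boundaries the authors use, so the 2D FFT, the circular cutoff, and the names `low_freq_mask`, `swap_low_band`, and `radius` are all assumptions.

```python
import numpy as np


def low_freq_mask(shape, radius):
    """Boolean mask over a centered (fftshift-ed) spectrum: True within `radius` of DC."""
    h, w = shape
    yy, xx = np.ogrid[:h, :w]
    return (yy - h // 2) ** 2 + (xx - w // 2) ** 2 <= radius ** 2


def swap_low_band(rainy, clean, radius):
    """Replace the low-frequency band of `rainy` with that of `clean`.

    Both inputs are 2D grayscale arrays of the same shape. The circular
    low-pass region is an assumed definition of "low-frequency band".
    """
    spec_rainy = np.fft.fftshift(np.fft.fft2(rainy))
    spec_clean = np.fft.fftshift(np.fft.fft2(clean))
    mask = low_freq_mask(rainy.shape, radius)
    # Take clean coefficients inside the band, rainy coefficients outside it.
    mixed = np.where(mask, spec_clean, spec_rainy)
    # The mask is symmetric about DC, so the inverse transform is real up to
    # floating-point error; drop the negligible imaginary part.
    return np.fft.ifft2(np.fft.ifftshift(mixed)).real


# Toy usage: a smooth gradient as the "clean" image and added vertical
# stripes as a crude stand-in for rain streaks.
clean = np.outer(np.linspace(0.0, 1.0, 16), np.ones(16))
rainy = clean.copy()
rainy[:, ::4] += 0.5
mixed_image = swap_low_band(rainy, clean, radius=3)
```

Varying `radius` changes how much of the spectrum is taken from the clean image, which is one simple way to probe where rain and background content sit in the frequency domain.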