
Image Deraining with Frequency-Enhanced State Space Model

Shugo Yamashita, Masaaki Ikehara

TL;DR

To effectively remove rain streaks, which produce high-intensity frequency components in specific directions, this work employs frequency domain processing concurrently with SSM and develops a novel mixed-scale gated-convolutional block, which uses convolutions with multiple kernel sizes to capture various scale degradations effectively.

Abstract

Removing rain degradations in images is recognized as a significant issue. In this field, deep learning-based approaches, such as Convolutional Neural Networks (CNNs) and Transformers, have succeeded. Recently, State Space Models (SSMs) have exhibited superior performance across various tasks in both natural language processing and image processing due to their ability to model long-range dependencies. This study introduces SSM to image deraining with deraining-specific enhancements and proposes a Deraining Frequency-Enhanced State Space Model (DFSSM). To effectively remove rain streaks, which produce high-intensity frequency components in specific directions, we employ frequency domain processing concurrently with SSM. Additionally, we develop a novel mixed-scale gated-convolutional block, which uses convolutions with multiple kernel sizes to capture various scale degradations effectively and integrates a gating mechanism to manage the flow of information. Finally, experiments on synthetic and real-world rainy image datasets show that our method surpasses state-of-the-art methods. Code is available at https://github.com/ShugoYamashita/DFSSM.
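
The claim that rain streaks produce strong frequency components in specific directions can be checked on a toy example. The sketch below is not the authors' code: it builds a hypothetical 8x8 "rain layer" of purely vertical streaks and computes its 2D DFT amplitude with a naive pure-Python transform. The names `dft2_amplitude`, `streaks`, and `ratio` are illustrative assumptions. Because the streaks do not vary along the vertical axis, essentially all spectral energy falls on the row of zero vertical frequency, a line perpendicular to the streak direction.

```python
import cmath

def dft2_amplitude(img):
    """Amplitude |F(u, v)| of the 2D DFT of a small image given as a list of rows."""
    h, w = len(img), len(img[0])
    amp = [[0.0] * w for _ in range(h)]
    for u in range(h):          # vertical frequency index
        for v in range(w):      # horizontal frequency index
            s = 0j
            for y in range(h):
                for x in range(w):
                    phase = -2j * cmath.pi * (u * y / h + v * x / w)
                    s += img[y][x] * cmath.exp(phase)
            amp[u][v] = abs(s)
    return amp

N = 8  # toy size (assumption); the naive DFT costs O(N^4)

# Hypothetical rain layer: vertical streaks, i.e. bright columns at x = 1 and x = 5.
streaks = [[1.0 if x in (1, 5) else 0.0 for x in range(N)] for y in range(N)]

amp = dft2_amplitude(streaks)

# Share of spectral energy on the row u = 0 (no variation along y).
total = sum(a * a for row in amp for a in row)
on_axis = sum(a * a for a in amp[0])
ratio = on_axis / total

print(f"energy share on the u = 0 row: {ratio:.6f}")  # close to 1.0
```

Real rain streaks are tilted and irregular, so their energy spreads around an oriented band rather than sitting on a single line, but the same directional concentration is what motivates processing features in the frequency domain.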

Paper Structure

This paper contains 27 sections, 12 equations, 5 figures, and 7 tables.

Figures (5)

  • Figure 1: Top row: a rainy image, a clear image, and their differential image. Bottom row: the corresponding 2D Fourier amplitude spectra. The amplitude of the Fourier transform is denoted as $|\mathcal{F}(\cdot)|$.
  • Figure 2: The architecture of our Deraining Frequency-Enhanced State Space Model (DFSSM). The overall U-Net architecture (a) has $8$ stages. Each stage (b) consists of $N_{S}$ State Space Groups (SSGs) and $N_{F}$ Frequency-Enhanced State Space Groups (FSSGs). (c), (d), (e), (f), and (g) illustrate the details of the components. The SSG includes a State Space Block (SSB) and a Mixed-Scale Gated-Convolutional Block (MGCB), while the FSSG includes a Frequency-Enhanced State Space Block (FSSB) and an MGCB. Both SSB and FSSB employ a Vision State Space Module (VSSM), and the FSSB also uses a Fast Fourier Transform Module (FFTM).
  • Figure 3: Visual comparison on a synthetic rainy image from Rain200H [Rain200HL]. Red and blue boxes correspond to the zoomed-in patches.
  • Figure 4: Visual comparison on a real-world rainy image from LHP-Rain [LHP_Rain]. Red and blue boxes correspond to the zoomed-in patches.
  • Figure 5: Comparison of computational complexity among the State Space Model (SSM) [mamba], standard Self-Attention (SA) [vaswani2017attention], and Multi-Dconv Head Transposed Attention (MDTA) [restormer]. FLOPs are measured for input image sizes ranging from $32\times32$ to $256\times256$. SA cannot be measured beyond $192\times192$ because it runs out of memory.
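
The scaling behaviour behind the Figure 5 comparison can be sketched with leading-order operation counts. These are rough textbook estimates, not the FLOPs measured in the paper: projections, heads, and constant factors are ignored, and the function names, the channel width `c=48`, and the state size `d_state=16` are assumptions chosen for illustration. The point is only how each cost grows when the image side doubles.

```python
def sa_cost(h, w, c):
    """Spatial self-attention: builds and applies an (HW x HW) attention map."""
    n = h * w
    return 2 * n * n * c

def mdta_cost(h, w, c):
    """Transposed (channel-wise) attention: a (C x C) map computed over HW positions."""
    n = h * w
    return 2 * n * c * c

def ssm_cost(h, w, c, d_state=16):
    """State space scan: each of the HW tokens updates C x d_state state entries."""
    n = h * w
    return n * c * d_state

C = 48  # hypothetical channel width

# Growth factor when the input goes from 32x32 to 64x64.
growth = {
    "SA": sa_cost(64, 64, C) / sa_cost(32, 32, C),
    "MDTA": mdta_cost(64, 64, C) / mdta_cost(32, 32, C),
    "SSM": ssm_cost(64, 64, C) / ssm_cost(32, 32, C),
}

for name, factor in growth.items():
    print(f"{name}: x{factor:.0f} cost for 4x more pixels")
```

Under these estimates, spatial self-attention grows quadratically in the number of pixels, while MDTA and the SSM grow linearly. This is consistent with SA running out of memory at larger inputs.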