Exploiting Regional Information Transformer for Single Image Deraining

Baiang Li, Zhao Zhang, Huan Zheng, Xiaogang Xu, Yanyan Wei, Jingyi Zhang, Jicong Fan, Meng Wang

TL;DR

Regformer addresses single image deraining by explicitly differentiating rain-affected and unaffected regions. It introduces Region Transformer Blocks (RTB) with Region Masked Attention (RMA) and a Mixed Gate Forward Block (MGFB), organized in a Region Transformer Cascade (RTC) within an encoder-decoder framework. The region-masking strategy and mixed-scale modeling collectively yield state-of-the-art deraining performance on multiple synthetic and real-world datasets, with substantial improvements in PSNR and SSIM. The approach demonstrates strong practical impact by preserving texture and detail while effectively removing rain artifacts, and the authors provide public code and pretrained models for reproducibility.

Abstract

Transformer-based Single Image Deraining (SID) methods have achieved remarkable success, primarily attributed to their robust capability in capturing long-range interactions. However, we observe that current methods process rain-affected and unaffected regions together and overlook the disparities between them. This confuses rain streaks with background content and prevents effective interactions, ultimately leading to suboptimal deraining results. To address this issue, we introduce the Region Transformer (Regformer), a novel SID method that emphasizes processing rain-affected and unaffected regions independently while still considering their combined impact for high-quality image reconstruction. The crux of our method is the Region Transformer Block (RTB), which integrates a Region Masked Attention (RMA) mechanism and a Mixed Gate Forward Block (MGFB). The RTB performs attention selection over rain-affected and unaffected regions and local modeling at mixed scales. The RMA generates attention maps tailored to these two regions and their interactions, enabling our model to capture the comprehensive features essential for rain removal. To better recover high-frequency textures and capture more local details, we develop the MGFB as a compensation module that completes the local mixed-scale modeling. Extensive experiments demonstrate that our model reaches state-of-the-art performance, significantly improving image deraining quality. Our code and trained models are publicly available.
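
The idea of region-restricted attention described in the abstract can be illustrated with a small sketch. This is not the authors' RMA implementation: the abstract does not say where the mask enters the attention computation, so the code below assumes a plain scaled dot-product attention over tokens in which masked-out key positions are simply excluded. The names `masked_attention`, `key_mask`, and the toy tokens are hypothetical.

```python
import math


def softmax(row):
    """Numerically stable softmax; entries equal to -inf receive weight 0."""
    finite = [v for v in row if v != -math.inf]
    peak = max(finite)
    exps = [math.exp(v - peak) if v != -math.inf else 0.0 for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def masked_attention(q, k, v, key_mask):
    """Scaled dot-product attention restricted to keys where key_mask is 1.

    q, k, v: lists of token vectors (lists of floats), one per position.
    key_mask: list of 0/1 flags; a 0 removes that position from every
    query's attention. Assumption: at least one flag is 1.
    """
    scale = 1.0 / math.sqrt(len(q[0]))
    dim = len(v[0])
    out = []
    for qi in q:
        logits = [
            scale * sum(a * b for a, b in zip(qi, kj)) if keep else -math.inf
            for kj, keep in zip(k, key_mask)
        ]
        weights = softmax(logits)
        out.append([sum(w * vj[d] for w, vj in zip(weights, v)) for d in range(dim)])
    return out


# Toy example: four tokens, two of them flagged as rain-affected (hypothetical).
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.5]]
foreground = [1, 0, 1, 0]                 # rain-affected positions
background = [1 - m for m in foreground]  # unaffected positions
full = [1] * len(tokens)                  # no restriction

out_fg = masked_attention(tokens, tokens, tokens, foreground)
out_bg = masked_attention(tokens, tokens, tokens, background)
out_full = masked_attention(tokens, tokens, tokens, full)
```

Running the same attention three times with a foreground, a background, and a full mask mirrors the three attention results the summary mentions. How Regformer combines them is not covered here.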

Paper Structure (16 sections, 6 equations, 24 figures, 5 tables)

Figures (24)

  • Figure 1: Sub-figures (a) and (b): Schematic diagrams illustrating the process of handling images in previous methods (left) and our Regformer (right). Different from previous works, our method applies three distinct masks to the attention phase, resulting in three different attentional results for SID. Our approach addresses the significant disparities between the rain-affected and unaffected regions of an image, as demonstrated in the output results of our approach. Sub-figure (c): Performance comparison between our method and other Transformer-based methods in terms of PSNR, computational complexity (GFLOPs), and model parameters. In both graphs, our method outperforms the others by achieving the highest PSNR while maintaining a reasonable balance of model complexity and computational cost. This showcases the efficiency and effectiveness of our Regformer for SID.
  • Figure 2: Overview of the proposed Regformer, which is built from Region Transformer Blocks (RTBs) and Region Transformer Cascades (RTCs), each RTC consisting of three RTBs that implement different mask mechanisms. Sub-figure (a) depicts the overall framework: RTBs drawn in different colors use different mask strategies, and the encoder uses the full mask strategy. The mask is computed from the shallow features, obtained from the input image after a $3 \times 3$ convolution and several downsampling stages, together with the features obtained from the RTC (for details, see the mask equation labeled eq2 in the paper and Figure 3). Sub-figure (b) shows the RMA mechanism, and sub-figure (c) shows the MGFB mechanism.
  • Figure 3: Illustration of the Region Mask generation. Here, $I$ and $I'$ represent the shallow features and the restored features in RTC, respectively. The ForeGround Mask focuses on the rain-affected region, and the BackGround Mask highlights the unaffected area. For ease of understanding, we show images mapped to RGB space rather than feature maps.
  • Figure 5: Visual comparison on the SPA-Data dataset [SPANet]. For display purposes, the local crop has been rotated 90 degrees clockwise. Compared to previous solutions, our method is better at distinguishing rain streaks that resemble image content and at preserving the original details of the input image while effectively removing the rain streaks.
  • Figure 6: Visual comparison of different methods on the Rain200H dataset [JORDERE]. Our Regformer recovers detail and texture more accurately than the other approaches.
  • ...and 19 more figures
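
The Region Mask generation summarized in the Figure 3 caption can be sketched as follows. The captions state only that the mask is derived from the shallow features $I$ and the restored features $I'$; the actual formula is not reproduced on this page. The rule below (thresholding the absolute difference) and the names `region_masks` and `threshold` are assumptions for illustration only, not the authors' definition.

```python
def region_masks(shallow, restored, threshold=0.1):
    """Split positions into rain-affected and unaffected regions.

    shallow, restored: 2-D lists of floats with the same shape, standing in
    for the shallow features I and the restored features I'.
    Assumed rule (hypothetical): a position counts as rain-affected when the
    restoration changed it by more than `threshold`.
    Returns (foreground_mask, background_mask) as 2-D lists of 0/1.
    """
    foreground = [
        [1 if abs(a - b) > threshold else 0 for a, b in zip(row_s, row_r)]
        for row_s, row_r in zip(shallow, restored)
    ]
    # The background mask is the complement of the foreground mask.
    background = [[1 - m for m in row] for row in foreground]
    return foreground, background


# Toy 2x2 example: bright streak values on the diagonal of the input.
shallow = [[0.9, 0.2], [0.2, 0.8]]
restored = [[0.2, 0.2], [0.2, 0.25]]
fg_mask, bg_mask = region_masks(shallow, restored)
```

In this toy input, the two diagonal positions change substantially after restoration and land in the foreground mask, while the off-diagonal positions are unchanged and land in the background mask.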