Iterative Optimal Attention and Local Model for Single Image Rain Streak Removal

Xiangyu Li, Wanshu Fan, Yue Shen, Cong Wang, Wei Wang, Xin Yang, Qiang Zhang, Dongsheng Zhou

TL;DR

This work tackles single-image rain streak removal for vision-based measurement systems (VBMS) by introducing EMResformer, a Transformer-based deraining framework that integrates an Expectation Maximization Block (EMB) to iteratively optimize attention and a Local Model Residual Block (LMRB) to enhance local detail. The EMB reduces feature redundancy and concentrates attention, while the LMRB improves the preservation of local information; together they yield a cleaner reconstructed background. Experiments on synthetic and real-world rain datasets show higher deraining quality (PSNR/SSIM) than prior methods and better results on downstream VBMS tasks (segmentation and object detection), with a favorable trade-off between model complexity and performance. The results suggest that EMResformer is a robust preprocessing tool for VBMS, enabling more reliable visual measurements under adverse weather conditions.
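For readers unfamiliar with the PSNR metric mentioned above, the following is a minimal sketch of the standard definition, PSNR = 10 log10(MAX^2 / MSE). The function name `psnr`, the flat-list pixel representation, and the default `max_value` of 255 are illustrative assumptions; the paper's exact evaluation protocol (color space, data range) is not stated in this summary.

```python
import math
from typing import Sequence


def psnr(reference: Sequence[float], restored: Sequence[float],
         max_value: float = 255.0) -> float:
    """Peak signal-to-noise ratio in dB between two equally sized pixel lists.

    `max_value` is the largest possible pixel value (assumed 255 for 8-bit data).
    """
    if len(reference) != len(restored) or len(reference) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    # Mean squared error between the clean reference and the derained output.
    mse = sum((a - b) ** 2 for a, b in zip(reference, restored)) / len(reference)
    if mse == 0:
        return math.inf  # identical images: PSNR is unbounded
    return 10.0 * math.log10(max_value ** 2 / mse)
```

A higher PSNR means the restored image is closer to the clean reference; SSIM complements it by comparing local structure rather than per-pixel error.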

Abstract

High-fidelity imaging is crucial for the safety supervision and intelligent deployment of vision-based measurement systems (VBMS), because reliable visual measurement and analysis depend on high-quality images. However, imaging quality can be significantly impaired by adverse weather conditions, particularly rain, which blurs images and reduces contrast. Such impairments increase the risk of inaccurate evaluations and misinterpretations in VBMS. To address these limitations, we propose an Expectation Maximization Reconstruction Transformer (EMResformer) for single image rain streak removal. The EMResformer retains the key self-attention values for feature aggregation and enhances local features to produce superior image reconstruction. Specifically, we propose an Expectation Maximization Block (EMB) integrated into the single image rain streak removal network, which improves the network's ability to eliminate superfluous information and restore a cleaner background image. Additionally, to further enhance local information for improved detail rendition, we introduce a Local Model Residual Block (LMRB), which integrates two local model blocks along with a sequence of convolutions and activation functions. Together, these components facilitate the extraction of more pertinent features for single image rain streak removal. Extensive experiments validate that our proposed EMResformer surpasses current state-of-the-art single image rain streak removal methods on both synthetic and real-world datasets, achieving an improved balance between model complexity and deraining performance. Furthermore, we evaluate the effectiveness of our method in VBMS scenarios, demonstrating that high-quality imaging significantly improves the accuracy and reliability of VBMS tasks.

Paper Structure

This paper contains 27 sections, 24 equations, 15 figures, 6 tables.

Figures (15)

  • Figure 1: Overview of our EMResformer for degraded scene restoration in vision-based measurement systems (VBMS).
  • Figure 2: Illustration of the rain imaging degraded model in VBMS.
  • Figure 3: The overall architecture of EMResformer. Our EMResformer consists of three major components: Shallow Feature Extraction Module, Deep Feature Extraction Module, and Image Reconstruction Module. Given a rainy image input, a 3$\times$3 convolution layer is employed to extract the shallow features. Then, we use a series of Expectation Maximization Blocks as the Deep Feature Extraction Module to extract deeper features. Additionally, we use the Local Model Residual Block to enhance the network's capability to model local features. Finally, in the image reconstruction stage, we use a 3$\times$3 convolution and add the original input to generate a clean background image.
  • Figure 4: The architecture of the Expectation Maximization Block (EMB). The EMB is used to effectively extract features from images, facilitating the recovery of clear background images.
  • Figure 5: The structure of the Local Model Block (LMB). The LMB comprises a sequence of pooling, activation, and convolution operations that improve the network's local modeling capability for recovering clean background images.
  • ...and 10 more figures
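
The three-stage data flow described in the Figure 3 caption (a 3×3 convolution for shallow features, a deep feature extraction module, then a 3×3 convolution whose output is added to the original input) can be sketched as follows. This is an illustrative, single-channel, pure-Python sketch and not the authors' implementation: the names `conv3x3` and `derain`, the zero padding, and the `deep_module` placeholder (standing in for the EMB and LMRB stack, whose internals are not given here) are all assumptions.

```python
from typing import Callable, List

Image = List[List[float]]  # single-channel image stored as rows of floats


def conv3x3(img: Image, kernel: List[List[float]]) -> Image:
    """3x3 cross-correlation with zero padding; output has the input's size."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0.0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    yy, xx = y + dy, x + dx
                    if 0 <= yy < h and 0 <= xx < w:  # skip padded positions
                        acc += kernel[dy + 1][dx + 1] * img[yy][xx]
            out[y][x] = acc
    return out


def derain(rainy: Image,
           shallow_kernel: List[List[float]],
           deep_module: Callable[[Image], Image],
           recon_kernel: List[List[float]]) -> Image:
    """Shallow conv -> deep features -> reconstruction conv + input (global residual)."""
    shallow = conv3x3(rainy, shallow_kernel)    # shallow feature extraction
    deep = deep_module(shallow)                 # placeholder for the EMB/LMRB stack
    residual = conv3x3(deep, recon_kernel)      # image reconstruction convolution
    # Adding the original input means the network only has to predict a correction.
    return [[r + p for r, p in zip(res_row, in_row)]
            for res_row, in_row in zip(residual, rainy)]
```

With a reconstruction kernel of all zeros, `derain` returns the input unchanged, which illustrates the effect of the global residual connection: the learned layers only need to model the difference between the rainy input and the clean background.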