Dual-Path Coupled Image Deraining Network via Spatial-Frequency Interaction
Yuhong He, Aiwen Jiang, Lingfang Jiang, Zhifeng Wang, Lu Wang
TL;DR
The paper tackles single-image deraining by integrating spatial- and frequency-domain cues through a dual-path architecture named DPCNet. It introduces three components. The Spatial Feature Extraction Block (SFEBlock) contains a Spatial-Channel Transformer Block (SCTB) that fuses spatial and channel information. The Frequency Feature Extraction Block (FFEBlock), based on a channel-wise FFT, captures high-frequency textures. The Adaptive Fusion Module (AFM) fuses the dual-domain features. These components form Dual-Domain Blocks (DDBlocks) inside a multi-scale encoder-decoder, which achieves state-of-the-art results on six synthetic and real benchmarks and improves robustness on downstream object detection. The training loss combines $L_1$, perceptual, and FFT terms, guiding both pixel-level restoration and frequency-domain fidelity. Overall, the work shows the value of jointly leveraging spatial and frequency information for effective deraining and practical vision applications.
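To make the loss combination concrete, here is a minimal NumPy sketch of an $L_1$ + perceptual + FFT objective. It rests on several assumptions. The FFT term is taken as the mean absolute difference of the real and imaginary parts of the 2-D spectra, which is one common definition, not necessarily the paper's. The perceptual term is passed in precomputed, because it needs a pretrained feature network. The weights `lam_perc` and `lam_fft` are hypothetical placeholders, not values from the paper.

```python
import numpy as np

def l1_loss(pred, target):
    """Mean absolute pixel error."""
    return float(np.mean(np.abs(pred - target)))

def fft_loss(pred, target):
    """Assumed FFT loss: L1 between real and imaginary parts of the 2-D spectra.

    The FFT is taken over the last two (spatial) axes, so inputs may be
    (H, W) or (C, H, W).
    """
    fp = np.fft.fft2(pred, axes=(-2, -1))
    ft = np.fft.fft2(target, axes=(-2, -1))
    return float(np.mean(np.abs(fp.real - ft.real)) + np.mean(np.abs(fp.imag - ft.imag)))

def total_loss(pred, target, perceptual=0.0, lam_perc=0.04, lam_fft=0.01):
    """Weighted sum of the three terms.

    `perceptual` is a precomputed perceptual distance (it would come from a
    pretrained network); `lam_perc` and `lam_fft` are hypothetical weights.
    """
    return l1_loss(pred, target) + lam_perc * perceptual + lam_fft * fft_loss(pred, target)
```

For example, a prediction that is uniformly 0.5 above a 4x4 zero target gives an $L_1$ term of 0.5. Its spectra differ only in the DC coefficient (by 8, averaged over 16 entries), so the FFT term is also 0.5. The FFT term penalizes errors per frequency rather than per pixel, so it can be sensitive to the missing high-frequency texture that a pixel loss tends to average away.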
Abstract
Transformers have recently emerged as a significant force in image deraining. Existing deraining methods focus extensively on self-attention. Although they achieve impressive results, they tend to neglect critical frequency information, since self-attention is generally less adept at capturing high-frequency details. To overcome this shortcoming, we develop a Dual-Path Coupled Deraining Network (DPCNet) that integrates information from both the spatial and frequency domains through a Spatial Feature Extraction Block (SFEBlock) and a Frequency Feature Extraction Block (FFEBlock). We further introduce an effective Adaptive Fusion Module (AFM) for dual-path feature aggregation. Extensive experiments on six public deraining benchmarks and downstream vision tasks demonstrate that the proposed method not only outperforms existing state-of-the-art deraining methods but also produces visually pleasing results with excellent robustness on downstream vision tasks.
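The dual-path idea can be illustrated with a small NumPy sketch of one block that runs a spatial path and a frequency path in parallel and merges them with a learned gate. Everything here is a hypothetical stand-in, not the authors' design. The spatial path is a plain 3x3 mean filter in place of the transformer-based SFEBlock. The frequency path applies per-channel real gains to the `rfft2` spectrum as a rough analogue of a channel-wise FFT block. The fusion is a sigmoid gate computed from globally pooled features of both paths. The residual connection in `dual_domain_block` is also an assumption.

```python
import numpy as np

def frequency_branch(x, weight):
    """Hypothetical frequency path: filter each channel in the Fourier domain.

    x: (C, H, W) real features; weight: (C, H, W // 2 + 1) real gains that
    would be learned, applied to the per-channel rfft2 spectrum.
    """
    spec = np.fft.rfft2(x, axes=(-2, -1))            # per-channel 2-D FFT
    spec = spec * weight                              # element-wise spectral gain
    return np.fft.irfft2(spec, s=x.shape[-2:], axes=(-2, -1))

def spatial_branch(x):
    """Stand-in spatial path: 3x3 mean filter with edge padding (not a transformer)."""
    _, h, w = x.shape
    padded = np.pad(x, ((0, 0), (1, 1), (1, 1)), mode="edge")
    out = np.zeros_like(x)
    for dy in range(3):
        for dx in range(3):
            out += padded[:, dy:dy + h, dx:dx + w]
    return out / 9.0

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def adaptive_fusion(f_spa, f_freq, w_gate, b_gate):
    """Hypothetical gated fusion: per-channel weights from pooled features.

    w_gate: (C, 2C) and b_gate: (C,) stand in for learned parameters.
    """
    pooled = np.concatenate([f_spa.mean(axis=(1, 2)), f_freq.mean(axis=(1, 2))])
    gate = sigmoid(w_gate @ pooled + b_gate)[:, None, None]   # shape (C, 1, 1)
    return gate * f_spa + (1.0 - gate) * f_freq

def dual_domain_block(x, freq_weight, w_gate, b_gate):
    """Assumed residual dual-path block: x + fuse(spatial(x), frequency(x))."""
    fused = adaptive_fusion(spatial_branch(x), frequency_branch(x, freq_weight), w_gate, b_gate)
    return x + fused
```

With all-ones spectral gains the frequency path returns its input unchanged. With a zero gate weight and bias the fusion is a plain average of the two paths. Training would move the gate away from 0.5 per channel, letting the block decide how much to trust spatial versus frequency features.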
