Exploring Richer and More Accurate Information via Frequency Selection for Image Restoration
Hu Gao, Depeng Dang
TL;DR
This work tackles image restoration by combining spatial- and frequency-domain information to better recover degraded images. It introduces MSFSNet, a four-scale encoder-decoder built around two plug-in modules. The Dynamic Filter Selection module (DFS) uses learnable filters to generate low- and high-frequency maps and then applies a Frequency Cross-Attention Mechanism (FCAM) to select the most informative frequencies. The Skip Feature Fusion block (SFF) selectively propagates useful skip-connection information. The network is trained with a joint spatial and Fourier-domain loss, L = L_s + \lambda L_f, where L_s captures pixel-wise differences, L_f enforces fidelity in the frequency domain via the Fourier transform, and \lambda controls the balance between the two. Across motion deblurring, defocus deblurring, deraining, and denoising, MSFSNet delivers state-of-the-art or competitive results while remaining efficient, including a substantial reduction in MACs for deraining and higher PSNR than strong baselines on multiple datasets. Because DFS and SFF are plug-in modules, they can be integrated into existing restoration networks to improve multi-scale feature quality and robustness.
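The joint loss L = L_s + \lambda L_f described above can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the choice of an L1 norm for both terms, the mean normalisation, the default value `lam=0.1`, and the function name `joint_loss` are all assumptions made for illustration only.

```python
import numpy as np


def joint_loss(pred, target, lam=0.1):
    """Return (L, L_s, L_f) with L = L_s + lam * L_f.

    Inputs are arrays of shape (H, W) or (C, H, W).
    Assumptions (not from the source): L1 norm for both terms, lam=0.1.
    """
    pred = np.asarray(pred, dtype=np.float64)
    target = np.asarray(target, dtype=np.float64)

    # Spatial term: mean absolute pixel-wise difference.
    l_s = np.mean(np.abs(pred - target))

    # Frequency term: mean magnitude of the difference between the
    # 2-D Fourier transforms, taken over the last two (spatial) axes.
    pred_f = np.fft.fft2(pred, axes=(-2, -1))
    target_f = np.fft.fft2(target, axes=(-2, -1))
    l_f = np.mean(np.abs(pred_f - target_f))

    return l_s + lam * l_f, l_s, l_f
```

Under these assumptions, a prediction identical to the target yields zero for both terms, and increasing `lam` puts more weight on errors that are concentrated in a few frequency coefficients.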
Abstract
Image restoration aims to recover high-quality images from their corrupted counterparts. Many existing methods focus primarily on the spatial domain, neglecting frequency variations and ignoring the impact of implicit noise in skip connections. In this paper, we introduce a multi-scale frequency selection network (MSFSNet) that seamlessly integrates spatial and frequency domain knowledge, selectively recovering richer and more accurate information. Specifically, we first capture spatial features and feed them into dynamic filter selection modules (DFS) at different scales to integrate frequency knowledge. DFS uses learnable filters to generate high- and low-frequency information and employs a frequency cross-attention mechanism (FCAM) to determine the most informative components to recover. To learn a multi-scale and accurate set of hybrid features, we develop a skip feature fusion block (SFF) that leverages contextual features to discriminatively determine which information should be propagated in skip connections. It is worth noting that our DFS and SFF are generic plug-in modules that can be directly employed in existing networks without any adjustments, leading to performance improvements. Extensive experiments across various image restoration tasks demonstrate that our MSFSNet achieves performance that is either superior or comparable to state-of-the-art algorithms.
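One common way to obtain low- and high-frequency maps from a learnable filter, as DFS is described as doing, is sketched below. The abstract does not specify the mechanism, so everything here is an assumption: the softmax-normalised kernel acting as a low-pass filter, the high-frequency map taken as the residual, the reflect padding, and the hypothetical function name `split_frequencies`. The FCAM selection step is not modelled.

```python
import numpy as np


def split_frequencies(image, kernel_logits):
    """Split a 2-D image into (low, high) frequency maps.

    kernel_logits stands in for learnable filter parameters; it must be
    a square array with odd side length. Illustrative assumption only.
    """
    image = np.asarray(image, dtype=np.float64)
    logits = np.asarray(kernel_logits, dtype=np.float64)
    k = logits.shape[0]
    height, width = image.shape

    # Softmax makes the weights non-negative and sum to 1, so the filter
    # computes a weighted local average, i.e. a low-pass response.
    weights = np.exp(logits - logits.max())
    weights /= weights.sum()

    # Reflect padding keeps the output the same size as the input.
    padded = np.pad(image, k // 2, mode="reflect")

    low = np.zeros_like(image)
    for dy in range(k):
        for dx in range(k):
            low += weights[dy, dx] * padded[dy:dy + height, dx:dx + width]

    # The high-frequency map is what the low-pass filter removed.
    high = image - low
    return low, high
```

By construction the two maps sum back to the input, so a later selection stage can reweight them without losing information at this step.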
