Efficient Image Super-Resolution via Symmetric Visual Attention Network
Chengxu Wu, Qinrui Fan, Shu Hu, Xi Wu, Xin Wang, Jing Hu
TL;DR
The paper tackles the challenge of achieving high-quality single-image super-resolution at low computational cost. It introduces the Symmetric Visual Attention Network (SVAN), which expands the receptive field through a Symmetric Large Kernel Attention Block (SLKAB). Each SLKAB decomposes a large-kernel convolution into a $5\times5$ depthwise conv, a $5\times5$ depthwise dilated conv (dilation $d=3$), and a $1\times1$ conv, arranged in dual, symmetric bottlenecks to emulate a $17\times17$ kernel with far fewer parameters. The network uses shallow feature extraction, seven SLKABs for deep feature extraction, and a pixel-shuffle reconstruction stage, achieving competitive PSNR/SSIM with roughly 30% of the parameters of state-of-the-art methods and substantially lower FLOPs. Ablation confirms that the 17-1-1-17 receptive-field arrangement and the symmetric bottleneck design are key to the performance gains. The work demonstrates that efficient SR with large receptive fields is achievable, enabling practical deployment on resource-constrained platforms.
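The $17\times17$ figure follows from standard receptive-field arithmetic for stacked stride-1 convolutions: each layer with kernel size $k$ and dilation $d$ widens the field by $(k-1)d$. The sketch below checks this for the three convolutions named above. The function name `stacked_receptive_field` is a hypothetical helper for illustration, not something taken from the paper.

```python
def stacked_receptive_field(layers):
    """Receptive field of stacked stride-1 convolutions.

    `layers` is a sequence of (kernel_size, dilation) pairs.
    Each layer widens the field by (kernel_size - 1) * dilation.
    """
    rf = 1
    for kernel_size, dilation in layers:
        rf += (kernel_size - 1) * dilation
    return rf


# The decomposition described in the summary:
# 5x5 depthwise conv, 5x5 depthwise dilated conv (d=3), 1x1 conv.
lka_layers = [(5, 1), (5, 3), (1, 1)]
rf = stacked_receptive_field(lka_layers)
print(rf)  # 17
```

The $1\times1$ convolution contributes nothing spatially; it only mixes channels, which is why the first two layers alone determine the 17-pixel extent.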
Abstract
Improving efficiency is an important direction in Single-Image Super-Resolution (SISR) research. Recent efficient Super-Resolution (SR) methods reduce model complexity mainly by refining deep stacks of small-kernel convolutions, which leaves them with a small receptive field. A large receptive field obtained through large-kernel convolution can significantly improve image quality, but its computational cost is too high. To recover finer detail in efficient super-resolution reconstruction, we propose the Symmetric Visual Attention Network (SVAN), which exploits large receptive fields. SVAN decomposes a large-kernel convolution into three different convolution operations and combines them with an attention mechanism to form the Symmetric Large Kernel Attention Block (SLKAB). Within each SLKAB, the convolution combinations are ordered by receptive-field size so that the block is symmetric with a bottleneck structure; this block is the basic component of SVAN and extracts deep features effectively. Our network obtains a large receptive field while minimizing the number of parameters and improving the perceptual ability of the model. Experimental results show that SVAN achieves high-quality super-resolution reconstruction using only about 30% of the parameters of existing SOTA methods.
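To give a rough sense of why decomposing a large kernel saves parameters, the sketch below counts weights (biases ignored) for one decomposed unit versus a single $17\times17$ convolution. The channel width of 32 is an assumed value chosen only for illustration, and the count covers one decomposed large-kernel unit, not a full SLKAB or the whole network.

```python
def depthwise_params(channels, k):
    # One k x k filter per channel.
    return channels * k * k


def pointwise_params(channels):
    # 1x1 conv mixing `channels` inputs into `channels` outputs.
    return channels * channels


def dense_params(channels, k):
    # Standard k x k conv with `channels` inputs and outputs.
    return channels * channels * k * k


C = 32  # assumed channel width, not taken from the paper

# 5x5 depthwise + 5x5 depthwise dilated + 1x1 pointwise.
# Dilation spreads the taps apart but adds no weights.
decomposed = depthwise_params(C, 5) + depthwise_params(C, 5) + pointwise_params(C)
depthwise_17 = depthwise_params(C, 17)
dense_17 = dense_params(C, 17)

print(decomposed)    # 2624
print(depthwise_17)  # 9248
print(dense_17)      # 295936
```

Under these assumptions the decomposed unit needs fewer weights than even a depthwise $17\times17$ convolution, and far fewer than a dense one, while covering the same spatial extent.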
