Efficient Image Super-Resolution via Symmetric Visual Attention Network

Chengxu Wu, Qinrui Fan, Shu Hu, Xi Wu, Xin Wang, Jing Hu

TL;DR

The paper tackles the challenge of achieving high-quality single-image super-resolution at low computational cost. It introduces the Symmetric Visual Attention Network (SVAN), which expands the receptive field through a decomposed large-kernel attention block (SLKAB) built from a $5\times5$ depthwise convolution, a $5\times5$ depthwise dilated convolution ($d=3$), and a $1\times1$ convolution, arranged in dual, symmetric bottlenecks to emulate a $17\times17$ kernel with far fewer parameters. The network uses shallow feature extraction, seven SLKAB blocks for deep feature extraction, and a pixel-shuffle reconstruction stage, achieving competitive PSNR/SSIM with roughly 30% of the parameters of state-of-the-art methods and substantially lower FLOPs. Ablation confirms that the $17$-$1$-$1$-$17$ receptive-field arrangement and the symmetric bottleneck design are key to the performance gains. The work demonstrates that efficient SR with large receptive fields is achievable, enabling practical deployment on resource-constrained platforms.
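
The $17\times17$ figure follows from standard receptive-field arithmetic for stride-1 convolutions applied in sequence. The sketch below is illustrative only: the function names are hypothetical and not taken from the paper's code, and it assumes stride 1 throughout.

```python
def effective_kernel(kernel_size: int, dilation: int = 1) -> int:
    """Span (in pixels, along one axis) covered by a single dilated kernel."""
    return dilation * (kernel_size - 1) + 1


def stacked_receptive_field(layers) -> int:
    """Receptive field of stride-1 convolutions applied one after another.

    `layers` is an iterable of (kernel_size, dilation) pairs.
    Each layer widens the field by (effective kernel span - 1).
    """
    receptive_field = 1
    for kernel_size, dilation in layers:
        receptive_field += effective_kernel(kernel_size, dilation) - 1
    return receptive_field


# 5x5 depthwise, 5x5 depthwise dilated (d=3), 1x1 pointwise
decomposition = [(5, 1), (5, 3), (1, 1)]

print(effective_kernel(5, 3))                  # 13: the dilated kernel alone
print(stacked_receptive_field(decomposition))  # 17: the full combination
```

The $1\times1$ convolution mixes channels but does not widen the field, which is why the combined receptive field is set by the two depthwise layers.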

Abstract

Improving efficiency is an important research direction for Single-Image Super-Resolution (SISR) algorithms. Recent efficient Super-Resolution (SR) research reduces model complexity and improves efficiency mainly through improved deep small-kernel convolutions, which leads to a small receptive field. The large receptive field obtained by large-kernel convolution can significantly improve image quality, but its computational cost is too high. To improve the detail recovered by efficient super-resolution reconstruction, we propose a Symmetric Visual Attention Network (SVAN) that exploits large receptive fields. SVAN decomposes a large-kernel convolution into a combination of three different convolution operations and combines them with an attention mechanism to form the Symmetric Large Kernel Attention Block (SLKAB). Within the SLKAB, the convolution combinations are arranged by receptive-field size into a symmetric attention block with a bottleneck structure; this block is the basic component of SVAN and extracts deep features effectively. Our network obtains a large receptive field while minimizing the number of parameters and improving the perceptual ability of the model. Experimental results show that the proposed SVAN obtains high-quality super-resolution reconstruction results using only about 30% of the parameters of existing SOTA methods.
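
To give a rough sense of why decomposition keeps the parameter count low, the following sketch compares the weights of one dense $17\times17$ convolution with those of a $5\times5$ depthwise, $5\times5$ depthwise dilated, and $1\times1$ combination. It rests on assumptions that are not from the paper: the channel count of 64 is purely illustrative, biases are ignored, and the function names are hypothetical. It is a per-layer comparison and is unrelated to the whole-model 30% figure quoted above.

```python
def dense_conv_params(channels: int, kernel_size: int) -> int:
    """Weights in a standard conv mapping `channels` -> `channels` (no bias)."""
    return channels * channels * kernel_size ** 2


def decomposed_params(channels: int, dw_kernel: int = 5, dwd_kernel: int = 5) -> int:
    """Weights in depthwise + depthwise-dilated + 1x1 pointwise convs (no bias).

    Dilation spreads a kernel out spatially but adds no weights,
    so the dilated layer costs the same as an undilated one.
    """
    depthwise = channels * dw_kernel ** 2
    depthwise_dilated = channels * dwd_kernel ** 2
    pointwise = channels * channels
    return depthwise + depthwise_dilated + pointwise


channels = 64  # illustrative value, not the paper's setting
dense = dense_conv_params(channels, 17)
decomposed = decomposed_params(channels)
print(f"dense 17x17: {dense:,} weights")
print(f"decomposed:  {decomposed:,} weights")
```

Most of the saving comes from the depthwise layers, whose cost grows linearly with the channel count instead of quadratically.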

Paper Structure

This paper contains 12 sections, 9 equations, 4 figures, and 3 tables.

Figures (4)

  • Figure 1: A large-kernel convolution with kernel size 13 can be decomposed into a $5\times5$ depth-wise convolution and a $5\times5$ depth-wise dilation convolution with a dilation of 3. The figure shows the convolution combination used in our model: a $5\times5$ depth-wise convolution, a $5\times5$ depth-wise dilation convolution with a dilation of 3, and a $1\times1$ point-wise convolution. The blue color shows the kernel. Note: zero padding is applied in the figure.
  • Figure 2: The architecture of Symmetric Visual Attention Network. SVAN contains three main parts: shallow feature extraction module, deep feature extraction module, and pixel shuffle reconstruction module.
  • Figure 3: The architecture of the Symmetric Large Kernel Attention Block. In SLKAB, the attention uses a bottleneck structure defined by receptive-field size together with a symmetric design.
  • Figure 4: Visual results on benchmark datasets for $\times$4 upscaling. All image comparison results are generated by the code and models provided in the corresponding papers [hui2019lightweight, liu2020residual, kong2022residual].
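
The pixel-shuffle reconstruction module named in Figure 2 rearranges channels into spatial resolution. As a minimal sketch of that operation, the pure-Python function below turns a tensor of shape $(C r^2, H, W)$ into one of shape $(C, rH, rW)$. The channel ordering follows the convention used by PyTorch's `nn.PixelShuffle`; whether SVAN's implementation matches it exactly is an assumption, and the function name and nested-list representation are illustrative only.

```python
def pixel_shuffle(x, r):
    """Rearrange nested lists shaped [C*r*r][H][W] into [C][H*r][W*r].

    Output pixel (h*r + i, w*r + j) of channel c is read from
    input channel c*r*r + i*r + j at position (h, w).
    """
    in_channels = len(x)
    if in_channels % (r * r) != 0:
        raise ValueError("channel count must be divisible by r*r")
    out_channels = in_channels // (r * r)
    height, width = len(x[0]), len(x[0][0])

    out = [[[0] * (width * r) for _ in range(height * r)]
           for _ in range(out_channels)]
    for c in range(out_channels):
        for i in range(r):
            for j in range(r):
                src = x[c * r * r + i * r + j]
                for h in range(height):
                    for w in range(width):
                        out[c][h * r + i][w * r + j] = src[h][w]
    return out


# Four 1x1 feature maps become a single 2x2 image for r = 2.
low_res_features = [[[1]], [[2]], [[3]], [[4]]]
print(pixel_shuffle(low_res_features, 2))  # [[[1, 2], [3, 4]]]
```

Because the upscaling is a pure rearrangement, it adds no learned weights; the preceding convolution only has to produce $C r^2$ channels at low resolution.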